tk2 / RetroSeq

RetroSeq is a bioinformatics tool that searches for mobile element insertions using aligned reads in a BAM file and a library of reference transposable elements. Please read the wiki page for usage instructions. The wiki also has a page describing how the 1000 Genomes CEU trio analysis was carried out, including the files and parameters used for the various steps.

Error while calling TEs from pooled samples #1

Closed. frl1 closed this issue 8 years ago.

frl1 commented 10 years ago

I was trying to run the RetroSeq calling step (-call) on a dataset consisting of two discover output files and their corresponding BAM files. The program starts to run, but then the following errors appear and I get an empty VCF file.

I'd very much appreciate a workaround or fix for this problem. Thanks,

Fritjof

retroseq.pl -call -bam samples.fofn -filter rmsk.bed.tab -ref ref.fa -input out_files.fofn -output rs_call
RetroSeq: A tool for discovery and genotyping of transposable elements from short read alignments

Version: 1.41
Author: Thomas Keane (thomas.keane@sanger.ac.uk)

Found 2 BAM files

Found sample: SAMPLE17_SAMPLE18
Calling sample SAMPLE17_SAMPLE18
Beginning paired-end calling...
Use of uninitialized value $currentType in concatenation (.) or string at /./home/flammers/programs/RetroSeq-1.41/bin/retroseq.pl line 768, <$tfh> line 1.
Use of uninitialized value $currentType in concatenation (.) or string at /./home/flammers/programs/RetroSeq-1.41/bin/retroseq.pl line 769, <$tfh> line 1.
Use of uninitialized value $currentType in string eq at /./home/flammers/programs/RetroSeq-1.41/bin/retroseq.pl line 765, <$tfh> line 2.
Use of uninitialized value $currentType in concatenation (.) or string at /./home/flammers/programs/RetroSeq-1.41/bin/retroseq.pl line 768, <$tfh> line 2.
Use of uninitialized value $currentType in concatenation (.) or string at /./home/flammers/programs/RetroSeq-1.41/bin/retroseq.pl line 769, <$tfh> line 2.
PE Call: 
PE Call: 
Creating VCF file of calls....
tk2 commented 9 years ago

Hi Fritjof - could you post the first couple of lines from the out_files.fofn file? And if it is a fofn, could you also post the first couple of lines from each of the files it lists?
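For anyone reproducing this check, here is a minimal Python sketch of the same inspection. It assumes a fofn holds one path per line (blank lines skipped, surrounding whitespace stripped, paths resolved relative to the working directory) and collects the first few lines of each listed file, much like running `head` on each one. The function name `head_of_fofn` is hypothetical and not part of RetroSeq.

```python
from pathlib import Path


def head_of_fofn(fofn_path, n=2):
    """Return {listed_path: first n lines} for every file named in a fofn.

    Assumption: the fofn lists one path per line; blank lines are ignored
    and leading/trailing whitespace around each path is stripped.
    """
    heads = {}
    for raw in Path(fofn_path).read_text().splitlines():
        name = raw.strip()
        if not name:
            continue
        with open(name) as fh:
            # zip with range(n) stops after at most n lines without reading the rest.
            heads[name] = [line.rstrip("\n") for _, line in zip(range(n), fh)]
    return heads
```

Printing each entry as `==> name <==` followed by its lines reproduces the familiar `head` layout.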

Thomas

frl1 commented 9 years ago

Hi, it's a fofn, and the files look like this:

out_files.fofn:

sample17_full.out 
sample18_full.out 
==> sample17_full.out <==
scaffold79  1618    1708    LTR FCC3FBRACXX:4:2205:20625:62168#ACCAGACT -   -   90
scaffold79  5136    5226    LTR FCC3FBRACXX:4:1101:12484:88749#ACCAGACT +   +   90
scaffold79  8308    8398    LTR FCC3FBRACXX:4:1302:3013:12429#ACCAGACT  +   +   90
scaffold79  36914   37004   LTR FCC3FBRACXX:4:2302:10297:100920#ACCAGACT    -   +   90
scaffold79  36918   37008   LTR FCC3FBRACXX:4:2213:10145:89557#ACCAGACT -   +   90
scaffold79  37073   37163   LTR FCC3FBRACXX:4:1211:3229:30589#ACCAGACT  +   +   90
scaffold79  58155   58245   LTR FCC3FBRACXX:4:2311:17897:2633#ACCAGACT  +   +   90
scaffold79  58156   58246   LTR FCC3FBRACXX:4:2203:6575:14023#ACCAGACT  +   +   90
scaffold79  68043   68133   LTR FCC3FBRACXX:4:2309:2462:29601#ACCAGACT  -   +   90
scaffold79  74252   74342   LTR FCC3FBRACXX:4:1214:8663:36197#ACCAGACT  -   +   90

==> sample18_full.out <==
scaffold79  8611    8701    LTR FCC3FBRACXX:3:1314:19505:17838#CTATAACT -   -   90
scaffold79  37076   37166   LTR FCC3FBRACXX:3:2308:19292:64771#CTATAACT +   +   90
scaffold79  52824   52914   LTR FCC3FBRACXX:3:1206:4991:28683#CTATAACT  +   +   90
scaffold79  74233   74323   LTR FCC3FBRACXX:3:2212:5392:41065#CTATAACT  -   +   90
scaffold79  88831   88921   LTR FCC3FBRACXX:3:1314:9199:91200#CTATAACT  +   -   90
scaffold79  97459   97549   LTR FCC3FBRACXX:3:2205:3623:50966#CCTATAAC  +   +   90
scaffold79  102332  102422  LTR FCC3FBRACXX:3:1308:13328:36541#CTATAACT +   +   90
scaffold79  102358  102448  LTR FCC3FBRACXX:3:1114:21020:56188#CTATAACT +   +   90
scaffold79  136963  137053  LTR FCC3FBRACXX:3:2102:2100:43276#CTATAACT  +   -   90
scaffold79  136984  137074  LTR FCC3FBRACXX:3:1307:16783:73826#CTATAACT +   -   90
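The records above appear to be whitespace-separated columns: scaffold, start, end, TE type (LTR here), read name, then three further fields. The Python sketch below parses one such line. The field names are assumptions inferred from these rows, and the trailing columns are kept uninterpreted because their meaning is not stated in this thread. A line holding only a file name, such as a fofn entry, is rejected rather than yielding an empty TE type.

```python
from typing import NamedTuple, Tuple


class DiscoverRecord(NamedTuple):
    # Hypothetical field names inferred from the sample rows, not from RetroSeq docs.
    chrom: str
    start: int
    end: int
    te_type: str
    read_name: str
    extra: Tuple[str, ...]  # remaining columns, left uninterpreted


def parse_discover_line(line):
    """Parse one discover-output row; raise ValueError if it is not a record."""
    fields = line.split()
    if len(fields) < 5:
        # e.g. a bare file name such as "sample17_full.out" has a single field,
        # so there is no TE type column to read.
        raise ValueError(f"not a discover-output record: {line!r}")
    chrom, start, end, te_type, read_name, *rest = fields
    return DiscoverRecord(chrom, int(start), int(end), te_type, read_name, tuple(rest))
```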
tk2 commented 9 years ago

Those files look fine, and I can't see any obvious reason why this should be happening. I've just pushed a commit to master that prints the offending line when this error occurs. Could you download the master zip and rerun the calling phase? It should die with an error and print the line causing the problem.

frl1 commented 9 years ago

Hi Thomas, thanks! Now I get the output below. Apparently the fofn is not recognized as a fofn and is parsed as if it were a discover output file itself. I switched to passing the file prefix instead, and then it worked.

Version: 1.41
Author: Thomas Keane (thomas.keane@sanger.ac.uk)

Found 2 BAM files

Found sample: SAMPLE17_SAMPLE18
Calling sample SAMPLE17_SAMPLE18
Beginning paired-end calling...
Failed to detect the TE type to be called from line: sample17_full.out
Tue Jul 8 10:41:54 CEST 2014
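The error message shows that the first line read was `sample17_full.out`, a file name rather than a record, which is consistent with the fofn having been parsed as a discover output file. One way a tool could tell the input forms apart is to inspect the first non-blank line. The Python sketch below does that for a single discover file, a fofn, or a file-name prefix. `resolve_discover_inputs` and its detection heuristic are hypothetical and do not reproduce RetroSeq's own option handling.

```python
import glob
import os


def _looks_like_record(line):
    # Assumption: a discover-output record has at least five whitespace-separated
    # fields, with integer start and end coordinates in columns 2 and 3.
    fields = line.split()
    return len(fields) >= 5 and fields[1].isdigit() and fields[2].isdigit()


def resolve_discover_inputs(spec):
    """Hypothetical helper: expand one input argument into discover-output paths.

    spec may be a single discover-output file, a fofn (one path per line),
    or a prefix shared by several discover-output files.
    """
    if os.path.isfile(spec):
        with open(spec) as fh:
            # Read only up to the first non-blank line to classify the file.
            first = next((ln.strip() for ln in fh if ln.strip()), None)
            if first is None:
                raise ValueError(f"{spec!r} is empty")
            if _looks_like_record(first):
                return [spec]  # a single discover-output file
            # Otherwise treat the file as a fofn and read the remaining names.
            names = [first] + [ln.strip() for ln in fh if ln.strip()]
        missing = [n for n in names if not os.path.isfile(n)]
        if missing:
            raise ValueError(f"{spec!r} is neither a discover file nor a fofn; "
                             f"first line {first!r}, missing {missing!r}")
        return names
    # Not an existing file: treat spec as a prefix, as in the workaround above.
    matches = sorted(p for p in glob.glob(glob.escape(spec) + "*") if os.path.isfile(p))
    if not matches:
        raise ValueError(f"no files match prefix {spec!r}")
    return matches
```

Classifying by the first line is a cheap heuristic; a fofn whose paths contain whitespace-separated numbers would be misread, so a separate option per input form is the more robust design.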