tk2 / RetroSeq

RetroSeq is a bioinformatics tool that searches for mobile element insertions using aligned reads in a BAM file and a library of reference transposable elements. Please read the wiki page (link below) for usage instructions. The wiki also has a page describing how the 1000 Genomes CEU trio analysis was carried out, with the files and parameters used for the various steps.

Uninitialized value error post PE parsing #4

Open biobenkj opened 8 years ago

biobenkj commented 8 years ago

When I run RetroSeq in the align mode, it gets to PE alignment parsing and then breaks. Not sure what the error means (uninitialized value before assignment?).

Input: perl bin/retroseq.pl -discover -bam ../bwa/Mtbcosmid.sorted.bam -eref ../bwa/retroseqTNlib.tab -output ../bwa/Mtbcosmidtest.candidates.tab -align

Output: RetroSeq: A tool for discovery and genotyping of transposable elements from short read alignments

Version: 1.41 Author: Thomas Keane (thomas.keane@sanger.ac.uk)

Reading -eref file: ../bwa/retroseqTNlib.tab

Min anchor quality: 20 Min percent identity: 80 Min length for hit: 36

Opening BAM (../bwa/Mtbcosmid.sorted.bam) and getting initial set of candidate mates....
Reading chromosome: pRD12F9
1075 candidate reads remain to be found after first pass....
Reading chromosome: pRD12F9
Parsing PE alignments....
Use of uninitialized value $lastLine in string ne at bin/retroseq.pl line 587.
Alignment did not complete correctly

Any insight you could provide would be great!

tk2 commented 8 years ago

I suspect that your exonerate alignment did not complete. This line is where RetroSeq checks the exonerate output for the "-- completed exonerate analysis" message, which exonerate prints to say it completed fully. Did it perhaps exceed a time or memory limit on your machine?
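The check described above can be reproduced outside RetroSeq to test an exonerate output file directly. The following is a minimal Python sketch, not RetroSeq's actual Perl code: the function name is hypothetical, and it assumes the completion message quoted above is the last non-empty line of a successful run. An empty or missing output file would leave the "last line" undefined, which is consistent with the `$lastLine` warning reported in this thread.

```python
import os

# Completion message quoted in the comment above; assumed to be the last
# non-empty line that exonerate writes when it finishes normally.
COMPLETION_MARKER = "-- completed exonerate analysis"


def exonerate_completed(output_path):
    """Return True if the last non-empty line of the file is the marker.

    Hypothetical helper for illustration only; it is not part of RetroSeq.
    """
    if not os.path.exists(output_path):
        return False
    last_line = None
    with open(output_path, "r", errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped:
                last_line = stripped
    # An empty file leaves last_line as None, so this returns False.
    return last_line == COMPLETION_MARKER
```

If this returns False for a file that exists but is empty, exonerate most likely produced no output at all, which points to a failed launch or an early crash rather than a truncated run.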

kenza12 commented 8 years ago

Hi. I have the same problem. If someone finds a solution, please share it. I ran the tool on only one chromosome to test it, so memory and time limits should not be the problem. Thanks

biobenkj commented 8 years ago

So the way around this, @tk2 and @kenza12, is to download the latest version of exonerate (https://www.ebi.ac.uk/about/vertebrate-genomics/software/exonerate) [2.4.0], recompile it, and rerun RetroSeq. There must be some issue with 2.2.0...

RoseString commented 8 years ago

I am using exonerate 2.4 but still ran into the same problem. I think there is a memory issue associated with the use of --bestn, even though I have more than 100 GB of memory. I am not sure how to solve this.

ghost commented 7 years ago

I tried two different versions of exonerate (2.4 and 2.22) with the samples used in the tutorial. I get the same error with both versions. Below is the tail end of my output.

" Reading chromosome: GL000225.1 Reading chromosome: GL000192.1 Reading chromosome: NC_007605 Reading chromosome: hs37d5 Using reference TE locations to assign discordant mates... Screening for hits to: Alu Screening for hits to: L1HS Use of uninitialized value $lastLine in string ne at retroseq.pl line 509. Alignment did not complete correctly Parsing PE alignments.... "

I used the tutorial commands with updated paths to my files.

Is this issue going to be fixed?

dwesche commented 6 years ago

Hi @tk2 I'm getting the same error with both exonerate 2.2.0 and 2.4.0:

...
649922 candidate reads remain to be found after first pass....
Reading chromosome: chr1
...
Parsing PE alignments....
Use of uninitialized value $lastLine in string ne at /home/newmanlab/dwesche/programs/RetroSeq/bin/retroseq.pl line 509.
Alignment did not complete correctly

Here's the run command: retroseq.pl -discover -align -bam /my/bam/file.bam -eref /my/eref/file.txt -output ./outfile.txt

Are there any new insights on this? Thanks!

wangruohan111 commented 5 years ago

I also have this problem, and my exonerate version is 2.4.0. Does anyone have a solution?

tk2 commented 5 years ago

Hi - I just re-ran the NA12878 data from the wiki page and it completes just fine. The underlying cause is usually that exonerate ran out of memory. If you were running on a compute farm, can you check whether the process hit the memory limit?
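One way to check memory use without a scheduler report is to run the command under a small wrapper that records peak resident memory. This is a minimal sketch for Unix-like systems using only the Python standard library; the function name is hypothetical and nothing here is part of RetroSeq. It relies on `ru_maxrss` for waited-for child processes, which is reported in kilobytes on Linux and bytes on macOS.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(command):
    """Run a command, wait for it, and return (exit_code, peak_rss).

    peak_rss is ru_maxrss over the terminated, waited-for children of this
    Python process, so use a fresh interpreter for each command measured.
    """
    completed = subprocess.run(command)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is passed through as the command.
    code, peak = run_and_report_peak_memory(sys.argv[1:])
    print(f"exit code {code}, peak RSS {peak} (KB on Linux)", file=sys.stderr)
```

Comparing the reported peak against the machine's or job's memory limit gives a rough indication of whether memory exhaustion is plausible for a given run.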

I'm happy to have a look at specific examples if you can provide me with test data.

nikitagambhir commented 4 years ago

Hi, I had the same problem and this is certainly not a memory issue. I ran RetroSeq on 55 samples and only one sample (referred to as 'bad' sample) produced this error repeatedly. Each of my samples had 16 chromosomes. When I split the bam file of the 'bad' sample into 16 bam files (one file per chromosome) and then ran the analysis, RetroSeq worked.
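The per-chromosome workaround described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the commenter's actual procedure: it assumes `samtools` is installed and the input BAM is coordinate-sorted and indexed (required for region extraction with `samtools view`), and it reuses the `retroseq.pl` flags shown earlier in this thread. The function name, file names, and output layout are hypothetical. The function only builds the commands, so they can be inspected before anything is run.

```python
import shlex


def per_chromosome_commands(bam, eref, chromosomes, out_dir="."):
    """Build split, index, and discovery commands for each chromosome.

    Nothing is executed here; each job is a list of argument lists.
    """
    jobs = []
    for chrom in chromosomes:
        chrom_bam = f"{out_dir}/{chrom}.bam"
        candidates = f"{out_dir}/{chrom}.candidates.tab"
        jobs.append([
            # Extract one chromosome; needs an index on the input BAM.
            ["samtools", "view", "-b", "-o", chrom_bam, bam, chrom],
            # Indexing the split file is an assumption, not a known requirement.
            ["samtools", "index", chrom_bam],
            # Flags copied from the commands posted in this thread.
            ["retroseq.pl", "-discover", "-align", "-bam", chrom_bam,
             "-eref", eref, "-output", candidates],
        ])
    return jobs


if __name__ == "__main__":
    # Placeholder inputs; print the commands instead of running them.
    for job in per_chromosome_commands("sample.bam", "eref.tab", ["chr1", "chr2"]):
        for cmd in job:
            print(shlex.join(cmd))
```

Chromosome names must match those in the BAM header, which can be listed with `samtools idxstats`.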