oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons. The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

Uninitialized value error from get_range.pl #18

Closed jackson-wells closed 6 years ago

jackson-wells commented 6 years ago

I am running LTR_retriever with LTRharvest and LTR_finder inputs. The errors occur whether the program is run with only the harvest output, only the finder output, or both together. The command used is:

LTR_retriever -genome genome.fasta -inharvest harvScreen.out -infinder findScreen.out -threads 16

The command is submitted through a batch scheduler, as required by our infrastructure rules.

The errors appear as follows:

Use of uninitialized value $seq_ID in exists at ./bin/get_range.pl line 120, line 3.
Use of uninitialized value $seq_ID in exists at ./bin/get_range.pl line 120, line 4.
Use of uninitialized value $seq_ID in exists at ./bin/get_range.pl line 120, line 5.
Use of uninitialized value $seq_ID in exists at ./bin/get_range.pl line 120, line 6.
.........
Use of uninitialized value $seq_ID in exists at ./bin/get_range.pl line 120, line 135241.
Use of uninitialized value $seq_ID in exists at ./bin/get_range.pl line 120, line 135242.
Use of uninitialized value $seq_ID in exists at ./bin/get_range.pl line 120, line 135243.
Use of uninitialized value $seq_ID in exists at ./bin/get_range.pl line 120, line 135244.
Warning: LOC list genome.fasta.retriever.scn.full is empty.
Usage: perl cleanup.pl -f sample.fa [options] > sample.cln.fa
Options:
-misschar n Define the letter representing unknown sequences...
-Nscreen [0|1] Enable (1) or disable (0) the -nc parameter; default: 1
-nc [int] Ambuguous sequence len cutoff; discard the entire....
-nr [0-1] Ambuguous sequence percentage cutoff; discard the entire sequence....
-cleanN [0|1] Retain (0) or remove (1) the -misschar taget in output sequence
-trf [0|1] Enable (1) or disable (0) tandem repeat finder (trf); default: 1
-trf_path path Path to the trf program
cp: cannot stat ‘genome.fasta.retriever.scn.adj’: No such file or directory

I hope that this is sufficient information to assist with this error. All dependent software is up to date, to my knowledge. Also, LTR_retriever is configured to use CDHIT.

oushujun commented 6 years ago

Hello,

Sorry for the delay. Unfortunately, I could not determine the cause of this error. One possible cause is that your genome and your LTRharvest/LTR_finder input files do not come from the same version of the assembly (usually because the assembly was updated after the inputs were generated).

To help pinpoint the error, could you provide the program status information printed before and after this error?

Thanks, Shujun

jackson-wells commented 6 years ago

Thank you for the response,

The genome file utilized to generate harvest and finder outputs has not changed since said outputs were created. The LTR_finder version used was 1.0.7. The iteration of LTRharvest used was contained in GenomeTools 1.5.9.

In the previous post I supplied only the error output in its entirety, sourced from the batch scheduler's output error file. The following information is contained in a complementary batch scheduling output file from the same job:

##########################

LTR_retriever v1.8.0

##########################

Contributors: Shujun Ou, Ning Jiang

Please cite: Ou S, Jiang N: LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiology 2018, 176:1410-1422

Parameters: -genome genome.fasta -inharvest harvScreen.out -infinder findScreen.out -threads 16

Fri Aug 10 12:43:59 PDT 2018 Dependency checking: All passed!
Fri Aug 10 12:45:31 PDT 2018 Start to convert inputs...
Total candidates: 135242
Total uniq candidates: 0

Fri Aug 10 12:46:36 PDT 2018 Module 1: Start to clean up candidates...
Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
Sequences containing tandem repeats will be discarded.
printed to screen and
Fri Aug 10 12:46:36 PDT 2018 0 clean candidates remained

Fri Aug 10 12:46:36 PDT 2018 No LTR-RT was found in your data.

Fri Aug 10 12:46:36 PDT 2018 All analyses were finished!

This is the extent of the output data pertaining to the error. Again, I hope this will provide sufficient information to get to the bottom of what is going wrong.

Thank you for any help/insight, Jackson

jackson-wells commented 6 years ago

Accidentally closed the issue, apologies.

oushujun commented 6 years ago

Hello Jackson,

Thank you for the information. Something is wrong with the input files: 0 unique candidates were found from ~135K raw inputs, as shown here:

Total candidates: 135242
Total uniq candidates: 0

Please make sure your LTRharvest input file and the converted LTR_retriever input (it should be "genome.fasta.retriever.scn") look like this:

89510 91513 2004 89510 89667 158 91358 91513 156 90.51 0
211061 215815 4755 211061 211229 169 215642 215815 174 86.21 0
527653 533725 6073 527653 527934 282 533441 533725 285 92.28 0

Thanks, Shujun

jackson-wells commented 6 years ago

Hello Shujun,

You were correct about the input files. It would seem I made a very silly mistake with my LTRharvest output file. I was inputting a file to LTR_retriever that was in FASTA format, instead of the required coordinate format. LTR_retriever is now running happily and is expected to provide an actual output this time around. Thank you for all of your help and speedy replies.

Cheers,

Jackson Wells
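
For anyone who hits the same symptom ("Total uniq candidates: 0" plus a flood of uninitialized-value warnings), a quick pre-flight check on the candidate file can tell FASTA apart from the coordinate rows shown earlier in this thread. The sketch below is not part of LTR_retriever. It assumes each data row has 11 whitespace-separated fields (nine integers, one similarity value, one integer sequence number), matching the three example rows above, and that lines starting with "#" are header comments that can be skipped. The function names are hypothetical.

```python
import sys
from typing import Iterable


def is_coordinate_row(line: str) -> bool:
    """Return True if the line has the 11-field layout of the example rows."""
    fields = line.split()
    if len(fields) != 11:
        return False
    try:
        for value in fields[:9]:   # start/end/length triplets for element and both LTRs
            int(value)
        float(fields[9])           # LTR similarity, e.g. 90.51
        int(fields[10])            # sequence number
    except ValueError:
        return False
    return True


def classify_candidates(lines: Iterable[str]) -> str:
    """Classify a candidate file as 'fasta', 'coordinates', 'unknown', or 'empty'."""
    saw_row = False
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '#' header comments carry no candidates.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            return "fasta"         # FASTA header: the wrong kind of file
        if not is_coordinate_row(line):
            return "unknown"
        saw_row = True
    return "coordinates" if saw_row else "empty"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_candidates.py harvScreen.out
    with open(sys.argv[1]) as handle:
        print(classify_candidates(handle))
```

If this reports "fasta" for the file passed to -inharvest, the file is the sequence output of LTRharvest rather than its coordinate table, which is the mistake described above.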