oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for the identification of LTR retrotransposons; the LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

Error messages when running on LTR_finder and MGEScan data #8

TWV-GIT closed this issue 6 years ago.

TWV-GIT commented 6 years ago

I ran the script on data from MGEScan (file name LTR_out), which contains a large number of sequences, all with the ID ">mobile_genetic_element". It gave the following output:

`##########################

LTR_retriever v1.5

##########################

Contributors: Shujun Ou, Ning Jiang

Parameters: -genome Harm.fasta -inmgescan LTR_out.txt

Thu Nov 23 16:15:53 CET 2017 The longest sequence ID in the genome contains 141 characters, which is longer than the limit (15)
Trying to reformat seq IDs...
Attempt 1...
Thu Nov 23 16:15:55 CET 2017 Seq ID conversion successful!

Thu Nov 23 16:15:55 CET 2017 Start to convert inputs...
Use of uninitialized value $seq_ID in exists at /home/wolf/Desktop/Programs/LTR_retriever-master/bin/get_range.pl line 118, line 3.
Argument "NA" isn't numeric in numeric ge (>=) at /home/wolf/Desktop/Programs/LTR_retriever-master/bin/get_range.pl line 127, line 3.
Argument "NA" isn't numeric in numeric ge (>=) at /home/wolf/Desktop/Programs/LTR_retriever-master/bin/get_range.pl line 127, line 3.
Argument ">mobile_genetic_element1" isn't numeric in subtraction (-) at /home/wolf/Desktop/Programs/LTR_retriever-master/bin/get_range.pl line 134, line 3.
Illegal division by zero at /home/wolf/Desktop/Programs/LTR_retriever-master/bin/get_range.pl line 146, line 3.
ERROR: LOC list is empty.
Total candidates: 214
Total uniq candidates: 0

Thu Nov 23 16:15:58 CET 2017 Start to clean up candidates...
Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
Sequences containing tandem repeats will be discarded.

Error: Error while loading sequence
Thu Nov 23 16:15:58 CET 2017 0 clean candidates remained

cp: cannot stat 'Harm.fasta.mod.retriever.scn.adj': No such file or directory
Thu Nov 23 16:15:58 CET 2017 No LTR was found in your data.

Thu Nov 23 16:15:58 CET 2017 All analyses were finished! `

Any solutions? Much appreciated.
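The log hints at what went wrong: get_range.pl complains that ">mobile_genetic_element1" isn't numeric, i.e. a sequence header was read where a coordinate was expected, so the MGEScan file appears to hold sequences rather than a candidate list. A quick pre-flight check for that situation could look like the minimal Python sketch below. It is not part of LTR_retriever; the function names are hypothetical, and the heuristic (a '>' line directly followed by a line of nucleotide letters counts as a FASTA record) is an assumption, chosen so that a '>' line followed by a table header or coordinates is not flagged.

```python
import re
from typing import Iterable

# A line made only of nucleotide codes (IUPAC ambiguity letters included).
SEQ_LINE = re.compile(r"^[ACGTURYSWKMBDHVN]+$", re.IGNORECASE)


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count '>' header lines that are directly followed by a sequence line."""
    count = 0
    pending_header = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines do not reset the header state
        if pending_header and SEQ_LINE.match(line):
            count += 1
        pending_header = line.startswith(">")
    return count


def check_candidate_file(path: str) -> bool:
    """Return False (and print a warning) if `path` looks like FASTA sequence data.

    Hypothetical pre-flight check, not part of LTR_retriever.
    """
    with open(path) as handle:
        n_records = count_fasta_records(handle)
    if n_records:
        print(f"{path}: {n_records} FASTA record(s) found; "
              "this looks like sequence output, not a candidate list")
        return False
    return True
```

For example, `check_candidate_file("LTR_out.txt")` would be expected to warn for a file made of ">mobile_genetic_element" sequences.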

oushujun commented 6 years ago

Hi,

Sorry for the delay. Using MGEScan-LTR with LTR_retriever is a little tricky. I used a modified version of MGEScan-LTR by Dr. James Estill, provided in the DAWGPAWS package. You can download it from http://darwin.informatics.indiana.edu/evolution/data/sw/find_ltr_DAWGPAWS.pl or find it by searching for "find_ltr_DAWGPAWS". Once you have "find_ltr_DAWGPAWS.pl", use it to replace the "find_ltr.pl" script in the MGEScan-LTR package. You may need to edit "find_ltr_DAWGPAWS.pl" to make it runnable, for example by providing the paths to its dependencies.

Another challenge is that MGEScan-LTR only accepts monosequence files (files containing a single sequence), so you would have to split your multisequence file into many files and run the program once per file. This is tedious, but I have a solution: use the script "run_MGEScan.pl" in LTR_retriever/bin/ to run the modified MGEScan-LTR on your multisequence genome file. You may need to edit this script to set the path to "find_ltr_DAWGPAWS.pl".

If you get through all these hassles (and I am sorry about them), the *.ltrpos file is the one that LTR_retriever recognizes.
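To make the monosequence requirement concrete, here is a minimal Python sketch of the manual splitting step that run_MGEScan.pl is meant to spare you. It is not taken from LTR_retriever or MGEScan-LTR: the function names and the seq_0001.fa naming scheme are hypothetical, and running MGEScan-LTR on each resulting file is left out.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None and line:
            chunks.append(line)  # sequence lines before any header are ignored
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path: str, out_dir: str, width: int = 60) -> List[Path]:
    """Write every record of a multisequence FASTA to its own file.

    Files are named by record number (seq_0001.fa, seq_0002.fa, ...), a
    hypothetical convention that sidesteps header characters that are unsafe
    in file names. Sequences are wrapped at `width` characters per line.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with open(path) as handle:
        for i, (header, seq) in enumerate(iter_fasta(handle), start=1):
            out_path = target / f"seq_{i:04d}.fa"
            with open(out_path, "w") as out:
                out.write(f">{header}\n")
                for start in range(0, len(seq), width):
                    out.write(seq[start:start + width] + "\n")
            written.append(out_path)
    return written
```

Calling `split_fasta("Harm.fasta", "Harm_split")` would leave one file per sequence in Harm_split/, each of which would then need its own MGEScan-LTR run, which is exactly the tedium described above.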

Instead, you can generate candidates much more easily with LTRharvest and/or LTR_finder, with similar or even better quality. I have provided some practical command lines for running these programs in the Manual.

Let me know if you have more issues.

Thanks, Shujun

TWV-GIT commented 6 years ago

Thanks for the advice and explanation. I already got good outputs from my LTR_finder data and will now have a look at LTRharvest.

Greetings, Thomas.

oushujun commented 6 years ago

Hi Thomas,

Good to know LTR_retriever is working! You can use two candidate files together by supplying both -inharvest and -infinder. Accuracy is similar, but you can gain a couple extra points of sensitivity this way.
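For reference, a combined run could be assembled as in the minimal Python sketch below. It assumes the LTR_retriever executable is on PATH, uses only the -genome, -inharvest and -infinder options mentioned in this thread, and the candidate file names are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def build_retriever_cmd(genome: str,
                        harvest: Optional[str] = None,
                        finder: Optional[str] = None) -> List[str]:
    """Build an LTR_retriever argument list from one or both candidate files."""
    if harvest is None and finder is None:
        raise ValueError("supply at least one candidate file")
    cmd = ["LTR_retriever", "-genome", genome]
    if harvest is not None:
        cmd += ["-inharvest", harvest]  # LTRharvest candidates
    if finder is not None:
        cmd += ["-infinder", finder]    # LTR_finder candidates
    return cmd


if __name__ == "__main__":
    cmd = build_retriever_cmd("genome.fa",
                              harvest="genome.fa.harvest.scn",
                              finder="genome.fa.finder.scn")
    # Print a shell-ready version; passing `cmd` to subprocess.run(cmd, check=True)
    # would actually launch the run.
    print(shlex.join(cmd))
```

Run as a script, this prints `LTR_retriever -genome genome.fa -inharvest genome.fa.harvest.scn -infinder genome.fa.finder.scn`. Building an argument list rather than a single string avoids shell-quoting problems with unusual file names.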

Good luck! Shujun