oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identifying LTR retrotransposons. The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

Fatal Error: Failed to open the database file #2

Closed oushujun closed 7 years ago

oushujun commented 7 years ago

@mcsimenc I moved your last bug report to this new thread. The forwarded report follows:

OK, the program runs for a little while and now gives this error:

ERROR: No such file or directory at /home/joshd/software/LTR_retriever/bin/cleanup.pl line 50.
ERROR: No such file or directory at /home/joshd/software/LTR_retriever/bin/cleanup.pl line 50.

Fatal Error: Failed to open the database file
Program halted !!

Can't open Salvinia_cucullata_v1.1.fa.LTRlib.clust: No such file or directory.
ERROR: This script is written to convert fasta files into a prettier format.
Usage: fasta-reformat.pl input-fasta-file number-of-positions-per-line
ERROR: No such file or directory at /home/joshd/software/LTR_retriever/bin/annotate_gff.pl line 12.

Salvinia_cucullata_v1.1.fa.LTRlib.clust doesn't exist, and I ran LTR_retriever with -v

oushujun commented 7 years ago

This seems like a CD-HIT error. Please update your CD-HIT package if possible. Please also check the file "Salvinia_cucullata_v1.1.fa.LTRlib", or attach it to this thread so I can take a closer look.

Shujun

mcsimenc commented 7 years ago

I have the most current version of CD-HIT installed. The file Salvinia_cucullata_v1.1.fa.LTRlib doesn't exist. Here's an ls -lh of the working directory:

-rw-r--r-- 1 derstudent derlab  691 May 30 18:29 call_ltrretriever.qsub
-rw-r--r-- 1 derstudent derlab  915 May 30 18:52 debug
drwxr-xr-x 2 derstudent derlab 4.0K May 30 18:29 input
-rw-r--r-- 1 derstudent derlab    0 May 30 15:51 LTR_retriever.err
-rw-r--r-- 1 derstudent derlab 2.7K May 30 18:52 LTR_retriever.out
drwxr-xr-x 2 derstudent derlab   73 May 30 18:50 RM_31919.TueMay301850262017
drwxr-xr-x 2 derstudent derlab   75 May 30 18:52 RM_3321.TueMay301852152017
drwxr-xr-x 2 derstudent derlab    6 May 30 18:52 RM_3358.TueMay301852192017
-rw------- 1 derstudent derlab  788 May 30 19:08 SacuLTRHarv.LTRretr.e1258
-rw------- 1 derstudent derlab    0 May 30 19:08 SacuLTRHarv.LTRretr.o1258
-rwxr-x--- 1 derstudent derlab 223M May 28 16:23 Salvinia_cucullata_v1.1.fa
-rw-r--r-- 1 derstudent derlab 1.1M May 30 18:50 Salvinia_cucullata_v1.1.fa.defalse
-rw-r--r-- 1 derstudent derlab    0 May 30 18:52 Salvinia_cucullata_v1.1.fa.LTRanno.gff
-rw-r--r-- 1 derstudent derlab    0 May 30 18:52 Salvinia_cucullata_v1.1.fa.LTRlib.fa
-rw-r--r-- 1 derstudent derlab 619K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE
-rw-r--r-- 1 derstudent derlab 619K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.clust
-rw-r--r-- 1 derstudent derlab 9.0K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.clust.clstr
-rw-r--r-- 1 derstudent derlab  36M May 30 18:30 Salvinia_cucullata_v1.1.fa.ltrTE.fa
-rw-r--r-- 1 derstudent derlab 159K May 30 18:35 Salvinia_cucullata_v1.1.fa.ltrTE.fa.cleanup
-rw-r--r-- 1 derstudent derlab 1.9M May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.mask.lib
-rw-r--r-- 1 derstudent derlab  56K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.nmtf
-rw-r--r-- 1 derstudent derlab 1.3M May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.pass
-rw-r--r-- 1 derstudent derlab  39K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.pass.clust.clstr
-rw-r--r-- 1 derstudent derlab  30K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.pass.list
-rw-r--r-- 1 derstudent derlab 4.2K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.pass.nmtf.list
-rw-r--r-- 1 derstudent derlab  25M May 30 18:35 Salvinia_cucullata_v1.1.fa.ltrTE.stg1
-rw-r--r-- 1 derstudent derlab 619K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.stg2
-rw-r--r-- 1 derstudent derlab 619K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.cln
-rw-r--r-- 1 derstudent derlab 619K May 30 18:51 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.cln.clean
-rw-r--r-- 1 derstudent derlab   47 May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.cln.clean.exclude.list
-rw-r--r-- 1 derstudent derlab   47 May 30 18:51 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.cln.exclude.list
-rw-r--r-- 1 derstudent derlab    0 May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.dna.out
-rw-r--r-- 1 derstudent derlab    0 May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.line.out
-rw-r--r-- 1 derstudent derlab    0 May 30 18:51 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.otherTE.out
-rw-r--r-- 1 derstudent derlab  70K May 30 18:52 Salvinia_cucullata_v1.1.fa.ltrTE.stg3.plantP.out
-rw-r--r-- 1 derstudent derlab 3.5M May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.trunc
-rw-r--r-- 1 derstudent derlab    0 May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.trunc.cln
-rw-r--r-- 1 derstudent derlab  37K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.trunc.list
-rw-r--r-- 1 derstudent derlab    0 May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.trunc.masked.cleanup
-rw-r--r-- 1 derstudent derlab 7.7K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.veryfalse
-rw-r--r-- 1 derstudent derlab 1.3M May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.veryfalse.fa
-rw-r--r-- 1 derstudent derlab  17K May 30 18:50 Salvinia_cucullata_v1.1.fa.ltrTE.veryfalse.list
-rw-r--r-- 1 derstudent derlab  609 May 30 18:52 Salvinia_cucullata_v1.1.fa.nmtf.LTRlib.fa
-rw-r--r-- 1 derstudent derlab 4.3K May 30 18:52 Salvinia_cucullata_v1.1.fa.nmtf.pass.list
-rw-r--r-- 1 derstudent derlab  56K May 30 18:52 Salvinia_cucullata_v1.1.fa.nmtf.prelib
-rw-r--r-- 1 derstudent derlab  30K May 30 18:52 Salvinia_cucullata_v1.1.fa.pass.list
-rw-r--r-- 1 derstudent derlab 331K May 30 18:52 Salvinia_cucullata_v1.1.fa.pass.list.gff3
-rw-r--r-- 1 derstudent derlab 620K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib
-rw-r--r-- 1 derstudent derlab 597K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.INT
-rw-r--r-- 1 derstudent derlab    0 May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.INT.cln
-rw-r--r-- 1 derstudent derlab 6.1K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.INT.list
-rw-r--r-- 1 derstudent derlab    0 May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.INT.masked.cleanup
-rw-r--r-- 1 derstudent derlab  24K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.LTR
-rw-r--r-- 1 derstudent derlab  24K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.LTR.clust
-rw-r--r-- 1 derstudent derlab 2.7K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.LTR.clust.clstr
-rw-r--r-- 1 derstudent derlab 2.6K May 30 18:52 Salvinia_cucullata_v1.1.fa.prelib.LTR.list
-rw-r--r-- 1 derstudent derlab 761K May 30 18:52 Salvinia_cucullata_v1.1.fa.retriever.all.scn.adj
-rw-r--r-- 1 derstudent derlab 6.3K May 30 18:52 Salvinia_cucullata_v1.1.fa.retriever.all.scn.adj.list
-rw-r--r-- 1 derstudent derlab 429K May 30 18:29 Salvinia_cucullata_v1.1.fa.retriever.scn
-rw-r--r-- 1 derstudent derlab 761K May 30 18:50 Salvinia_cucullata_v1.1.fa.retriever.scn.adj
-rw-r--r-- 1 derstudent derlab  78K May 30 18:50 Salvinia_cucullata_v1.1.fa.retriever.scn.adj.list
-rw-r--r-- 1 derstudent derlab 234K May 30 18:35 Salvinia_cucullata_v1.1.fa.retriever.scn.extend
-rw-r--r-- 1 derstudent derlab  25M May 30 18:35 Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa
-rw-r--r-- 1 derstudent derlab  51M May 30 18:36 Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa.aa
-rw-r--r-- 1 derstudent derlab 252K May 30 18:37 Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa.aa.anno
-rw-r--r-- 1 derstudent derlab 3.9M May 30 18:37 Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa.aa.scn
-rw-r--r-- 1 derstudent derlab 1.5M May 30 18:37 Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa.aa.tbl
-rw-r--r-- 1 derstudent derlab 357K May 30 18:30 Salvinia_cucullata_v1.1.fa.retriever.scn.full
-rw-r--r-- 1 derstudent derlab 672K May 30 18:30 Salvinia_cucullata_v1.1.fa.retriever.scn.list
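
Several files in the listing above are 0 bytes (for example the .LTRlib.fa, .LTRanno.gff and .stg3.*.out files), which suggests that some step wrote no output. A minimal sketch for picking those files out of a working directory, assuming plain Python 3 and nothing specific to LTR_retriever (the function name `empty_files` is made up for this illustration):

```python
from pathlib import Path


def empty_files(directory):
    """Return the sorted names of regular files in `directory` that are 0 bytes.

    Subdirectories (such as the RM_* folders) are skipped, not descended into.
    """
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and entry.stat().st_size == 0
    )
```

Calling `empty_files(".")` from the run directory returns the names to look at first. An empty file only says that nothing was written; it does not say which tool failed.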

oushujun commented 7 years ago

Hi,

Sorry, our server was down for 2 days and I needed to take care of it first. Thanks for providing the detailed information on the output files. It looks like RepeatMasker is not running correctly. If you have it installed, this could be caused by the "long sequence name" issue: sequence names longer than 15 characters may not be recognized by RepeatMasker and could cause the program to halt. I have developed a new module to deal with long sequence names and pushed it to GitHub. Please download the latest version and see if it works for your genome. I still suggest that you shorten the sequence names yourself rather than letting the program do it for you, as it may not be clever enough to make a decent conversion. Please let me know if this is not your case.

Thank you! Shujun
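
A minimal sketch for checking a genome FASTA against the 15-character limit mentioned above. It is not part of LTR_retriever; the function name is hypothetical, and it assumes the "sequence name" is the text after `>` up to the first whitespace:

```python
def long_sequence_names(fasta_path, max_len=15):
    """Yield (line_number, name) for FASTA headers whose name exceeds max_len.

    Assumption: the name is the first whitespace-delimited token after '>'.
    The default of 15 is the limit quoted in this thread.
    """
    with open(fasta_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                if len(name) > max_len:
                    yield line_number, name
```

If `list(long_sequence_names("genome.fa"))` comes back empty, no header name is over the limit and the names can be left alone.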

mcsimenc commented 7 years ago

Hi Shujun, no worries. I'm glad you're helping work it out; I think LTR_retriever will be very useful for our analyses! We're running RepeatMasker version open-4.0.7.

All of the sequence names in the genome are of this format (it turns out they are exactly 15 characters):

>Sacu_v1.1_s0001
>Sacu_v1.1_s0002
>Sacu_v1.1_s0123

I downloaded the new release and it is running right now. I may not be able to get back to you with the result until Monday.

oushujun commented 7 years ago

Hi,

I pushed some updates to the repository that may fix some of the problems you had previously. Please update the code and try again. Thanks!

Shujun

mcsimenc commented 7 years ago

Hi Shujun,

I just ran the updated program and got an error from hmmpress when it was called by RepeatMasker. The hmmpress log file describes an error with sequence headers:

Error: File format problem in trying to open HMM file /home/joshd/data/salvinia/repeat_lib/LTR_retriever/RM_5827.WedJun72213492017/Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib.
Format tag is '>Sacu_v1.1_s0001:1488510..1488827|LTR_1': unrecognized.
Current H3 format is 'HMMER3/f'. Previous H2/H3 formats 

From LTR_retriever:

ERROR: RepeatMasker is not running properly!
        Please check the file Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib and Salvinia_cucullata_v1.1.fa.mod.ltrTE.trunc and test run:
                RepeatMasker -q -pa 40 -no_is -norna -nolow -div 40 -lib Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib -cutoff 225 Salvinia_cucullata_v1.1.fa.mod.ltrTE.trunc

oushujun commented 7 years ago

Hi,

This is very helpful information! Did you install RepeatMasker with HMMER as the primary search engine? Basically, the first error says the program is expecting an HMM file, but the input file "Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib" is not recognizable as one (because it is a FASTA file!). The second error comes from the new check I implemented, and evidently it found that the expected result is not there. Please do a test run and see what errors RepeatMasker reports:

RepeatMasker -q -pa 40 -no_is -norna -nolow -div 40 -lib Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib -cutoff 225 Salvinia_cucullata_v1.1.fa.mod.ltrTE.trunc

Regards, Shujun
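
The mismatch described above (a FASTA library handed to a tool that expects an HMM file) can be spotted from the first line of the file. A minimal sketch, assuming only what the hmmpress message shows: FASTA records start with `>` and HMMER3 profile files start with a `HMMER3/` format tag. The function name is hypothetical, and this is not how RepeatMasker or hmmpress decide internally:

```python
def sniff_library_format(path):
    """Guess a library file's format from its first non-blank line.

    Returns "fasta", "hmmer3", "unknown", or "empty".
    """
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped:
                continue  # skip leading blank lines
            if stripped.startswith(">"):
                return "fasta"
            if stripped.startswith("HMMER3/"):
                return "hmmer3"
            return "unknown"
    return "empty"
```

For the library in this thread, whose first line is a `>Sacu_v1.1_s0001:...` header, this would return "fasta", which is consistent with hmmpress rejecting the format tag.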

mcsimenc commented 7 years ago

Yes, I had installed RepeatMasker with HMMER as the default search engine. I changed it to NCBI BLAST+ and LTR_retriever ran without errors. However, I'm unsure whether it finished completely. I noticed that the number of elements in defalse plus the number in pass.list.gff3 is only a little more than half the number of elements in the input from LTRHarvest. I'm also surprised that only ~3% of the input elements made it into *pass.list.gff3. How can I make sure everything finished? Thanks! Matt

mcsimenc commented 7 years ago

Also I did try the -no_is flag with RepeatMasker and I saw the same error.

oushujun commented 7 years ago

Dear Matt,

If there is no error or warning message, LTR_retriever has probably run correctly! It's quite normal for the majority of input candidates not to make it all the way down to the pass.list, because a lot of them (I can't name a number, but half is not surprising!) are false positives or truncated LTRs. Before the structural analysis (defalse), the program has already done several filtering steps, such as gap filtering, tandem repeat filtering, and length balance filtering. The structural analysis then tries to work out the structural information of each candidate and decides whether it is a real LTR or not.

Note that the purpose of this program is to confidently and sensitively identify intact LTRs, and then generate a library (exemplars). Intact LTRs represent the most recent LTR amplifications and have clear structural information, hence we can confidently say that each one is an LTR. Using this confident set, we can then confidently identify most, if not all, LTRs in the genome. It may be hard to believe that 3% of the input is all you need, but in our practice this is true. You can also find our benchmark data in the manuscript: http://biorxiv.org/content/early/2017/05/12/137141.article-metrics

LTR_retriever is highly specific and accurate, and its sensitivity is as high as the input allows. So basically you have just removed all false LTRs and retained all true LTRs with the program. You can now check the genome.out and genome.tbl files for the whole-genome LTR annotation. Let me know if the data still do not look correct.

Best, Shujun Ou

oushujun commented 7 years ago

Regarding your last comment, could you describe in more detail what data and command you used?

Thanks, Shujun

mcsimenc commented 7 years ago

What I meant by the comment about the -no_is flag is that I tried the command you suggested:

RepeatMasker -q -pa 40 -no_is -norna -nolow -div 40 -lib Salvinia_cucullata_v1.1.fa.mod.ltrTE.mask.lib -cutoff 225 Salvinia_cucullata_v1.1.fa.mod.ltrTE.trunc

and I saw the hmmer error:

Error: File format problem in trying to open HMM file

but that was with RepeatMasker running HMMER as the default search engine.

The reason I am unsure if it finished is because only 67% of the input elements are mentioned either in defalse or pass.list. Maybe I'm misunderstanding what these files contain.

oushujun commented 7 years ago

Hi Matt,

Yes, you are right. Not all input elements reach the defalse or pass.list steps. Before these steps, there is a prescreening step that screens out candidates with sequencing gaps, tandem repeats, etc. Such candidates are highly unlikely to be true LTRs and thus are not passed to the next step (i.e., defalse). That's why you only see part of them show up in defalse.

Best, Shujun Ou
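
The bookkeeping behind the 67% figure can be sketched as set arithmetic over candidate IDs. This is only an illustration: the function name is made up, and how IDs are extracted from the input, defalse, and pass.list files is deliberately left out because those file formats are not described in this thread.

```python
def screening_summary(input_ids, defalse_ids, pass_ids):
    """Summarize how many input candidates reached defalse or pass.list.

    Each argument is an iterable of candidate identifiers (format assumed,
    not taken from LTR_retriever). A union is used so that an ID present in
    both defalse and pass.list is counted once.
    """
    input_ids = set(input_ids)
    reached = (set(defalse_ids) | set(pass_ids)) & input_ids
    dropped = input_ids - reached
    total = len(input_ids)
    return {
        "input": total,
        "reached_defalse_or_pass": len(reached),
        "dropped_before_defalse": len(dropped),
        "fraction_reached": len(reached) / total if total else 0.0,
    }


# Hypothetical IDs, for illustration only.
summary = screening_summary(
    input_ids=["c1", "c2", "c3", "c4", "c5", "c6"],
    defalse_ids=["c1", "c2", "c3"],
    pass_ids=["c4"],
)
```

Candidates counted under `dropped_before_defalse` would correspond to those removed by the prescreening filters (gaps, tandem repeats, and so on), so a fraction well below 1 does not by itself mean the run was incomplete.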

Suchithra-V commented 6 years ago

Hi. I am working on 16S data and was using the latest cd-hit-otu release, specifically for MiSeq. The qc and otu shell scripts were generated successfully, but after running the otu script I got this error after some time. Please help me resolve this. screenshot from 2018-02-12 13 32 11

These are the result files obtained after running otu script.

otu_results

oushujun commented 6 years ago

Hello, sorry to learn about your issue. It seems that this is a cd-hit related issue; cd-hit is not developed in this repository or by me. This site is for LTR_retriever, which helps identify LTR retrotransposons. Please take this to the right repo for help.

Good luck!

Shujun
