oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for the identification of LTR retrotransposons. The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

LTR retriever is not compatible with RepeatModeler2 since v2.9.8? #169

Open BitaoQiu opened 7 months ago

BitaoQiu commented 7 months ago

Dear LTR retriever developers,

I was using RepeatModeler and found that LTR_retriever produces no output (v2.9.8 and v2.9.9, installed from either GitHub or Conda). Other users seem to have reported this before. The log file from v2.9.8 reads:


Thu Mar 28 23:13:07 CET 2024 Dependency checking: All passed!
Thu Mar 28 23:13:16 CET 2024 LTR_retriever is starting from the Init step.
Thu Mar 28 23:13:17 CET 2024 Start to convert inputs...
                Total candidates: 35905
                Total uniq candidates: 35905

Thu Mar 28 23:13:22 CET 2024 Module 1: Start to clean up candidates...
                Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
                Sequences containing tandem repeats will be discarded.

    Usage: perl cleanup.pl -f sample.fa [options] > sample.cln.fa
    Options:
            -misschar       n       Define the letter representing unknown sequences; case insensitive; default: n
            -Nscreen        [0|1]   Enable (1) or disable (0) the -nc parameter; default: 1
            -nc             [int]   Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
            -nr             [0-1]   Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
            -minlen         [int]   Minimum sequence length filter after clean up; default: 100 (bp)
            -cleanN         [0|1]   Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
            -trf            [0|1]   Enable (1) or disable (0) tandem repeat finder (trf); default: 1
            -trf_path       path    Path to the trf program

Thu Mar 28 23:13:22 CET 2024 0 clean candidates remained


Out of curiosity, I downgraded LTR retriever to v2.9.5 from conda, and this time it passed Module 1:


Thu Apr 11 21:37:41 CEST 2024 Dependency checking: All passed!
Thu Apr 11 21:37:43 CEST 2024 LTR_retriever is starting from the Init step.
Thu Apr 11 21:37:45 CEST 2024 Start to convert inputs...
                Total candidates: 35905
                Total uniq candidates: 35905

Thu Apr 11 21:37:49 CEST 2024 Module 1: Start to clean up candidates...
                Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
                Sequences containing tandem repeats will be discarded.

Thu Apr 11 21:37:49 CEST 2024 35905 clean candidates remained

Thu Apr 11 21:37:49 CEST 2024 Modules 2-5: Start to analyze the structure of candidates...
                The terminal motif, TSD, boundary, orientation, age, and superfamily will be identified in this step.


It seems something is wrong with get_range.pl as of v2.9.8, which leaves LTR_retriever unable to read the LTR_harvest output. Do you have any suggestions?
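
For anyone comparing their own logs against the two above, here is a minimal Python sketch that flags the failure signature: a helper script's usage text printed during the run, followed by "0 clean candidates remained" despite a non-zero candidate count. The patterns are copied from the log lines in this post; the function name `summarize_log` and its return fields are hypothetical and not part of LTR_retriever.

```python
import re

# Patterns copied from the log excerpts in this thread. Everything else here
# (function name, dictionary keys) is a hypothetical illustration.
USAGE_RE = re.compile(r"Usage: perl (\S+)")
TOTAL_RE = re.compile(r"Total candidates: (\d+)")
CLEAN_RE = re.compile(r"(\d+) clean candidates remained")


def summarize_log(text):
    """Collect candidate counts and any helper scripts that printed usage text."""
    total = None
    clean = None
    usage_scripts = []
    for line in text.splitlines():
        usage = USAGE_RE.search(line)
        if usage:
            usage_scripts.append(usage.group(1))
        total_match = TOTAL_RE.search(line)
        if total_match:
            total = int(total_match.group(1))
        clean_match = CLEAN_RE.search(line)
        if clean_match:
            clean = int(clean_match.group(1))
    # Signature seen above: usage text appeared, candidates existed,
    # and none of them survived the cleanup step.
    suspicious = bool(usage_scripts) and bool(total) and clean == 0
    return {
        "total": total,
        "clean": clean,
        "usage_scripts": usage_scripts,
        "suspicious": suspicious,
    }
```

Passing the text of a log file to `summarize_log` returns the counts and the names of any scripts whose usage message showed up. A `suspicious` value of `True` only means the log matches the pattern in this report; it does not identify the cause.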

oushujun commented 7 months ago

Hello,

I could not reproduce the issue using the test data. Can you provide one example?

Shujun

juanjo255 commented 7 months ago

Hello! @oushujun

I've been struggling with the same problem :( using the test genome.fa available from EDTA:

Tue Apr 23 21:51:11 -05 2024    Dependency checking: All passed!
Tue Apr 23 21:51:18 -05 2024    LTR_retriever is starting from the Init step.
Tue Apr 23 21:51:18 -05 2024    Start to convert inputs...
                Total candidates: 14
                Total uniq candidates: 14

Tue Apr 23 21:51:18 -05 2024    Module 1: Start to clean up candidates...
                Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
                Sequences containing tandem repeats will be discarded.

        Usage: perl cleanup.pl -f sample.fa [options] > sample.cln.fa 
    Options:
        -misschar   n   Define the letter representing unknown sequences; case insensitive; default: n
        -Nscreen    [0|1]   Enable (1) or disable (0) the -nc parameter; default: 1
        -nc     [int]   Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
        -nr     [0-1]   Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
        -minlen     [int]   Minimum sequence length filter after clean up; default: 100 (bp)
        -cleanN     [0|1]   Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
        -trf        [0|1]   Enable (1) or disable (0) tandem repeat finder (trf); default: 1
        -trf_path   path    Path to the trf program

Tue Apr 23 21:51:18 -05 2024    0 clean candidates remained

cp: cannot stat 'seq.fa.retriever.scn.adj': No such file or directory
Tue Apr 23 21:51:18 -05 2024    No LTR-RT was found in your data.

Tue Apr 23 21:51:18 -05 2024    All analyses were finished!

oushujun commented 7 months ago

@juanjo255 can you please provide your commands? Thanks!

Shujun

juanjo255 commented 7 months ago

Hello @oushujun,

thanks for the help.

I am using RepeatModeler. I also had to downgrade LTR_retriever to the 2.5 tag for it to work. After building the database with BuildDatabase, I ran just this one command:

RepeatModeler -LTRStruct -threads 32 -database ~/path/to/database

I hope this helps,

Juan

BitaoQiu commented 7 months ago

@oushujun Sorry for my late reply... Yes, just using RepeatModeler and LTR_retriever 2.9.8 will produce the error (as @juanjo255 wrote)... I am now only using 2.9.5 ...

Best, Bitao
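
To keep track of which versions have been reported so far, the outcomes in this thread can be written down as a small lookup. This is a sketch in Python, not an official compatibility matrix: the table contains only what BitaoQiu reported above, the names `REPORTED`, `normalize`, and `reported_outcome` are hypothetical, and the "2.5 tag" mentioned by juanjo255 is left out because the exact release is not specified.

```python
# Outcomes as reported by users in this thread only. This is not an
# official compatibility list, and all names here are hypothetical.
REPORTED = {
    "2.9.5": "passed Module 1 under RepeatModeler",
    "2.9.8": "no LTR_retriever output under RepeatModeler",
    "2.9.9": "no LTR_retriever output under RepeatModeler",
}


def normalize(version):
    """Strip a leading 'v' or 'v.' so 'v.2.9.9' and '2.9.9' compare equal."""
    return version.strip().lower().lstrip("v").lstrip(".")


def reported_outcome(version):
    """Return what this thread reports for a version, if anything."""
    return REPORTED.get(normalize(version), "not reported in this thread")
```

Calling `reported_outcome` with a version string returns the matching entry from the table, or a "not reported" message for any version the thread does not mention.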