oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons; The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

cleanup.pl Bug? #162

Open asgray opened 4 months ago

asgray commented 4 months ago

Hi, I'm in the process of updating from 2.9.0 to 2.9.9 and I'm seeing some odd output:

~/projects/LTR_retriever$ ./LTR_retriever -genome dmel-smaller.fa -inharvest raw-struct-results.txt

############################
### LTR_retriever v2.9.9 ###
############################

Contributors: Shujun Ou, Ning Jiang

For LTR_retriever, please cite:

        Ou S and Jiang N (2018). LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 176(2): 1410-1422.

For LAI, please cite:

        Ou S, Chen J, Jiang N (2018). Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46(21):e126.

Parameters: -genome dmel-smaller.fa -inharvest raw-struct-results.txt

Mon Feb 12 03:56:29 PM PST 2024 Dependency checking: All passed!
Mon Feb 12 03:56:34 PM PST 2024 LTR_retriever is starting from the Init step.
Mon Feb 12 03:56:34 PM PST 2024 The longest sequence ID in the genome contains 68 characters, which is longer than the limit (13)
                                Trying to reformat seq IDs...
                                Attempt 1...
Mon Feb 12 03:56:34 PM PST 2024 Seq ID conversion successful!

Mon Feb 12 03:56:34 PM PST 2024 Start to convert inputs...
                                Total candidates: 42
                                Total uniq candidates: 42

Mon Feb 12 03:56:34 PM PST 2024 Module 1: Start to clean up candidates...
                                Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
                                Sequences containing tandem repeats will be discarded.

        Usage: perl cleanup.pl -f sample.fa [options] > sample.cln.fa 
        Options:
                -misschar       n       Define the letter representing unknown sequences; case insensitive; default: n
                -Nscreen        [0|1]   Enable (1) or disable (0) the -nc parameter; default: 1
                -nc             [int]   Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
                -nr             [0-1]   Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
                -minlen         [int]   Minimum sequence length filter after clean up; default: 100 (bp)
                -cleanN         [0|1]   Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
                -trf            [0|1]   Enable (1) or disable (0) tandem repeat finder (trf); default: 1
                -trf_path       path    Path to the trf program

Mon Feb 12 03:56:34 PM PST 2024 0 clean candidates remained

cp: cannot stat 'dmel-smaller.fa.mod.retriever.scn.adj': No such file or directory
Mon Feb 12 03:56:34 PM PST 2024 No LTR-RT was found in your data.

Mon Feb 12 03:56:34 PM PST 2024 All analyses were finished!

I believe the command that calls cleanup.pl is:

perl ./bin/cleanup.pl -trf 1 -trf_path /usr/local/bin/trf -misschar N -nc 10 -nr 0.8 -minlen 100 -minscore 1000 -f dmel-smaller.fa.mod.ltrTE.fa > dmel-smaller.fa.mod.ltrTE.stg1

What is the expected behavior here?

oushujun commented 3 months ago

You have very few candidates to begin with, and the cleanup step may determine that none of them are valid.

Shujun

CSU-KangHu commented 3 months ago

Hi @oushujun,

Should the line $trf=0 if /^-trf$/i and $ARGV[$k+1]!~/^-/; be changed to $trf=$ARGV[$k+1] if /^-trf$/i and $ARGV[$k+1]!~/^-/; in cleanup.pl?

I noticed that when cleanup.pl is run with -trf 1 (perl ./bin/cleanup.pl -trf 1), the trf program is not executed.
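
To make the difference between the two lines concrete, here is a minimal sketch in Python. It is a port for illustration only, not code from cleanup.pl: the function name parse_trf and the fixed switch are hypothetical, the default of 1 is taken from the usage text above, and the argument list is the command quoted in the first post.

```python
import re


def parse_trf(argv, fixed):
    """Scan argv the way the quoted Perl line does and return the trf setting.

    fixed=False mimics: $trf=0 if /^-trf$/i and $ARGV[$k+1]!~/^-/;
    fixed=True  mimics: $trf=$ARGV[$k+1] if /^-trf$/i and $ARGV[$k+1]!~/^-/;
    Values are kept as strings, as they would arrive on the command line.
    """
    trf = "1"  # default per the usage text: trf enabled
    for k, arg in enumerate(argv):
        # Assumption: a missing next argument is treated as an empty string.
        nxt = argv[k + 1] if k + 1 < len(argv) else ""
        is_trf_flag = re.fullmatch(r"-trf", arg, re.IGNORECASE) is not None
        if is_trf_flag and not nxt.startswith("-"):
            # Original line ignores the supplied value and always disables trf.
            trf = nxt if fixed else "0"
    return trf


# Arguments from the cleanup.pl command quoted in the first post.
ARGS = [
    "-trf", "1", "-trf_path", "/usr/local/bin/trf", "-misschar", "N",
    "-nc", "10", "-nr", "0.8", "-minlen", "100", "-minscore", "1000",
    "-f", "dmel-smaller.fa.mod.ltrTE.fa",
]

if __name__ == "__main__":
    print("original line:", parse_trf(ARGS, fixed=False))
    print("proposed line:", parse_trf(ARGS, fixed=True))
```

In this sketch the original line yields "0" whatever value follows -trf, so trf is switched off even when -trf 1 is requested, while the proposed line yields the supplied value "1". Note that -trf_path does not trigger the branch, because the pattern is anchored at both ends.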

oushujun commented 3 months ago

Good catch! You are correct. I have updated the code, and the fix will be pushed to GitHub in the next update. I don't think it resolves the problem in the initial post, though.

Shujun