oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons; The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0
176 stars 40 forks source link

an error in Calculate LAI from EDTA #158

Closed chaimol closed 10 months ago

chaimol commented 10 months ago
          I am follow the [https ://github.com/oushujun/EDTA/wiki/Calculate-LAI-from-EDTA-GFF3-files](https://github.com/oushujun/EDTA/wiki/Calculate-LAI-from-EDTA-GFF3-files) to caculate the LAI.

But the LAI value is big more than 100. I run the command like this:

##从EDTA的输出结果计算LAI
genome="genome.fa" #这个是用于EDTA计算的基因组文件的名称
EDTA_path="/share/home/chai/soft/EDTA" #EDTA的安装路径
LAI="/share/home/chai/soft/LTR_retriever/LTR_retriever/LAI" #LAI的绝对路径 
threads=16

for genome in ${genome}; do perl ${EDTA_path}/util/gff2bed.pl $genome.mod.EDTA.TEanno.gff3 structural > $genome.mod.EDTA.TEanno.struc.bed ; done
for genome in ${genome}; do grep LTR $genome.mod.EDTA.TEanno.struc.bed|grep struc|awk '{print $1":"$2".."$3"\t"$7}' > $genome.mod.EDTA.TEanno.LTR.pass.list ; done
for genome in ${genome}; do perl -nle 'my ($chr, $s, $e, $anno, $dir, $supfam)=(split)[0,1,2,3,8,12]; print "10000 0.001 0.001 0.001 $chr $s $e NA $dir $anno $supfam"' $genome.mod.EDTA.TEanno.struc.bed > $genome.out.EDTA.TEanno.out ; done
for genome in ${genome}; do ${LAI} -genome $genome -intact $genome.mod.EDTA.TEanno.LTR.pass.list -all $genome.out.EDTA.TEanno.out -q -t ${threads} ; done

In the end ,The output info as follow:

######################################
### LTR Assembly Index (LAI) beta3.2 ###
######################################

Developer: Shujun Ou

Please cite:

Ou S., Chen J. and Jiang N. (2018). Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. gky730: https://doi.org/10.1093/nar/gky730

Parameters: -genome genome.fa -intact genome.fa.mod.EDTA.TEanno.LTR.pass.list -all genome.fa.out.EDTA.TEanno.out -q -t 16

Tue Sep  5 09:02:15 CST 2023    Dependency checking: Passed!
Tue Sep  5 09:02:15 CST 2023    Calculation of LAI will be based on the whole genome.
                                Please use the -mono parameter if your genome is a recent ployploid, otherwise high identity between LTR homeologues will overcorrect raw LAI scores and result in low LAI.
Tue Sep  5 09:02:15 CST 2023    Estimate the identity of LTR sequences in the genome: quick mode
Warning: LOC list genome.fa.out.EDTA.TEanno.out.LAI.LTRlist is empty.
genome.fa.out.EDTA.TEanno.out.LAI.LTR.fa is empty, please check the genome.fa file and the genome.fa.out.EDTA.TEanno.out.LAI.LTRlist file
Tue Sep  5 09:02:16 CST 2023    The identity of LTR sequences: %
Argument "" isn't numeric in numeric lt (<) at /share/home/chai/soft/LTR_retriever/LAI line 143.

                                【Warning】 The identity drops below the safe limit. Instead, identity of 92% will be used for LAI adjustment.

Tue Sep  5 09:02:16 CST 2023    Calculate LAI:

                                                Done!

Tue Sep  5 09:02:20 CST 2023    Result file: genome.fa.out.EDTA.TEanno.out.LAI

                                You may use either raw_LAI or LAI for intraspecific comparison
                                but please use ONLY LAI for interspecific comparison

The output file genome.fa.out.EDTA.TEanno.out.LAI.LTR.fa and genome.fa.out.EDTA.TEanno.out.LAI.LTRlist were empty. and the LAI file LAI value is more than 100. head genome.fa.out.EDTA.TEanno.out.LAI

whole_genome    1       118716139       0.0740  0.0740  100.00  105.63
A01     1       3000000 0.0350  0.0350  100.00  105.63
A01     300001  3300000 0.0368  0.0368  100.00  105.63
A01     600001  3600000 0.0452  0.0452  100.00  105.63
A01     900001  3900000 0.0454  0.0454  100.00  105.63
chaimol commented 10 months ago

I have check the code find the two file were empty , the reason is the EDTA results file "$genome.out.EDTA.TEanno.out" lack of "_LTR", just have "LTR".

So I modify the https://github.com/oushujun/LTR_retriever/blob/master/bin/Age_est.pl file the line48 from next unless /_LTR/; to next unless /LTR/;, and then I rerun the command

##从EDTA的输出结果计算LAI
genome="genome.fa" #这个是用于EDTA计算的基因组文件的名称
EDTA_path="/share/home/chai/soft/EDTA" #EDTA的安装路径
LAI="/share/home/chai/soft/LTR_retriever/LTR_retriever/LAI" #LAI的绝对路径 
threads=16

for genome in ${genome}; do perl ${EDTA_path}/util/gff2bed.pl $genome.mod.EDTA.TEanno.gff3 structural > $genome.mod.EDTA.TEanno.struc.bed ; done
for genome in ${genome}; do grep LTR $genome.mod.EDTA.TEanno.struc.bed|grep struc|awk '{print $1":"$2".."$3"\t"$7}' > $genome.mod.EDTA.TEanno.LTR.pass.list ; done
for genome in ${genome}; do perl -nle 'my ($chr, $s, $e, $anno, $dir, $supfam)=(split)[0,1,2,3,8,12]; print "10000 0.001 0.001 0.001 $chr $s $e NA $dir $anno $supfam"' $genome.mod.EDTA.TEanno.struc.bed > $genome.out.EDTA.TEanno.out ; done
for genome in ${genome}; do ${LAI} -genome $genome -intact $genome.mod.EDTA.TEanno.LTR.pass.list -all $genome.out.EDTA.TEanno.out -q -t ${threads} ; done

The output info as follow:

######################################
### LTR Assembly Index (LAI) beta3.2 ###
######################################

Developer: Shujun Ou

Please cite:

Ou S., Chen J. and Jiang N. (2018). Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. gky730: https://doi.org/10.1093/nar/gky730

Parameters: -genome genome.fa -intact genome.fa.mod.EDTA.TEanno.LTR.pass.list -all genome.fa.out.EDTA.TEanno.out -q -t 16

Tue Sep  5 10:40:41 CST 2023    Dependency checking: Passed!
Tue Sep  5 10:40:41 CST 2023    Calculation of LAI will be based on the whole genome.
                                Please use the -mono parameter if your genome is a recent ployploid, otherwise high identity between LTR homeologues will overcorrect raw LAI scores and result in low LAI.
Tue Sep  5 10:40:41 CST 2023    Estimate the identity of LTR sequences in the genome: quick mode
Tue Sep  5 10:42:21 CST 2023    The identity of LTR sequences: 95.0119673202615%
Tue Sep  5 10:42:21 CST 2023    Calculate LAI:

                                                Done!

Tue Sep  5 10:42:24 CST 2023    Result file: genome.fa.out.EDTA.TEanno.out.LAI

                                You may use either raw_LAI or LAI for intraspecific comparison
                                but please use ONLY LAI for interspecific comparison

but I check the LAI file 'head genome.fa.out.EDTA.TEanno.out.LAI'

whole_genome    1       118716139       0.0740  0.0740  100.00  97.15
A01     1       3000000 0.0350  0.0350  100.00  97.15
A01     300001  3300000 0.0368  0.0368  100.00  97.15
A01     600001  3600000 0.0452  0.0452  100.00  97.15
A01     900001  3900000 0.0454  0.0454  100.00  97.15

the LAI value still were too large.

Can anyone explain what could be causing the LAI value to be too large?

oushujun commented 10 months ago

Please don't submit duplicated questions. Please check out #138.