oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Issues related to the interpretation of the results of TEanno.gff3 and TElib.fa #334

Open Song-10-YF opened 1 year ago

Song-10-YF commented 1 year ago

Hello! In TEanno.gff3, TE_00000019 is annotated at multiple loci on each chromosome. Do these TE_00000019 copies need to be merged and spliced again, or does each one already contain the full LTR structural domain and represent an independent LTR element? What is the relationship between these TE_00000019 sequences and TElib.fa? I also found inconsistent annotations in TEanno.gff3 (the same Name is classified as both LTR and LINE). How can I fix this?

My command: --species others --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --curatedlib chr16.TElib.fa --threads 30

Doubtful results are as follows:

31715:Chr01 EDTA LTR_retrotransposon 27330977 27331387 2175 - . ID=TE_homo_30935;Name=TE_00000019;Classification=LTR/unknown;Sequence_ontology=SO:0000186;Identity=0.912;Method=homology
35455:Chr01 EDTA LINE_element 29695752 29695931 513 - . ID=TE_homo_34622;Name=TE_00000019;Classification=LINE/unknown;Sequence_ontology=SO:0000194;Identity=0.751;Method=homology
35666:Chr01 EDTA LTR_retrotransposon 29823217 29823369 335 + . ID=TE_homo_34833;Name=TE_00000019;Classification=LTR/unknown;Sequence_ontology=SO:0000186;Identity=0.796;Method=homology

Best, Song

oushujun commented 1 year ago

The inconsistency needs to be fixed at the library level so that the whole-genome annotations are consistent. It seems the naming of different repeat types is a bit buggy. I will leave this open and check later.
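
To see how many library entries are affected before correcting the library, one option is to scan the annotation for Name values that carry more than one Classification. The following is only a minimal sketch and not part of EDTA: the function names `parse_attributes` and `inconsistent_names` are made up for illustration, and it assumes the file is standard tab-separated GFF3 with `Name=` and `Classification=` attributes in column 9, as in the records quoted above.

```python
import sys
from collections import defaultdict


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=x;Name=y' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def inconsistent_names(gff_lines):
    """Map each Name with more than one Classification to its sorted classifications."""
    seen = defaultdict(set)
    for line in gff_lines:
        # Skip blank lines and GFF3 comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF3 record
        attrs = parse_attributes(cols[8])
        name = attrs.get("Name")
        classification = attrs.get("Classification")
        if name and classification:
            seen[name].add(classification)
    return {n: sorted(c) for n, c in seen.items() if len(c) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_names.py <path to TEanno.gff3>
    with open(sys.argv[1]) as handle:
        for name, classes in sorted(inconsistent_names(handle).items()):
            print(name, ",".join(classes), sep="\t")
```

Each Name it reports is a candidate library entry to inspect in TElib.fa. The script only lists conflicts and does not decide which classification is correct.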