oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

duplicate sequences with the same sequence but different id #187

Closed: khjia closed this issue 3 years ago

khjia commented 3 years ago

In the final annotation file *EDTA.intact.gff3, I found a few elements with the same sequence but different IDs, for example:

[image: example of the duplicated entries]

oushujun commented 3 years ago

These are most likely intact LTR retrotransposons that carry alternative structural features, i.e., LTR regions, TSDs, motifs, etc. This happens because multiple search tools are used (LTRharvest and LTR_FINDER), so LTR candidates can overlap or be duplicated. I chose to retain all qualified candidates even when they differ only slightly. In your case, the two elements have the same coordinates for the LTR region but different TSDs. You may remove repeat 1862 because the other one has a perfect TSD.
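If you want to do this cleanup in bulk, here is a minimal sketch, assuming the intact gff3 is standard nine-column GFF3 in which an element's sub-features are linked to it through ID/Parent attributes. The feature type names, the function names (find_duplicate_elements, drop_features) and the sample records are hypothetical illustrations, not EDTA's own output format or tooling. The sketch only reports elements that share identical coordinates and removes the ones you pick; deciding which copy has the better TSD is still up to you.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def find_duplicate_elements(gff_lines, feature_type="LTR_retrotransposon"):
    """Group IDs of `feature_type` features sharing seqid, start, end and strand."""
    groups = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6])
        groups[key].append(parse_attributes(cols[8]).get("ID"))
    # keep only coordinate sets reported more than once
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def drop_features(gff_lines, ids_to_drop):
    """Remove features with the given IDs plus all descendants reached via Parent."""
    records = []
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        attrs = parse_attributes(cols[8]) if len(cols) == 9 else {}
        records.append((line, attrs))

    def parents(attrs):
        return set(attrs.get("Parent", "").split(",")) - {""}

    dropped = set(ids_to_drop)
    changed = True
    while changed:  # propagate removal down the ID/Parent hierarchy
        changed = False
        for _, attrs in records:
            fid = attrs.get("ID")
            if fid and fid not in dropped and parents(attrs) & dropped:
                dropped.add(fid)
                changed = True
    return [line for line, attrs in records
            if attrs.get("ID") not in dropped and not parents(attrs) & dropped]


# Hypothetical records: two copies sharing LTR coordinates but with different TSDs.
SAMPLE = [
    "##gff-version 3",
    "chr1\tEDTA\trepeat_region\t95\t2015\t.\t+\t.\tID=repeat_region_1861",
    "chr1\tEDTA\tLTR_retrotransposon\t100\t2010\t.\t+\t.\tID=LTRRT_1861;Parent=repeat_region_1861",
    "chr1\tEDTA\trepeat_region\t97\t2013\t.\t+\t.\tID=repeat_region_1862",
    "chr1\tEDTA\tLTR_retrotransposon\t100\t2010\t.\t+\t.\tID=LTRRT_1862;Parent=repeat_region_1862",
]

if __name__ == "__main__":
    print(find_duplicate_elements(SAMPLE))
    for kept_line in drop_features(SAMPLE, {"repeat_region_1862"}):
        print(kept_line)
```

Dropping the top-level feature and following Parent links removes the whole element (TSDs, LTRs and so on) in one step, so no orphaned sub-features are left behind in the file.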

The sum file won't overcount these features; the gff3 file is intended for visualization purposes.