oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

questions about the outputs #286

Closed: XinyiLiuLMU closed this issue 2 years ago

XinyiLiuLMU commented 2 years ago

Hello Dr. Ou, Thank you for the tool and tutorials! I have some questions about the outputs.

The colcen.fna.mod.LTR.intact.gff3 file shows that 1662 intact LTRs were detected (the same number as in colcen.fna.mod.pass.list.gff3). However, when I checked colcen.fna.mod.pass.list for the LTR insertion times, it has only 237 lines. I tried to link the LTRs in colcen.fna.mod.pass.list to those in colcen.fna.mod.LTR.intact.gff3 by transposon name, and found that one name corresponds to multiple lines in colcen.fna.mod.LTR.intact.gff3, like this:

CP096024.1      EDTA    repeat_region        4410313 4415437 .       ?       .       ID=repeat_region_2;Name=CP096024.1:4410318..4415432;Classification=LTR/unknown;Sequence_ontology=SO:0000657;ltr_identity=0.9691;Method=structural;motif=TGCA;tsd=ATTAA
CP096024.1      EDTA    target_site_duplication 4410313 4410317 .       ?       .       ID=lTSD_2;Parent=repeat_region_2;Name=CP096024.1:4410318..4415432;Classification=LTR/unknown;Sequence_ontology=SO:0000434;ltr_identity=0.9691;Method=structural;motif=TGCA;tsd=ATTAA
CP096024.1      EDTA    long_terminal_repeat    4410318 4410479 .       ?       .       ID=lLTR_2;Parent=repeat_region_2;Name=CP096024.1:4410318..4415432;Classification=LTR/unknown;Sequence_ontology=SO:0000286;ltr_identity=0.9691;Method=structural;motif=TGCA;tsd=ATTAA
CP096024.1      EDTA    LTR_retrotransposon     4410318 4415432 .       ?       .       ID=LTRRT_2;Parent=repeat_region_2;Name=CP096024.1:4410318..4415432;Classification=LTR/unknown;Sequence_ontology=SO:0000186;ltr_identity=0.9691;Method=structural;motif=TGCA;tsd=ATTAA
CP096024.1      EDTA    long_terminal_repeat    4415271 4415432 .       ?       .       ID=rLTR_2;Parent=repeat_region_2;Name=CP096024.1:4410318..4415432;Classification=LTR/unknown;Sequence_ontology=SO:0000286;ltr_identity=0.9691;Method=structural;motif=TGCA;tsd=ATTAA
CP096024.1      EDTA    target_site_duplication 4415433 4415437 .       ?       .       ID=rTSD_2;Parent=repeat_region_2;Name=CP096024.1:4410318..4415432;Classification=LTR/unknown;Sequence_ontology=SO:0000434;ltr_identity=0.9691;Method=structural;motif=TGCA;tsd=ATTAA

Are they nested transposons? What is the reason for that?
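One way to check whether these lines describe one element or several is to group the GFF3 records by their `Parent` attribute. In the excerpt above, every sub-feature carries `Parent=repeat_region_2`, so all six lines would fall into one group. The following is a minimal sketch, assuming a tab-separated GFF3 file with the attributes in column 9; the function names are hypothetical and not part of EDTA.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 column-9 string (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def features_per_element(gff_lines):
    """Group feature types by the element they belong to.

    Sub-features are assigned to their Parent; a top-level record
    (no Parent attribute) is assigned to its own ID.
    """
    groups = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF3 record
        attrs = parse_attributes(cols[8])
        element = attrs.get("Parent", attrs.get("ID"))
        groups[element].append(cols[2])
    return dict(groups)
```

Passing an open file handle for colcen.fna.mod.LTR.intact.gff3 to `features_per_element` and taking `len()` of the result gives the number of distinct parent elements, which can then be compared with the number of entries in colcen.fna.mod.pass.list.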

Also, I cannot link the LTRs in colcen.fna.mod.EDTA.TEanno.gff3 to those in colcen.fna.mod.pass.list. The final annotation file contains more annotated LTRs (is this because RepeatModeler identified some LTRs missed by the other programs?), and the names have been replaced by sequence IDs such as TE_00000158_INT, so I cannot find the LTR insertion time using the sequence IDs or positions alone. Do you know of a method to solve this?
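For the intact-LTR GFF3, the `Name` attribute (for example `CP096024.1:4410318..4415432`) can serve as a join key against the pass list. The sketch below rests on two assumptions that should be checked against the header line of the actual file: that the first column of colcen.fna.mod.pass.list holds the same `seqid:start..end` string, and that the insertion time sits in the 12th column. The function names and the `time_col` parameter are hypothetical. Whether the same join works for TEanno.gff3 depends on that file's attributes and is not assumed here.

```python
def read_insertion_times(pass_list_lines, time_col=11):
    """Map each element location (column 1) to its insertion time.

    time_col is a zero-based index; 11 (the 12th column) is an
    assumption about the pass.list layout, not a confirmed fact.
    """
    times = {}
    for line in pass_list_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header line
        cols = line.split()
        if len(cols) <= time_col:
            continue  # row too short to contain the assumed column
        times[cols[0]] = cols[time_col]  # kept as text, not parsed
    return times


def annotate_gff(gff_lines, times, feature="LTR_retrotransposon"):
    """Return (Name, insertion_time) for each matching GFF3 record."""
    matched = []
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue  # keep one record per element, skip sub-features
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        name = attrs.get("Name")
        if name in times:
            matched.append((name, times[name]))
    return matched
```

Filtering on the `LTR_retrotransposon` feature keeps one record per element, so the TSD and terminal-repeat lines that share the same `Name` do not produce duplicate matches.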

Thank you again for your time.

Best regards, Xinyi

oushujun commented 2 years ago

Hi Xinyi,

Please read the wiki for more details.

Shujun