oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

TE sequence derived from mod.EDTA.intact.gff3 is different from the corresponding TE in mod.EDTA.TElib.fa #342

qjiangzhao closed 1 year ago

qjiangzhao commented 1 year ago

Hi Shujun,

Here is the command I used to run EDTA:

perl ~/EDTA.pl --genome $My_Genome --cds $My_cds --exclude $My_bed --overwrite 0 --sensitive 1 --anno 1 --evaluate 1 --threads 40

My problem is that the sequences in mod.EDTA.TElib.fa do not seem to represent the full length of the intact retrotransposons. For example, I located an intact retrotransposon named TE_00000681 in mod.EDTA.intact.gff3 through IGV. I extracted the TE_00000681 sequence from IGV, compared it with the sequence under the same ID in mod.EDTA.TElib.fa, and found that they were different.

Could I ask how the intact retrotransposon sequences in mod.EDTA.TElib.fa are obtained?

Yours sincerely, Jiangzhao Qian

oushujun commented 1 year ago

Hello Jiangzhao,

You may extract coordinates of intact TEs from mod.EDTA.intact.gff3 and extract intact TE sequences based on these coordinates.

Best, Shujun
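
The coordinate-based extraction suggested above can be sketched in Python as follows. This is a minimal illustration, not EDTA code: the helper names `read_fasta`, `revcomp`, and `extract_features` are hypothetical, and the sketch only assumes standard GFF3 conventions (tab-separated columns, 1-based inclusive coordinates, strand in column 7). A GFF3 file may also contain sub-feature rows, so in practice you would likely filter on the type column or the attributes before using the output.

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence ID.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_features(gff_lines, genome):
    """Yield (attributes, sequence) for each feature row of a GFF3 file."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature row
        chrom, start, end = cols[0], int(cols[3]), int(cols[4])
        strand, attrs = cols[6], cols[8]
        # GFF3 is 1-based and inclusive; Python slices are 0-based, end-exclusive.
        seq = genome[chrom][start - 1:end]
        if strand == "-":
            seq = revcomp(seq)
        yield attrs, seq
```

To use it on real files, pass open file handles, for example `read_fasta(open(genome_path))` and `extract_features(open(gff_path), genome)`, where `genome_path` and `gff_path` are placeholders for your own genome FASTA and mod.EDTA.intact.gff3. The sequence IDs in the FASTA headers must match column 1 of the GFF3.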

qjiangzhao commented 1 year ago

Many thanks ;)

oushujun commented 1 year ago

You can see more discussion in #352. The library file attempts to remove nested TEs, so if the intact LTR is nested, the library sequence may not entirely represent the nested element.

Shujun
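
One way to check whether nesting could explain a mismatch is to look for annotated elements that lie entirely inside another element's span. The sketch below is only an illustration of that interval check and is not the procedure EDTA uses to build its library. The function name `find_nested` and the input tuples `(seqid, start, end, feature_id)` are hypothetical, and the input is assumed to hold one row per top-level element (an element's own sub-features, such as its terminal repeats, would otherwise be reported as nested).

```python
def find_nested(features):
    """Return (outer_id, inner_id) pairs where inner lies entirely within outer.

    features: iterable of (seqid, start, end, feature_id) with inclusive
    coordinates. Elements with identical coordinates are also reported.
    """
    by_chrom = {}
    for chrom, start, end, fid in features:
        by_chrom.setdefault(chrom, []).append((start, end, fid))

    pairs = []
    for items in by_chrom.values():
        # Sort by start ascending, then end descending, so a containing
        # element always comes before the elements it contains.
        items.sort(key=lambda x: (x[0], -x[1]))
        for i, (s1, e1, id1) in enumerate(items):
            for s2, e2, id2 in items[i + 1:]:
                if s2 > e1:
                    break  # later elements start beyond the outer element
                if e2 <= e1:
                    pairs.append((id1, id2))
    return pairs
```

If an intact element of interest appears as the outer or inner member of a reported pair, comparing its genomic sequence with the library entry is expected to show differences around the nested region.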