oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

About TIR annotation #186

Closed MrbrilliantLL closed 3 years ago

MrbrilliantLL commented 3 years ago

Hi Shujun,

I have annotated the B73 maize genome with EDTA v1.8.5 and got the following results:

[image: Screenshot 2021-04-19 11:55:15 PM]

However, there is a big gap between this result and the result shown in the TIR-Learner article (TIR-Learner, a New Ensemble Method for TIR Transposable Element Annotation, Provides Evidence for Abundant New Transposable Elements in the Maize Genome). I would like to know what causes these differences.

[image: Screenshot 2021-04-19 11:42:52 PM]

Thank you for your help!

Lei

oushujun commented 3 years ago

Hi Lei,

The original paper describes TIR-Learner beta (v0 or v1), a prototype built to prove the machine-learning approach. The program has evolved to v2.5 during the development of EDTA. In this version we trained another, hopefully better, model for different species. Please use the current version if possible.

Best, Shujun

On Mon, Apr 19, 2021 at 11:50 PM MrbrilliantLL wrote:

Hi Shujun,

I have annotated the B73 maize genome with EDTA v1.8.5 and got the following results:

| Class | Count | bpMasked | %masked |
|---|---|---|---|
| DNA | -- | -- | -- |
| DTA | 63762 | 24946115 | 1.15% |
| DTC | 153297 | 70620354 | 3.24% |
| DTH | 39234 | 10102037 | 0.46% |
| DTM | 181054 | 58121698 | 2.67% |
| DTT | 37761 | 8339279 | 0.38% |
| Helitron | 454248 | 124093424 | 5.70% |
| LTR | -- | -- | -- |
| Copia | 586588 | 410001646 | 18.82% |
| Gypsy | 946215 | 856308364 | 39.31% |
| unknown | 485033 | 246303377 | 11.31% |
| MITE | -- | -- | -- |
| DTA | 26807 | 6888013 | 0.32% |
| DTC | 2090 | 352390 | 0.02% |
| DTH | 51320 | 8825137 | 0.41% |
| DTM | 15644 | 2117127 | 0.10% |
| DTT | 10686 | 1420099 | 0.07% |
| total interspersed | 3053739 | 1828439060 | 83.94% |

However, there is a big gap between this result and the result shown in the TIR-Learner article (TIR-Learner, a New Ensemble Method for TIR Transposable Element Annotation, Provides Evidence for Abundant New Transposable Elements in the Maize Genome). I would like to know what causes these differences. [image: Screenshot 2021-04-19 11:42:52 PM] https://user-images.githubusercontent.com/56568825/115265335-b4aff300-a169-11eb-8d1c-a44dbdb90759.png

Thank you for your help!

Lei


MrbrilliantLL commented 3 years ago

Dear Shujun, Thank you for your quick response! Now, I have two questions about the final output files.

1. I am working on a genome in which some TIR elements are active. I understand that "*.fa.mod.EDTA.TEanno.gff3" contains the annotation of fragments of the intact transposons. However, newly inserted transposons should still be intact in the genome. How can I find the intact TIR elements that recur?

2. In "*.fa.mod.EDTA.intact.gff3", I observed that most DNA/MITE TEs have a 10 bp TIR. Is that the actual TIR length, or is only 10 bp of the TIR shown in the file? If the latter, how can I annotate the complete TIR for these DNA/MITE TEs?

Best wishes, Lei

oushujun commented 3 years ago

Hi Lei,

*.fa.mod.EDTA.TEanno.gff3 contains both intact and fragmented TEs. Look for the tag method=structural|homology in the last column of the file; "structural" means the element is intact. The other tags include TIR, TSD, and identity information. These are the complete features found by TIR-Learner, but they are not necessarily 100% accurate.
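
The filter described above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not part of EDTA: it assumes the last column is a standard `key=value;` GFF3 attribute list, that the tag is spelled `method` or `Method` (keys are compared case-insensitively), and that TIR elements carry a `Classification` value starting with `DNA/` or `MITE/`, with Helitrons excluded. The function names and the sample records are hypothetical.

```python
import io


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict with lower-cased keys."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip().lower()] = value.strip()
    return attrs


def intact_tir_records(handle, class_prefixes=("DNA/", "MITE/")):
    """Yield (seqid, start, end, attrs) for structurally annotated TIR elements.

    Assumption: attribute names and classification prefixes follow the
    description in this thread; adjust them to match your own GFF3 file.
    """
    prefixes = tuple(class_prefixes)
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip headers and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        attrs = parse_attributes(cols[8])
        if attrs.get("method", "").lower() != "structural":
            continue  # keep intact (structural) entries only
        classification = attrs.get("classification", "")
        if not classification.startswith(prefixes):
            continue  # not a DNA/MITE entry
        if "helitron" in classification.lower():
            continue  # Helitrons are DNA elements without TIRs
        yield cols[0], int(cols[3]), int(cols[4]), attrs


# Hypothetical records, for illustration only (not real EDTA output).
SAMPLE = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "EDTA", "TIR_transposon", "100", "900", ".", "+", ".",
               "ID=te1;Classification=DNA/DTC;Method=structural"]),
    "\t".join(["chr1", "EDTA", "TIR_transposon", "1500", "1800", ".", "-", ".",
               "ID=te2;Classification=DNA/DTA;Method=homology"]),
]) + "\n"

if __name__ == "__main__":
    for seqid, start, end, attrs in intact_tir_records(io.StringIO(SAMPLE)):
        print(seqid, start, end, attrs.get("classification"))
```

To run it on a real annotation, pass an open file handle for the TEanno GFF3 instead of the in-memory sample. The TIR and TSD tags, if present, are returned unparsed in `attrs`, since their exact value format is not described here.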

Best, Shujun

MrbrilliantLL commented 3 years ago

Hi Shujun,

I found the intact TIR elements using the function described under 'Divide and conquer'. Thank you again for your help!

Best wishes, Lei