Question about the total interspersed between RepeatMasker and EDTA.

oushujun / EDTA

Extensive de-novo TE Annotator

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y

GNU General Public License v3.0


Question about the total interspersed between RepeatMasker and EDTA. #293

Closed Morriyaty closed 1 year ago

Morriyaty commented 2 years ago

I want to annotate an animal genome. First, I manually created a pan-TE library following this article (https://doi.org/10.1093/molbev/msac080). Then I ran RepeatMasker (sensitive mode), and the result is this [pic1]. EDTA, however, gives me this result [pic2]: EDTA annotates 37.06% of the sequence as repeats, whereas RepeatMasker annotates ~43%. I then looked at the RepeatMasker result contained in the EDTA output [AL-1.chr.final.fasta.mod.tbl, quick mode]; it annotates 41% [pic3]. The two RepeatMasker runs give similar results, but EDTA reports a lower percentage. What is the reason for this difference? By the way, my command is EDTA.pl --genome /data/01/user186/666..genome/AL-1.chr.final.fasta --curatedlib /data/01/user156/wyj/02.genome.TE/new_id.fa --anno 1 -t 40 -u 3.03e-9

Bests, Yinjia

oushujun commented 2 years ago

Hi Yinjia,

If I understand correctly, the third annotation was generated by RepeatMasker (with -q) using EDTA's library. Basically, the final annotation merges the homology annotation (RepeatMasker) with the structural annotation (EDTA), and homology-annotated entries that overlap structurally annotated entries are removed. So there should be similar levels of interspersed repeats after the merge. However, the summary files were probably generated by different scripts. EDTA uses a modified version of RepeatMasker's summary script, located at EDTA/util/buildSummary.pl. You may use this script to summarize the other two RepeatMasker results and see if they change.
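As a rough illustration of the merge rule described above (not EDTA's actual code), here is a minimal Python sketch. The function names `drop_overlapping` and `merge_annotations` and the toy coordinates are hypothetical, and it assumes an overlapping homology entry is dropped whole; the real pipeline may handle partial overlaps differently.

```python
from typing import Dict, List, Tuple

# (sequence name, start, end), 1-based inclusive coordinates (assumed convention)
Interval = Tuple[str, int, int]


def drop_overlapping(homology: List[Interval], structural: List[Interval]) -> List[Interval]:
    """Keep only homology entries that overlap no structural entry."""
    by_seq: Dict[str, List[Tuple[int, int]]] = {}
    for seq, start, end in structural:
        by_seq.setdefault(seq, []).append((start, end))

    kept = []
    for seq, start, end in homology:
        # Two closed intervals overlap when each starts before the other ends.
        overlaps = any(start <= s_end and s_start <= end
                       for s_start, s_end in by_seq.get(seq, []))
        if not overlaps:
            kept.append((seq, start, end))
    return kept


def merge_annotations(homology: List[Interval], structural: List[Interval]) -> List[Interval]:
    """Structural entries take priority; non-overlapping homology entries are added."""
    return sorted(structural + drop_overlapping(homology, structural))


# Toy data, made up for illustration only.
structural = [("chr1", 100, 500)]
homology = [("chr1", 450, 600), ("chr1", 700, 800)]
merged = merge_annotations(homology, structural)
print(merged)
```

In this toy case the homology hit at 450-600 overlaps the structural entry and is dropped, while the hit at 700-800 is kept.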

Best, Shujun

Morriyaty commented 2 years ago

Hi Shujun

I ran buildSummary.pl on one of the RepeatMasker .out results, and the output file is attached: GY.buildSummary.txt. It gives similar results. So, if I understand correctly, the difference between the EDTA and RepeatMasker results is caused by EDTA's filtering step. Is that right?

Bests, Yinjia

oushujun commented 2 years ago

EDTA filters out short annotations (≤80 bp). This may be the cause.
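To see how a length filter of this kind can lower the reported repeat percentage, here is a small Python sketch. It is an illustration under assumptions, not EDTA's code: the helper `covered_bases`, the toy hit coordinates, and the toy genome size are all hypothetical, and it assumes hits of 80 bp or shorter are discarded before overlapping hits are merged and counted.

```python
from typing import Iterable, Tuple


def covered_bases(intervals: Iterable[Tuple[int, int]], min_len: int = 0) -> int:
    """Count bases covered by hits longer than min_len, merging overlaps.

    Coordinates are 1-based inclusive, so a hit (s, e) spans e - s + 1 bases.
    """
    kept = sorted((s, e) for s, e in intervals if e - s + 1 > min_len)
    total = 0
    cur_start = cur_end = None
    for s, e in kept:
        if cur_end is None or s > cur_end:
            # Close the previous merged block (if any) and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = s, e
        else:
            # Overlapping hit: extend the current block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Toy hits on a single sequence; values are made up for illustration.
hits = [(1, 50), (100, 300), (250, 400), (500, 580)]
genome_size = 1000  # hypothetical

unfiltered = covered_bases(hits)
filtered = covered_bases(hits, min_len=80)  # drops hits of 80 bp or shorter
print(f"unfiltered: {100 * unfiltered / genome_size:.1f}%")
print(f"filtered:   {100 * filtered / genome_size:.1f}%")
```

Comparing the two numbers on a real .out file would show how much of the gap between the RepeatMasker and EDTA summaries comes from short hits alone.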

Shujun

Morriyaty commented 2 years ago

Hi,

I got it, thanks!

Bests, Yinjia