oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Benchmarking not as expected. #373

Open isabelladistefano opened 10 months ago

isabelladistefano commented 10 months ago

Dear Shujun,

I hope you are well. In your benchmarking paper, “Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline”, EDTA appears to do very well on TE prediction in the rice genome.

For the purpose of our studies, we are benchmarking some TE tools, including EDTA. We compare the output of EDTA to the published TAIR transposable elements of Arabidopsis thaliana chromosome 1.

This was the command used to run EDTA, with $FASTA being the most recent TAIR genome assembly of Arabidopsis thaliana:

perl $EDTA --genome $FASTA --cds $CDS --anno 1 --threads 32 --sensitive 1

TAIR (https://www.arabidopsis.org/) publishes 7135 transposable elements on Arabidopsis thaliana chromosome 1.

When intersecting the EDTA results with the TAIR results using

bedtools intersect -u -a TAIR_TEs.gff -b EDTA.anno.gff

There are only 3462 intersections, meaning the EDTA annotation overlaps only 48.5% of the TAIR transposable elements on Arabidopsis thaliana chromosome 1.
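For anyone who wants to check the count without bedtools, here is a minimal Python sketch of what `bedtools intersect -u` reports: each feature in the `-a` file is counted once if it overlaps any feature in the `-b` file by at least 1 bp. The function names are illustrative only and are not part of bedtools or EDTA. The sketch assumes tab-separated GFF lines with 1-based inclusive coordinates and identical sequence names in both files. The pairwise comparison is quadratic, so it is meant for checking, not for large genomes.

```python
from collections import defaultdict


def read_gff_intervals(lines):
    """Parse GFF lines into {seqid: [(start, end), ...]} (1-based, inclusive)."""
    by_seq = defaultdict(list)
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        by_seq[fields[0]].append((int(fields[3]), int(fields[4])))
    return by_seq


def count_overlapped(a, b):
    """Count features in `a` that overlap at least one feature in `b` by >= 1 bp."""
    hits = 0
    for seqid, a_intervals in a.items():
        # Sequence names must match exactly; unmatched names give no overlaps.
        b_intervals = b.get(seqid, [])
        for a_start, a_end in a_intervals:
            if any(b_start <= a_end and a_start <= b_end
                   for b_start, b_end in b_intervals):
                hits += 1
    return hits


def percent(hits, total):
    """Percentage of reference features recovered, rounded to one decimal."""
    return round(100.0 * hits / total, 1) if total else 0.0


# Toy data (made up for illustration, not real TAIR or EDTA records).
reference = read_gff_intervals([
    "Chr1\ttoy\ttransposable_element\t100\t200\t.\t+\t.\tID=te1",
    "Chr1\ttoy\ttransposable_element\t300\t400\t.\t+\t.\tID=te2",
    "Chr1\ttoy\ttransposable_element\t500\t600\t.\t-\t.\tID=te3",
])
predicted = read_gff_intervals([
    "Chr1\ttoy\trepeat_region\t150\t160\t.\t+\t.\tID=p1",
    "Chr1\ttoy\trepeat_region\t401\t450\t.\t+\t.\tID=p2",
    "Chr1\ttoy\trepeat_region\t600\t700\t.\t+\t.\tID=p3",
])
toy_hits = count_overlapped(reference, predicted)
```

With the numbers reported here, `percent(3462, 7135)` gives 48.5. Note that this kind of count treats a 1 bp overlap the same as a full-length match, so it says nothing about how much of each element is covered.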

This is before checking whether the classes/families are correct.

Could you please help us find an explanation for this and/or improve the efficiency of EDTA, so that we can use it to safely annotate TEs of other non-model Brassicaceae species?



Best wishes,



Isabella

oushujun commented 10 months ago

Hi Isabella,

Thanks for trying out EDTA. Did you run EDTA on only chr1 or the entire genome?

Thanks, Shujun

On Fri, Jul 28, 2023 at 9:46 AM isabelladistefano @.***> wrote:

Dear Shujun,

I hope you are well. When reading your benchmarking paper, “Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline”, EDTA appears to do very well on TE prediction in the rice genome. For the purpose of our studies, we are benchmarking some TE tools, including EDTA. We compare the output of EDTA to the published TAIR transposable elements of Arabidopsis thaliana chromosome 1.

This was the code for EDTA, the FASTA being the most recent TAIR genome assembly of Arabidopsis thaliana. perl $EDTA --step anno --genome $FASTA --cds $CDS --anno 1 --threads 32 --sensitive 1 --evaluate 1

TAIR publishes 7135 Transposable elements in Arabidopsis thaliana Chromosome 1 When intersecting the EDTA results with the TAIR results using

bedtools intersect -u -a TAIR_TEs.gff -b EDTA.anno.gff

There are only 3462 intersections, meaning the EDTA result is only representing 48.5% of the transposable elements in Arabidopsis thaliana chromosome 1. Please can you help us to find an explanation for this and/or improve the efficiency of EDTA so that we can use it to safely annotate TEs of other non-model Brassicaceae species.

Best wishes,

Isabella


isabelladistefano commented 10 months ago

Dear Shujun,

I ran EDTA on the whole genome, then extracted chromosome 1 from the results.
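The extraction step can be sketched roughly as follows, assuming the annotation is a tab-separated GFF and that chromosome 1 is named `Chr1` in the first column (the actual name depends on the FASTA headers, so that value is an assumption). The helper name `filter_gff_by_seqid` is illustrative and is not an EDTA or bedtools function.

```python
def filter_gff_by_seqid(lines, seqid):
    """Keep only GFF feature lines whose first column equals `seqid`."""
    kept = []
    for line in lines:
        # Drop comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        # Compare the whole first column, so "Chr1" does not match "Chr10".
        if line.split("\t", 1)[0] == seqid:
            kept.append(line)
    return kept


# Toy records (made up for illustration).
toy_gff = [
    "##gff-version 3",
    "Chr1\ttoy\trepeat_region\t10\t50\t.\t+\t.\tID=a",
    "Chr10\ttoy\trepeat_region\t10\t50\t.\t+\t.\tID=b",
    "Chr2\ttoy\trepeat_region\t70\t90\t.\t-\t.\tID=c",
]
chr1_only = filter_gff_by_seqid(toy_gff, "Chr1")
```

Matching on the exact first column rather than a substring avoids pulling in other sequences whose names merely start with the same prefix.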

Best wishes,

Isabella

isabelladistefano commented 9 months ago

Hello Shujun, do you have any comments on my findings?

Best wishes,

Isabella

oushujun commented 9 months ago

Hi Isabella,

Thanks for your feedback. I am benchmarking on Arabidopsis and will check out this case.

Shujun

isabelladistefano commented 9 months ago

Hello,

Any luck?

Best wishes,

Isabella

oushujun commented 4 months ago

Hi Isabella,

Sorry for the long delay. I evaluated the TAIR10 TE annotation and found the quality is not as high as expected. Still, I doubt the overlap between the two annotations is less than half. Can you please share with me the link to download your TAIR10 annotation?

Thanks, Shujun