oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

panEDTA for metazoans #435

Closed adeoliveira86 closed 4 months ago

adeoliveira86 commented 4 months ago

Dear @oushujun,

I would like to thank you for the great tool. I have been using EDTA to annotate highly repetitive metazoan genomes and I am really happy with the results. I am finishing running panEDTA and I would like to obtain some insights into the evolution of the repeat families across my selected animal targets, similar to the analyses you described in your pre-print https://www.biorxiv.org/content/10.1101/2022.10.09.511471v1.

One detail I am still struggling to link with traditional pangenome analyses is which elements in the panEDTA library are present in all species (core), which are present in only a few (shell), and which occur within single individuals (cloud). Based on the publication, it looks to me like the "panEDTA" library file cannot be interpreted as a traditional pangenome analysis outcome, because panEDTA filters and removes repeat elements identified in the different species (as described in the section below) following some criteria. This reduces the total number of elements below what the union of all individual EDTA repeat libraries would theoretically contain. Coming back to my analogy, the panEDTA library is most likely a representation of the "core" in a traditional pangenome pipeline. Is my interpretation correct?

I am asking because I would like to add an evolutionary perspective to my analyses and trace how the repeat families evolved across the lineage I am interested in, and how this affected these animals' genome evolution. Can this be done with the panEDTA results? I checked the git repository associated with your publication (git clone https://github.com/oushujun/PopTEvo.git) and it is incredibly rich, with many distinct scripts. Most likely one of those could help me achieve my goals.

Thanks again for the incredible tool. Best, André

oushujun commented 4 months ago

Hi Andre,

Thank you for using EDTA and panEDTA! Yes, you are correct that the library is just a representation of high-confidence TE sequences. You may check the TEanno.sum file of each genome, which contains the presence of each family and its size. Much of the code in the panEDTA GitHub repository parses that file type. You may also find helpful information on the EDTA wiki page.

Best, Shujun
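
To illustrate the suggestion above, here is a minimal sketch of how per-genome family presence could be turned into core/shell/cloud categories. It is not code from the EDTA or PopTEvo repositories. It assumes the family IDs have already been extracted from each genome's TEanno.sum file (the parsing step is left out because the file layout is not shown in this thread) and that the IDs are comparable across genomes. The function name `classify_families`, the genome labels, and the family IDs are hypothetical.

```python
from collections import Counter


def classify_families(presence):
    """Label each TE family as core, shell, or cloud.

    presence: dict mapping a genome name to the set of family IDs
    found in that genome (assumed to be extracted beforehand, e.g.
    from each genome's TEanno.sum file).
    """
    n_genomes = len(presence)
    # Count in how many genomes each family occurs.
    counts = Counter(
        family for families in presence.values() for family in set(families)
    )
    categories = {}
    for family, n_present in counts.items():
        if n_present == n_genomes:
            categories[family] = "core"    # present in every genome
        elif n_present == 1:
            categories[family] = "cloud"   # present in a single genome
        else:
            categories[family] = "shell"   # present in some, not all
    return categories


# Hypothetical input: three genomes with made-up family IDs.
presence = {
    "genomeA": {"TE_001", "TE_002", "TE_003"},
    "genomeB": {"TE_001", "TE_002"},
    "genomeC": {"TE_001", "TE_004"},
}

for family, category in sorted(classify_families(presence).items()):
    print(family, category)
```

The thresholds here are the simplest possible ones (all genomes, exactly one genome, anything in between). Whether presence should also require a minimum copy number or masked length per family is a separate choice that this sketch does not make.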

adeoliveira86 commented 4 months ago

Thanks for the help @oushujun!

adeoliveira86 commented 4 months ago

Hello @oushujun,

I am getting some errors while running panEDTA and I am a bit lost on how to fix them. Are there any compatibility issues between EDTA output results from v2.0.1 and the panEDTA.sh script available in version 2.2.0?

Best, André

oushujun commented 4 months ago

You need to update to 2.2.1 and rerun EDTA with it; that release includes a compatible version of panEDTA. Your 2.0.1 results are not compatible. Sorry for the inconvenience.

Shujun

adeoliveira86 commented 4 months ago

Oh, that’s unfortunate. Bad timing for me. Running EDTA on my target sequences took many weeks. Thanks @oushujun, I will see how to proceed.

Cheers, André