Closed YuboWang1994 closed 10 months ago
Hello,
Please check out panEDTA for this purpose.
Thanks! Shujun
On Mon, Aug 7, 2023 at 12:18 AM YuboWang1994 @.***> wrote:
Hi,
I truly appreciate your development of the EDTA pipeline, which has been immensely helpful to me. However, I have a small issue that I would like to trouble you with.
I am currently researching four plant genomes from the same Family. I have annotated repetitive sequences using the EDTA pipeline for each genome and obtained the corresponding non-redundant TE library. My next research step is comparing compositional differences in repetitive sequences within the sex-determining regions of these four genomes, such as copy numbers and sequence compositions. However, since each genome was annotated separately with EDTA, the four non-redundant TE libraries might contain the same TE IDs like TE_00000000, making direct comparisons challenging.
Considering that these four species share a common Family and exhibit high sequence and gene collinearity, I've designed the following approach:
- Classify each non-redundant TE library based on sequence types, such as TIR, helitron, LTR-gypsy, LTR-copia, LTR-unknown, etc.
- Merge TE sequences of the same type from the four genomes, remove redundancy using CD-HIT, and rename sequence IDs.
- Perform subsequent analyses using the new IDs, such as comparing copy number and compositional differences of the same TE across different species.
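The classify-and-merge steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes library headers look like `TE_00000000#LTR/Gypsy` (ID, then `#`, then classification), and the species tags `sp1`/`sp2` and the helper names `read_fasta` and `pool_by_class` are hypothetical. Prefixing each ID with a species tag removes the ID collisions; the redundancy-removal step (e.g. CD-HIT) is not run here.

```python
from collections import defaultdict


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pool_by_class(libraries):
    """Group sequences from several libraries by TE class.

    libraries: dict mapping a species tag (hypothetical) to FASTA text.
    Returns a dict mapping class -> list of (species-prefixed ID, sequence).
    Assumes headers of the form 'ID#Class'; anything else goes to 'unknown'.
    """
    pooled = defaultdict(list)
    for species, text in libraries.items():
        for header, seq in read_fasta(text):
            name = header.split()[0]
            te_id, _, te_class = name.partition("#")
            pooled[te_class or "unknown"].append((f"{species}_{te_id}", seq))
    return dict(pooled)


# Toy input: the same TE ID appears in both libraries.
demo = {
    "sp1": ">TE_00000000#LTR/Gypsy\nACGTACGT\n>TE_00000001#DNA/Helitron\nGGGTTT\n",
    "sp2": ">TE_00000000#LTR/Gypsy\nACGTTCGT\n",
}
pooled = pool_by_class(demo)
for te_class, records in sorted(pooled.items()):
    print(te_class, [new_id for new_id, _ in records])
```

Each per-class pool could then be written to its own FASTA file and passed to a clustering tool for redundancy removal; the identity and coverage thresholds for that step are a separate choice not covered by this sketch.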
Does this sequence similarity-based approach for merging non-redundant TE libraries make scientific sense?
I appreciate your help.
Hi,
Thanks for your help. It seems to work.
Best, Yubo Wang