oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Should I rewrite the previous results? #406

Open tongyin121 opened 6 months ago

tongyin121 commented 6 months ago

Hi, I have annotated my genomes using the panEDTA pipeline, and now I need to add some new genomes. Should I overwrite the previous EDTA results and rerun the panEDTA pipeline, or just rerun the panEDTA pipeline?

Best regards

oushujun commented 6 months ago

Hello,

If you have backed up the previous EDTA results, you can reuse them and add your extra genome. If panEDTA has overwritten your EDTA results, you may want to rerun EDTA on each existing genome plus your new genome. You don't need to rerun the raw and filter steps, so you can start with --step final for the existing genomes.
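A minimal sketch of the rerun plan described above, assuming the existing genomes still have their raw and filter outputs in the working directory. The file names are hypothetical placeholders and the thread count is arbitrary. It only builds the EDTA.pl command lines (--step final for existing genomes, --step all for the new one) without running them; check `EDTA.pl --help` for your version before relying on the exact options. panEDTA would then be rerun on the full genome list.

```python
import shlex

# Hypothetical placeholder file names: genomes already run through EDTA whose
# intermediate raw/filter outputs are still present, and the newly added genome.
EXISTING = ["genome1.fa", "genome2.fa"]
NEW = ["genome3.fa"]


def edta_command(genome, step, threads=16):
    """Build one EDTA.pl command line as an argument list (not executed)."""
    # threads=16 is an arbitrary example value.
    return ["EDTA.pl", "--genome", genome, "--step", step, "--threads", str(threads)]


def plan(existing, new):
    """--step final for existing genomes (skip raw and filter), --step all for new ones."""
    cmds = [edta_command(g, "final") for g in existing]
    cmds += [edta_command(g, "all") for g in new]
    return cmds


if __name__ == "__main__":
    for cmd in plan(EXISTING, NEW):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps file names with spaces safe if the commands are later passed to `subprocess.run`.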

Best, Shujun

tongyin121 commented 6 months ago

Hi, thanks for your help. Best, tongyin

tongyin121 commented 5 months ago

Hi, after rerunning EDTA on all my genomes, I found the results are quite different. Some important TE families that were found in the earlier panEDTA library are no longer found. Can I merge the two libraries?

oushujun commented 5 months ago

The TE IDs will change every time you rerun the program.

Shujun

On Thu, Jan 11, 2024 at 1:25 AM tongyin121 @.***> wrote:

Reopened #406 https://github.com/oushujun/EDTA/issues/406.


tongyin121 commented 5 months ago

Hi, you may have misunderstood me because my wording was ambiguous. The TE IDs change, but the sequences do not. So I used the TE family I am interested in from the old library as a query to BLAST against the new TE library, and the output file was empty.
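Because IDs are renumbered on each run, libraries have to be compared by sequence rather than by name. A minimal pure-Python sketch of a quick first-pass check, assuming both libraries are FASTA text and that sequences are carried over verbatim, as the comment above assumes: it matches each old family to a new one by exact sequence on either strand, ignoring IDs. This is a simpler stand-in, not a replacement for BLAST; a consensus rebuilt in a new run may differ slightly, and only a similarity search such as blastn would find it.

```python
from typing import Dict, Optional

# Complement table for unambiguous bases; other IUPAC codes are left unchanged.
_COMP = str.maketrans("ACGTN", "TGCAN")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {first word of header: uppercase sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of seq and its reverse complement."""
    return min(seq, seq.translate(_COMP)[::-1])


def match_by_sequence(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each old ID to the new ID with an identical sequence, or None."""
    index = {canonical(s): name for name, s in new.items()}
    return {name: index.get(canonical(s)) for name, s in old.items()}
```

Usage: `match_by_sequence(read_fasta(open("old_lib.fa").read()), read_fasta(open("new_lib.fa").read()))`, where both file names are hypothetical. A `None` here only means there is no identical copy, which is weaker evidence than an empty BLAST result.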

tongyin121 commented 5 months ago

Hi, I compared the results before and after. The first panEDTA run annotated these repeats as MITE/DTM, but the second run annotated them as Gypsy_LTR_retrotransposon. The results are contradictory: the earlier MITE/DTM annotation identified the TIRs and TSDs, but the newer annotation does not show a corresponding Gypsy LTR retrotransposon structure. Could you please help? This is very important to me.
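One way to check the same locus in both runs is to pull every feature overlapping that region from each GFF3 file and compare the types side by side. A minimal sketch, assuming standard 9-column GFF3 input (1-based, inclusive coordinates). The `Classification` attribute key is an assumption about how the annotation labels superfamilies such as MITE/DTM; the sketch returns `None` when that key is absent.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    seqid: str
    type: str
    start: int
    end: int
    attributes: dict


def parse_gff3(text):
    """Parse 9-column GFF3 lines; skip comments, directives, and FASTA lines."""
    feats = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        # Attribute values are kept as-is (GFF3 percent-encoding is not decoded).
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        feats.append(Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), attrs))
    return feats


def overlapping(feats, seqid, start, end):
    """Features on seqid that overlap [start, end] (inclusive on both ends)."""
    return [f for f in feats if f.seqid == seqid and f.start <= end and f.end >= start]


def compare_region(old_text, new_text, seqid, start, end):
    """Return (type, Classification, start, end) tuples per run for one region."""
    out = {}
    for label, text in (("old", old_text), ("new", new_text)):
        out[label] = [
            (f.type, f.attributes.get("Classification"), f.start, f.end)
            for f in overlapping(parse_gff3(text), seqid, start, end)
        ]
    return out
```

For example, `compare_region(old_gff, new_gff, "chr1", 10000, 12000)` (hypothetical coordinates) lists what each run placed over that interval, including any TIR or TSD child features.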

oushujun commented 5 months ago

If you have curated TEs or particular families of interest in your study, you should provide them to EDTA via --curatedlib. panEDTA filters out low-copy families to control false positives, so families shown in the EDTA annotation may be absent from the panEDTA annotation simply because they are low-copy.
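A simplified illustration of why a family can appear in a single-genome EDTA annotation but be missing from the panEDTA library. The rule used here (keep a family only if at least one genome has at least `min_copies` full-length copies, default 3) is an assumption for illustration, not panEDTA's actual code. The family names in any example call are hypothetical.

```python
from collections import Counter


def keep_families(intact_copies, min_copies=3):
    """Families with >= min_copies full-length copies in at least one genome.

    intact_copies: iterable of (genome, family) pairs, one per full-length copy.
    Assumed, simplified filter rule; not panEDTA's implementation.
    """
    per_genome = Counter(intact_copies)  # (genome, family) -> copy count
    return sorted({fam for (_genome, fam), n in per_genome.items() if n >= min_copies})


def dropped_families(intact_copies, min_copies=3):
    """Families that appear in the input but fail the copy-number filter."""
    copies = list(intact_copies)  # materialize so it can be iterated twice
    all_families = {fam for _genome, fam in copies}
    return sorted(all_families - set(keep_families(copies, min_copies)))
```

A family with one or two intact copies per genome is dropped even though EDTA annotated it. Supplying that family through --curatedlib, as suggested above, avoids depending on the filter for it.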

There are moderate levels of misannotation in the current version of EDTA. The new update (v2.2.0) has a better filter for intact TEs; you may want to try it.