oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

panEDTA #412

Closed 8Purplegrapes closed 4 months ago

8Purplegrapes commented 6 months ago

Hi oushujun, I have 5 genomes and annotated them with EDTA to obtain 5 corresponding TElib.fa files. Can I directly use these 5 TElib.fa files and panEDTA.sh to obtain panEDTATElib.fa?

Thanks very much.

oushujun commented 6 months ago

You will need the whole-genome annotation for each of these genomes. If you provide the EDTA results alongside your genome paths, panEDTA will recognize these results and continue on to generate the pan TE library.

Shujun

8Purplegrapes commented 6 months ago

Hi oushujun, do you mean that I should place the EDTA results for all 5 genomes in the same folder and then run panEDTA.sh, which will automatically use the 5 TElib.fa files to build the pan TE library? For example:

One.genome.cds.fa
One.genome.fa
One.genome.fa.mod
One.genome.fa.mod.EDTA.anno
One.genome.fa.mod.EDTA.combine
One.genome.fa.mod.EDTA.final
One.genome.fa.mod.EDTA.intact.gff3
One.genome.fa.mod.EDTA.raw
One.genome.fa.mod.EDTA.TEanno.gff3
One.genome.fa.mod.EDTA.TEanno.sum
One.genome.fa.mod.EDTA.TElib.fa
One.genome.fa.mod.MAKER.masked
Two.genome.cds.fa
Two.genome.fa
Two.genome.fa.mod
Two.genome.fa.mod.EDTA.anno
Two.genome.fa.mod.EDTA.combine
Two.genome.fa.mod.EDTA.final
Two.genome.fa.mod.EDTA.intact.gff3
Two.genome.fa.mod.EDTA.raw
Two.genome.fa.mod.EDTA.TEanno.gff3
Two.genome.fa.mod.EDTA.TEanno.sum
Two.genome.fa.mod.EDTA.TElib.fa
Two.genome.fa.mod.MAKER.masked
Three... ...

(I want to directly use the library obtained from EDTA to build the pan TE library. )

Thanks very much.

oushujun commented 5 months ago

Yes, putting them together will help. Alternatively, you can specify the path for each genome; if a genome's path contains EDTA results, they will be utilized.
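Before launching panEDTA, it may help to confirm that each genome really has its EDTA outputs sitting next to it. The following is a minimal Python sketch, not part of EDTA. The suffixes are copied from the file listing earlier in this thread; which of these files panEDTA actually looks for is an assumption, and the helper names `missing_edta_outputs` and `report` are hypothetical.

```python
from pathlib import Path

# Suffixes taken from the file listing earlier in this thread. Which of them
# panEDTA actually checks for is an assumption, not confirmed by the thread.
EXPECTED_SUFFIXES = (
    ".mod.EDTA.TElib.fa",
    ".mod.EDTA.TEanno.gff3",
    ".mod.EDTA.TEanno.sum",
    ".mod.EDTA.intact.gff3",
)


def missing_edta_outputs(genome_path):
    """Return names of expected EDTA outputs that are absent next to genome_path."""
    genome = Path(genome_path)
    missing = []
    for suffix in EXPECTED_SUFFIXES:
        # e.g. One.genome.fa -> One.genome.fa.mod.EDTA.TElib.fa in the same folder
        candidate = genome.with_name(genome.name + suffix)
        if not candidate.exists():
            missing.append(candidate.name)
    return missing


def report(genome_paths):
    """Map each genome path (as a string) to its list of missing outputs."""
    return {str(p): missing_edta_outputs(p) for p in genome_paths}
```

For example, `report(["One.genome.fa", "Two.genome.fa"])` returns an empty list for every genome whose results are all in place, and the names of the missing files otherwise.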

8Purplegrapes commented 5 months ago

Thank you!

Yes, I put the EDTA results of multiple genomes into a folder, then ran panEDTA.sh, and it is still running.

But I still have a question. My genome assembly contains some scaffold fragments that are not part of the subgenome sequences. If I delete them all and keep only the subgenome sequences, will it affect the final annotation?

oushujun commented 5 months ago

If those fragments are not of interest to you, you may delete them. You may also want to check the individual genomes' folders to see whether they are progressing.

Shujun
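Dropping the unwanted scaffolds can be done with a small filter over the FASTA file. This is a minimal Python sketch under stated assumptions, not an EDTA utility: the function name `filter_fasta` is hypothetical, and it assumes each record's ID is the first whitespace-delimited word after `>`.

```python
def filter_fasta(lines, keep_ids):
    """Yield FASTA lines belonging to records whose ID is in keep_ids.

    The ID is taken to be the first word after '>' on the header line.
    """
    keeping = False
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            seq_id = header[0] if header else ""
            # Decide once per record whether its lines are kept
            keeping = seq_id in keep_ids
        if keeping:
            yield line
```

A possible use, with placeholder file names and sequence IDs:

```python
keep = {"chr1", "chr2"}  # hypothetical IDs of the sequences to retain
with open("One.genome.fa") as src, open("One.genome.filtered.fa", "w") as dst:
    dst.writelines(filter_fasta(src, keep))
```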

8Purplegrapes commented 5 months ago

Is the library produced by the Perl script ~/anaconda3/envs/EDTA/share/EDTA/util/cleanup_nested.pl similar in nature to the panTElib.fa obtained by panEDTA?

oushujun commented 5 months ago

Yes and no. Yes, both are produced by this script. No, the input sequences given to the script are different: panEDTA uses a more stringent filter to keep out false positives.

8Purplegrapes commented 5 months ago

Thank you! It seems that panEDTA is more effective. Is there a big difference in the results between running with and without RepeatModeler? If RepeatModeler is enabled, how much longer does a run usually take?

oushujun commented 5 months ago

RepeatModeler tends to take a long time. It is now an essential part of the pipeline and is required to run in the raw step.

Shujun

oushujun commented 4 months ago

If you encounter any errors, please update your panEDTA script and run again. Feel free to reopen the issue. Thanks!

Shujun