Closed 8Purplegrapes closed 4 months ago
You will need whole genome annotation for these genomes. If you provide the EDTA results alongside your genome paths, panEDTA will recognize these results and continue to generate the pan TE library.
Shujun
Hi oushujun, does your answer mean that I should place the EDTA results from the 5 genomes in the same folder and then run panEDTA.sh, which will automatically use the 5 TElib.fa files to obtain the pan TE library? For example:
One.genome.cds.fa
One.genome.fa
One.genome.fa.mod
One.genome.fa.mod.EDTA.anno
One.genome.fa.mod.EDTA.combine
One.genome.fa.mod.EDTA.final
One.genome.fa.mod.EDTA.intact.gff3
One.genome.fa.mod.EDTA.raw
One.genome.fa.mod.EDTA.TEanno.gff3
One.genome.fa.mod.EDTA.TEanno.sum
One.genome.fa.mod.EDTA.TElib.fa
One.genome.fa.mod.MAKER.masked
Two.genome.cds.fa
Two.genome.fa
Two.genome.fa.mod
Two.genome.fa.mod.EDTA.anno
Two.genome.fa.mod.EDTA.combine
Two.genome.fa.mod.EDTA.final
Two.genome.fa.mod.EDTA.intact.gff3
Two.genome.fa.mod.EDTA.raw
Two.genome.fa.mod.EDTA.TEanno.gff3
Two.genome.fa.mod.EDTA.TEanno.sum
Two.genome.fa.mod.EDTA.TElib.fa
Two.genome.fa.mod.MAKER.masked
Three... ...
(I want to directly use the library obtained from EDTA to build the pan TE library.)
Thanks very much.
Yes, putting them together will help. Otherwise, specify the path for each genome; if a genome's path contains EDTA results, they will be utilized.
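Before starting panEDTA, it may help to confirm that each genome really has its EDTA results sitting next to it. The following is a minimal sketch, not part of EDTA: the function names and the `*.genome.fa` pattern are hypothetical, and the suffixes are simply copied from the file listing quoted above. Which of these files panEDTA actually requires is an assumption here.

```python
from pathlib import Path

# Suffixes copied from the file listing in this thread. Whether panEDTA
# needs all of them is an assumption, not something confirmed here.
EXPECTED_SUFFIXES = (
    ".mod.EDTA.TElib.fa",
    ".mod.EDTA.TEanno.gff3",
    ".mod.EDTA.intact.gff3",
)


def missing_edta_outputs(genome_fasta, suffixes=EXPECTED_SUFFIXES):
    """Return the expected EDTA output names that are absent next to a genome."""
    genome = Path(genome_fasta)
    return [
        genome.name + suffix
        for suffix in suffixes
        if not (genome.parent / (genome.name + suffix)).exists()
    ]


def report(folder, pattern="*.genome.fa"):
    """Map each genome file name in `folder` to its list of missing outputs."""
    return {
        genome.name: missing_edta_outputs(genome)
        for genome in sorted(Path(folder).glob(pattern))
    }


if __name__ == "__main__":
    # Check the current directory and print one line per genome.
    for name, missing in report(".").items():
        status = "complete" if not missing else "missing: " + ", ".join(missing)
        print(f"{name}\t{status}")
```

An empty list for a genome means all of the checked files were found; a non-empty list names the files that were not.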
Thank you!
Yes, I put the EDTA results of multiple genomes into a folder, then ran panEDTA.sh, and it is still running.
But I still have a question. My genome assembly contains some scaffold fragments that are not placed in the subgenomes. If I delete them all and keep only the subgenome sequences, will it affect the final annotation?
If those fragments are not of interest to you, you may delete them. You may also want to check the individual genomes' folders to see whether they are progressing.
Shujun
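For removing the unplaced scaffolds, a minimal sketch follows. It is not an EDTA utility: the function names are hypothetical, and it assumes the scaffold records can be recognized by a sequence ID prefix such as "scaffold", which may not match your assembly's naming.

```python
def filter_fasta(lines, keep):
    """Yield the lines of FASTA records whose sequence ID satisfies keep(id)."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # The sequence ID is the first whitespace-delimited token
            # after ">" on the header line.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            keeping = keep(seq_id)
        if keeping:
            yield line


def drop_scaffolds(in_path, out_path, prefix="scaffold"):
    """Write a copy of a FASTA file without records whose ID starts with prefix.

    The default prefix is an assumption; adjust it to your own naming scheme.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(
            filter_fasta(src, lambda seq_id: not seq_id.lower().startswith(prefix))
        )
```

For example, `drop_scaffolds("One.genome.fa", "One.genome.noscaffold.fa")` would write a filtered copy and leave the original file untouched; the output file name here is only an illustration.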
Is the library obtained by this Perl script, ~/anaconda3/envs/EDTA/share/EDTA/util/cleanup_nested.pl, similar in nature to the panTElib.fa obtained by panEDTA?
Yes and no. Yes, both are produced by this script. No, the input sequences for the script are different: panEDTA uses a more stringent filter to keep out false positives.
Thank you! It seems that panEDTA is more effective. Is there a big difference between the results with and without running RepeatModeler? If RepeatModeler is run, how much longer does the run usually take?
RepeatModeler tends to take a long time. It is now an essential part of the pipeline and is required to run in the raw step.
Shujun
If you encounter any errors, please update your panEDTA script and run again. Feel free to reopen the issue. Thanks!
Shujun
Hi oushujun, I have 5 genomes and annotated them with EDTA to obtain 5 corresponding TElib.fa files. Can I directly use these 5 TElib.fa files and panEDTA.sh to obtain panEDTATElib.fa?
Thanks very much.