Hi Oushujun,
Thank you for providing EDTA.
I am trying to understand the output of panEDTA. Here is the command I ran in the ../test directory, following the ./test/README.txt file:
nohup sh ../panEDTA.sh -g genome.cds.list -c genome.cds.fa -l ../database/athrep.updated.nonredun.fasta -t 20 -f 3 &
My understanding is as follows:
genome.cds.list.panEDTA.TElib.fa is the final filtered pan-genome library, containing exemplar sequences for the TE families found across the pan-genome. Col.test.fa.mod.EDTA.TEanno.gff3 and Ler.test.fa.mod.EDTA.intact.gff3 are reannotations of the Col and Ler genomes using panEDTA.TElib.fa.
I assume that the Name= attribute in each annotation file indicates the consTE used to annotate a particular copy in the genome. If so, every Name= value in the annotation files should also be present in panEDTA.TElib.fa.
However, this isn't the case. For example, the Col annotation contains the copy ID=TE_struc_6;Name=TE_00000005, but the consTE sequence TE_00000005 is not in genome.cds.list.panEDTA.TElib.fa.
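To check this across a whole annotation file rather than one copy at a time, a minimal sketch like the following lists every Name= value in a GFF3 file that has no matching sequence ID in the library FASTA. It assumes that library headers follow the RepeatMasker-style name#class/superfamily convention (so only the part before '#' is compared; headers without '#' are used as-is) and that Name= values are not URL-encoded. The script and function names are hypothetical, not part of EDTA.

```python
import sys


def fasta_family_ids(lines):
    """Collect sequence IDs from FASTA lines, keeping only the text before '#'.

    Assumes RepeatMasker-style headers such as '>TE_00000001#DNA/Helitron';
    a header without '#' is kept whole.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0].split("#", 1)[0])
    return ids


def gff3_names(lines):
    """Collect every Name= value from column 9 of GFF3 feature lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed feature line
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key.strip() == "Name" and value:
                names.add(value.strip())
    return names


def missing_names(gff3_lines, fasta_lines):
    """Return, sorted, the Name= values that have no matching library ID."""
    return sorted(gff3_names(gff3_lines) - fasta_family_ids(fasta_lines))


def main(argv):
    if len(argv) != 3:
        sys.exit(f"usage: {argv[0]} ANNOTATION.gff3 LIBRARY.fa")
    with open(argv[1]) as gff, open(argv[2]) as fa:
        missing = missing_names(gff, fa)
    for name in missing:
        print(name)
    print(f"{len(missing)} Name= value(s) not found in the library", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv)
```

For example, running python check_names.py Col.test.fa.mod.EDTA.TEanno.gff3 genome.cds.list.panEDTA.TElib.fa would print each unmatched name, such as TE_00000005, one per line.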
Could you please provide clarification on this output?
Best regards, Ana
2.0.1