oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

PanEDTA test output #454

Open AnaKurdadze opened 2 months ago

AnaKurdadze commented 2 months ago

Hi Oushujun,

Thank you for providing EDTA.

I am trying to understand the output of panEDTA. Here is the command I ran, taken from ./test/README.txt, from within the ../test directory:

nohup sh ../panEDTA.sh -g genome.cds.list -c genome.cds.fa -l ../database/athrep.updated.nonredun.fasta -t 20 -f 3 &

My understanding is as follows:

genome.cds.list.panEDTA.TElib.fa is the final filtered pan-genome library, containing exemplar sequences for the TE families found across the pan-genome. Col.test.fa.mod.EDTA.TEanno.gff3 and Ler.test.fa.mod.EDTA.intact.gff3 are reannotations of the Col and Ler genomes made with panEDTA.TElib.fa.

I assume that the Name= attribute in the annotation file indicates the consTE that was used to annotate that particular copy in the genome. If so, every Name= value in the annotation file should also be present in panEDTA.TElib.fa. However, this isn't the case. For example, the Col annotation contains the copy ID=TE_struc_6;Name=TE_00000005, but the consTE sequence TE_00000005 is not in genome.cds.list.panEDTA.TElib.fa.
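For anyone who wants to reproduce this comparison, here is a minimal sketch of how the check could be scripted. It is not part of EDTA, and the function names are hypothetical. It assumes that each library FASTA header starts with the family ID, ending at the first `#` or whitespace, and that the annotation is standard tab-separated GFF3 with a `Name=` attribute in column 9.

```python
import re


def library_ids(fasta_lines):
    """Collect sequence IDs from the header lines of a FASTA library."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the ID is the header text before the first '#'
            # or whitespace; adjust if your headers are laid out differently.
            ids.add(re.split(r"[#\s]", line[1:].strip(), maxsplit=1)[0])
    return ids


def gff3_names(gff_lines):
    """Collect every Name= value from column 9 of a GFF3 annotation."""
    names = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF3 feature line
        for field in cols[8].split(";"):
            key, _, value = field.partition("=")
            if key == "Name" and value:
                names.add(value)
    return names


def missing_names(gff_lines, fasta_lines):
    """Return Name= values from the annotation that are absent from the library."""
    return sorted(gff3_names(gff_lines) - library_ids(fasta_lines))


# Possible usage with the files from this test run (paths as named above):
# with open("Col.test.fa.mod.EDTA.TEanno.gff3") as gff, \
#         open("genome.cds.list.panEDTA.TElib.fa") as lib:
#     print(missing_names(gff, lib))
```

Both parsers take any iterable of lines, so open file handles can be passed in directly without loading a whole file into memory.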

Could you please provide clarification on this output?

Best regards, Ana

2.0.1