tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

problem with fasta header starting with '>0 ' #567

Open gtonkinhill opened 3 years ago

Hi,

I've searched through the issues, so hopefully this hasn't been reported before. Prokka seems to run into problems when a FASTA header starts with >0.

In this case Prokka renames the sequence to SEQ in the annotation lines of the output GFF file but keeps the original name in the FASTA section of the same file. The sequence IDs no longer match, which can lead downstream programs to skip these annotations.

I've copied an example below and attached the corresponding input fasta file along with the output gff file.

##gff-version 3
##sequence-region 0 1 526811
##sequence-region 1 1 500965
SEQ     Prodigal:002006 CDS     25      351     .       +       0       ID=AAJEMFJL_00001;inference=ab initio prediction:Prodigal:002006;locus_tag=AAJEMFJL_00001;product=unannotated protein
SEQ     Prodigal:002006 CDS     409     747     .       +       0       ID=AAJEMFJL_00002;inference=ab initio prediction:Prodigal:002006;locus_tag=AAJEMFJL_00002;product=unannotated protein
SEQ     Prodigal:002006 CDS     753     2168    .       -       0       ID=AAJEMFJL_00003;inference=ab initio prediction:Prodigal:002006;locus_tag=AAJEMFJL_00003;product=unannotated protein
.
.
.
.
##FASTA
>0
TACAACCTGCTGTTGGTGTCGCGTATGAAAGAAGAGCTGGGTGCCGGTATCAATACGGGC
ATCATTCGAGCGATGGGTGGGACCGGCAAAGTGGTCACCTCGGCGGGTCTGGTCTTCGCG
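
The mismatch shown above can be detected with a short script. The following is only a minimal sketch in Python, not part of Prokka: the function name `gff_seqid_mismatches` is hypothetical, and it assumes a tab-separated GFF3 file with nine columns per feature line and an embedded `##FASTA` section.

```python
def gff_seqid_mismatches(gff_text):
    """Return seqids used by feature lines that have no FASTA record in the same file."""
    feature_ids = set()
    fasta_ids = set()
    in_fasta = False
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            in_fasta = True  # everything after this marker is sequence data
            continue
        if in_fasta:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    fasta_ids.add(fields[0])  # ID is the first word of the header
        elif line and not line.startswith("#"):
            columns = line.split("\t")
            if len(columns) == 9:  # assumption: well-formed GFF3 feature line
                feature_ids.add(columns[0])
    return sorted(feature_ids - fasta_ids)


# Shortened, tab-separated version of the output above
sample = "\n".join([
    "##gff-version 3",
    "##sequence-region 0 1 526811",
    "SEQ\tProdigal:002006\tCDS\t25\t351\t.\t+\t0\tID=AAJEMFJL_00001",
    "##FASTA",
    ">0",
    "TACAACCTGCTG",
])
print(gff_seqid_mismatches(sample))  # ['SEQ']
```

An empty list means every annotated seqid has a matching FASTA record.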

I'm using Prokka v1.14.6 and ran the command prokka --noanno 11861_7#10.fa
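
One possible workaround, untested here, is to rename the contigs before running Prokka so that no header is a bare 0. The sketch below assumes the problem is triggered only by that ID; the function name `prefix_fasta_ids` and the `contig_` prefix are hypothetical choices, not Prokka options.

```python
def prefix_fasta_ids(fasta_text, prefix="contig_"):
    """Prepend a prefix to every FASTA header so that no sequence ID is a bare '0'."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(">" + prefix + line[1:])  # keep any description after the ID
        else:
            out.append(line)  # sequence lines pass through unchanged
    return "\n".join(out) + "\n"


print(prefix_fasta_ids(">0\nTACAACCTGCTG\n>1\nATCATTCGAGCG\n"), end="")
```

The renamed file could then be passed to the same prokka command in place of the original input.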

test.zip