tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

inconsistency in prokka annotation #521

Open vappiah opened 3 years ago

vappiah commented 3 years ago

Hi @tseemann, I observed an inconsistency when annotating a genome with Prokka.

First, I downloaded the M. ulcerans Agy99 genome (in FASTA and GenBank formats).

I annotated the FASTA file, passing the GenBank file to --proteins, with the command below:
prokka --cpus $threads --kingdom Bacteria --prefix Agy99 --genus Mycobacterium --rfam --species ulcerans --cdsrnaolap --metagenome --proteins Agy99.gb Agy99.fasta

I counted the CDS features in the original GenBank file and in the GenBank file generated by Prokka. The original had 4160, while the Prokka file had 8665. Am I missing something in the command? I was expecting similar values for both, but the difference is very large. Please advise. Thanks.
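For anyone reproducing the count, here is a minimal Python sketch of one way to count CDS features in a GenBank flat file. It is not necessarily how the numbers above were obtained. It assumes the standard flat-file layout, where a feature key such as CDS starts in column 6 (after five spaces). The function names `count_cds` and `count_cds_in_file` are hypothetical helpers, not part of Prokka.

```python
import re
from pathlib import Path

# Assumption: standard GenBank flat-file layout, where a feature key
# begins in column 6 (five leading spaces) and is followed by whitespace
# and then the feature location. Qualifier lines are indented further,
# so they do not match this pattern.
CDS_LINE = re.compile(r"^ {5}CDS\s+\S")


def count_cds(genbank_text: str) -> int:
    """Count lines that open a CDS feature in GenBank-formatted text."""
    return sum(1 for line in genbank_text.splitlines() if CDS_LINE.match(line))


def count_cds_in_file(path: str) -> int:
    """Read a GenBank file from disk and count its CDS features."""
    return count_cds(Path(path).read_text())
```

Calling `count_cds_in_file` on the reference GenBank file and on the Prokka-generated one gives the two totals to compare. CDS features inside pseudogenes or split genes are counted like any other, so the raw totals alone do not show how many of the extra features are fragments.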

felipelira commented 3 years ago

Dear vappiah,

Did you check whether this difference means that your new annotation contains pseudogenes or fragmented genes?

Cheers

vappiah commented 3 years ago

Dear @felipelira, they are fragmented genes.

felipelira commented 3 years ago

In that case, how many genes do you get after filtering out the fragmented genes? Is it close to the expected 4160? Another question: why do you use the --metagenome option?