Closed by willnotburn 6 years ago
AFAIR, prokka by default excludes edge run-off CDS when calling prodigal. More importantly, prokka expects a single genome, not a metagenome (unless a metagenome mode was added in the last year or so?), and calls prodigal in "single" mode. Finally, and least significantly, prokka will drop any CDS overlapping an xRNA.
With a few simple adjustments you should be able to get exactly the same results as from prodigal alone.
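To get a feel for how much the edge run-off exclusion alone could matter, one option is to tally the `partial` attribute that Prodigal writes into its GFF output ("00" means both ends are real boundaries; a "1" marks an end that runs off the contig edge). The sketch below is only an illustration: the function name and the example records are made up, and it assumes the standalone Prodigal run was written with `-f gff`.

```python
from collections import Counter


def tally_partial_flags(gff_lines):
    """Count CDS records by the value of Prodigal's `partial` attribute."""
    counts = Counter()
    for line in gff_lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        counts[attrs.get("partial", "unknown")] += 1
    return counts


# Made-up records in Prodigal's GFF layout (not real results).
example_gff = [
    "##gff-version  3\n",
    "contig_1\tProdigal_v2.6.3\tCDS\t1\t300\t25.1\t+\t0\tID=1_1;partial=10;start_type=Edge\n",
    "contig_1\tProdigal_v2.6.3\tCDS\t400\t900\t60.2\t-\t0\tID=1_2;partial=00;start_type=ATG\n",
    "contig_2\tProdigal_v2.6.3\tCDS\t50\t650\t71.0\t+\t0\tID=2_1;partial=00;start_type=GTG\n",
    "contig_2\tProdigal_v2.6.3\tCDS\t700\t1000\t18.4\t+\t0\tID=2_2;partial=01;start_type=ATG\n",
]

flags = tally_partial_flags(example_gff)
# Genes with at least one end running off a contig edge.
run_off = sum(n for flag, n in flags.items() if flag not in ("00", "unknown"))
print(dict(flags), "edge run-off CDS:", run_off)
```

On a real file, pass an open file handle instead of the example list. If the run-off count is close to the gap between the two tools, the closed-ends setting is the likely main cause.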
Thank you @spock for the excellent answer.
Prodigal in Prokka is using different options to you. Try the --metagenome option:
--metagenome Improve gene predictions for highly fragmented genomes (default OFF)
So, considering that PROKKA has the line:
Would changing it by just removing '-c' give the same number of genes as the PRODIGAL output?
I guess so, yes. Assuming same version of prodigal and same -g, -m, -p modes.
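One way to test this without editing Prokka is to run standalone Prodigal twice with every option held constant except `-c`. The sketch below only builds the two command lines. The flags (`-i`, `-o`, `-f`, `-g`, `-p`, `-m`, `-c`) are Prodigal's own, but the helper name, file names, translation table 11 and "single" mode are assumptions chosen for illustration; Prokka's actual invocation is not shown in this thread, so check it in your Prokka log.

```python
import shlex


def prodigal_command(fasta, out_gff, closed_ends, mask_n=True, table=11, mode="single"):
    """Build a Prodigal argument list; only `closed_ends` should differ between runs."""
    cmd = [
        "prodigal",
        "-i", fasta,        # input contigs
        "-o", out_gff,      # output file
        "-f", "gff",        # output format
        "-g", str(table),   # translation table (assumed 11 here)
        "-p", mode,         # "single" or "meta"
    ]
    if mask_n:
        cmd.append("-m")    # treat runs of N as masked sequence
    if closed_ends:
        cmd.append("-c")    # closed ends: do not allow genes to run off edges
    return cmd


# Hypothetical file names.
with_c = prodigal_command("contigs.fa", "closed.gff", closed_ends=True)
without_c = prodigal_command("contigs.fa", "open.gff", closed_ends=False)

print(shlex.join(with_c))
print(shlex.join(without_c))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once Prodigal is on the PATH. Comparing the two outputs isolates the effect of `-c` from the other differences mentioned above.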
The --metagenome option does not result in the same behaviour as leaving out -c in the standalone prodigal version (tested with --kingdom Viruses). Is there an option to include this as a new feature in the next version of prokka? This would be instrumental in the annotation of scaffolds from a fragmented metagenome.
I ran PROKKA and Prodigal (in an independent run) to detect CDS. The resulting gff file generated by PROKKA contains fewer than half as many features as the gff file from Prodigal. I also manually checked some contigs to confirm that fewer features were detected by PROKKA. The PROKKA log shows it used the same version of Prodigal I ran independently.

PROKKA: 5313886 CDS detected
Prodigal: 12330889 CDS detected

I am probably not understanding enough of what goes on under the hood of PROKKA. Does it impose additional quality filters beyond the defaults in Prodigal? Also (may be important): while I ran Prodigal on the entire assembly, PROKKA would take too long, so I split up the assembly into about 80 smaller parts, ran PROKKA on each part separately, and then concatenated the results.
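A per-contig comparison can show whether the missing features are spread evenly or concentrated on particular contigs (short ones, for example). The sketch below is a minimal illustration with made-up records and hypothetical function names. It assumes both files use identical contig IDs in column 1, which may not hold if either tool renamed the sequences, and it stops at a `##FASTA` line because Prokka's GFF output appends the sequences after the annotation.

```python
from collections import Counter


def cds_per_contig(gff_lines):
    """Count CDS features per contig, ignoring any embedded FASTA section."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not annotation
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) >= 9 and cols[2] == "CDS":
            counts[cols[0]] += 1
    return counts


def differing_contigs(a, b):
    """Return {contig: (count_in_a, count_in_b)} for contigs whose counts differ."""
    return {
        contig: (a.get(contig, 0), b.get(contig, 0))
        for contig in sorted(set(a) | set(b))
        if a.get(contig, 0) != b.get(contig, 0)
    }


# Made-up records for illustration only.
prokka_like = [
    "##gff-version 3\n",
    "contig_1\tProdigal:2.6\tCDS\t400\t900\t.\t-\t0\tID=X_00001\n",
    "##FASTA\n",
    ">contig_1\n",
    "ATGC\n",
]
prodigal_like = [
    "##gff-version  3\n",
    "contig_1\tProdigal_v2.6.3\tCDS\t1\t300\t25.1\t+\t0\tID=1_1;partial=10\n",
    "contig_1\tProdigal_v2.6.3\tCDS\t400\t900\t60.2\t-\t0\tID=1_2;partial=00\n",
    "contig_2\tProdigal_v2.6.3\tCDS\t2\t250\t12.0\t+\t0\tID=2_1;partial=11\n",
]

diff = differing_contigs(cds_per_contig(prokka_like), cds_per_contig(prodigal_like))
print(diff)
```

For concatenated output from many split runs, call `cds_per_contig` on each part separately and add the counters together, since each part may carry its own `##FASTA` section.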