Open willnotburn opened 7 years ago
I think all the information you need is in the output GBK and GFF files?
You can look at each CDS (column 3) feature in the .gff file and extract the ID=xxxxx (column 9); the contig is in column 1.
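As a rough sketch of that lookup, the snippet below reads a GFF and maps each CDS ID (column 9) to its contig (column 1). The function name `cds_to_contig` and the example records are made up for illustration, and it assumes the ID= values match the sequence IDs in the .faa headers, which is worth checking against your own output.

```python
import io

def cds_to_contig(gff_handle):
    """Map each CDS feature's ID (column 9) to its contig name (column 1)."""
    mapping = {}
    for line in gff_handle:
        # A GFF3 file may embed sequences after a ##FASTA directive; stop there.
        if line.startswith("##FASTA"):
            break
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        # Column 9 is a semicolon-separated list of key=value attributes.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if "ID" in attrs:
            mapping[attrs["ID"]] = cols[0]
    return mapping

# Hypothetical records, only to show the expected shape of the input.
example_gff = (
    "##gff-version 3\n"
    "contig_1\tprodigal\tCDS\t10\t300\t.\t+\t0\tID=ABC_00001;product=hypothetical protein\n"
    "contig_1\tAragorn\ttRNA\t400\t470\t.\t+\t.\tID=ABC_00002\n"
    "contig_2\tprodigal\tCDS\t5\t250\t.\t-\t0\tID=ABC_00003;product=hypothetical protein\n"
    "##FASTA\n"
    ">contig_1\n"
    "ACGT\n"
)
mapping = cds_to_contig(io.StringIO(example_gff))
```

For a real run, pass an open file handle for the Prokka .gff instead of the `io.StringIO` example.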
If you have access to the KEGG ortholog database (I assume it is not free/open anymore?) then you can create a custom Prokka database and provide it via --proteins and even annotate the KO directly.
This is for annotation with KOs of a metagenomic project, sampled from many locations. I am looking to use the workflow: assembled contigs -> Prokka -> proteins WITH reference to original contigs -> GhostKOALA -> KOs
The goal is a table of KOs in rows, samples in columns, with KO abundances in each sample populating the table.
For annotation with KOs, GhostKOALA takes amino acid sequences, presumably with protein headers. Prokka outputs translated CDS with headers in the .faa file. Perfect! But to get abundances of KOs in samples, I need to know where the proteins come from, i.e. which contig(s). The contig abundances in samples are calculated via mapping in a separate step...
Does Prokka output info that connects proteins in .faa file with original input contigs?
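Once a protein-to-contig link exists, the KO-by-sample table described above could be assembled along these lines. This is only a sketch under assumptions: the function name `ko_abundance_table`, the three input dictionaries, and all IDs and numbers are hypothetical, and letting each annotated protein contribute the full abundance of its contig is one simple convention, not something Prokka or GhostKOALA prescribes.

```python
from collections import defaultdict

def ko_abundance_table(protein_to_contig, protein_to_ko, contig_abundance):
    """Sum contig abundances per KO and sample.

    protein_to_contig: protein ID -> contig name
    protein_to_ko:     protein ID -> KO ID ("" or None if unassigned)
    contig_abundance:  contig name -> {sample name: abundance}
    """
    table = defaultdict(lambda: defaultdict(float))
    for protein, ko in protein_to_ko.items():
        contig = protein_to_contig.get(protein)
        # Skip proteins without a KO or without a known contig.
        if not ko or contig is None:
            continue
        for sample, abundance in contig_abundance.get(contig, {}).items():
            table[ko][sample] += abundance
    # Convert nested defaultdicts to plain dicts for easier downstream use.
    return {ko: dict(samples) for ko, samples in table.items()}

# Hypothetical inputs, only to show the expected shapes.
protein_to_contig = {"P1": "c1", "P2": "c1", "P3": "c2"}
protein_to_ko = {"P1": "K00001", "P2": "", "P3": "K00001"}
contig_abundance = {
    "c1": {"S1": 2.0, "S2": 0.5},
    "c2": {"S1": 1.0, "S2": 3.0},
}
table = ko_abundance_table(protein_to_contig, protein_to_ko, contig_abundance)
```

Rows of the resulting dictionary are KOs and the inner keys are samples, so it can be written out as a tab-separated table or loaded into a data frame.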