YiJessePi closed this issue 5 years ago
@YiJessePi
Do you know what species/genus your genome is? If so, just download some complete GenBank genomes and use the --proteins option:
https://github.com/tseemann/prokka/blob/master/README.md#option---proteins
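For illustration, here is a minimal sketch of how such a run could be scripted. The file names (contigs.fasta, reference.gbk) and the helper name build_prokka_command are hypothetical; the flags --outdir, --prefix, --cpus and --proteins are the ones listed in the Prokka README. The function only assembles the command, so nothing runs until you pass the list to subprocess.run yourself.

```python
import shlex


def build_prokka_command(contigs, proteins, outdir, prefix, cpus=8):
    """Assemble a prokka command line that uses a trusted protein set.

    contigs  -- path to the assembly FASTA to annotate (hypothetical name)
    proteins -- FASTA or GenBank file of trusted annotations, passed to
                --proteins so it is searched as the first-priority database
    """
    return [
        "prokka",
        "--outdir", outdir,
        "--prefix", prefix,
        "--cpus", str(cpus),
        "--proteins", proteins,
        contigs,
    ]


if __name__ == "__main__":
    cmd = build_prokka_command(
        "contigs.fasta", "reference.gbk", "annotation", "mygenome"
    )
    # Print the command so it can be inspected or pasted into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
    # To actually run it (requires prokka on PATH):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

If you downloaded several reference genomes, one option is to concatenate them into a single file before passing it to --proteins.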
Having 70% hypothetical proteins is not unusual in less-studied genomes. Even the classic E. coli K-12 has many genes of unknown function, with four-letter gene names of the form yXXX.
You could use PGAP instead and get NCBI-quality annotations. Google 'ncbi pgap'.
Hi, I find a high fraction of hypothetical proteins (~70%) when annotating my genome with Prokka. Although tools like eggNOG-mapper succeed in annotating a higher fraction of proteins, using them is impractical for me because of their long run time.
Is there a way to reduce the hypothetical protein rate with a similar run time (same order of magnitude)? Maybe by adding an additional database? Do you have a recommended one?
[This makes me wonder how Prokka runs so fast while it BLASTs proteins against 3 databases and also uses HMMs...]
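To check whether an extra database actually lowers the hypothetical rate, one could measure it from Prokka's tab-separated feature table before and after. This is only a sketch: it assumes the .tsv output has a header row with ftype and product columns and that unannotated CDS carry the product "hypothetical protein". The function name hypothetical_fraction is made up for this example.

```python
import csv


def hypothetical_fraction(tsv_lines):
    """Return the fraction of CDS features labelled 'hypothetical protein'.

    tsv_lines -- an iterable of lines from a Prokka-style .tsv file,
                 including the header (assumed columns: ftype, product).
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    total_cds = 0
    hypothetical = 0
    for row in reader:
        # Skip tRNA, rRNA and other non-protein features.
        if row.get("ftype") != "CDS":
            continue
        total_cds += 1
        if (row.get("product") or "").strip() == "hypothetical protein":
            hypothetical += 1
    # Avoid division by zero when the table has no CDS rows.
    return hypothetical / total_cds if total_cds else 0.0


if __name__ == "__main__":
    import sys

    # Usage: python hypothetical_fraction.py annotation/mygenome.tsv
    with open(sys.argv[1], newline="") as handle:
        print(f"{hypothetical_fraction(handle):.1%} of CDS are hypothetical")
```

Running it on the output of a default run and of a run with --proteins gives a direct comparison of the two rates.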