Some databases seem to have been outdated for a long time

liubovch commented 5 years ago

Hi,

First of all, thanks for this great tool! It makes life easier!

My question is about the default genus-specific DBs. They seem outdated, as the last changes were made 5 years ago. The same goes for the HAMAP HMM profiles. I think they should be updated more often. Another option would be to include a warning so that users know about this problem and can consider building their own DBs.

tseemann commented 5 years ago

Thank you
Genus DBs - read https://github.com/tseemann/prokka#the-genus-databases

The ones I include are highly curated annotations that my colleagues and I performed ourselves. It's much simpler to use --proteins your_annotations.gbk these days - I recommend downloading the latest version of your favourite annotation from RefSeq and providing that to Prokka.
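As a rough illustration of the `--proteins` suggestion, here is a minimal Python sketch that assembles such a Prokka command line. The helper name `build_prokka_cmd` and the file names `contigs.fasta`, `prokka_out`, and `sample` are placeholders invented for this example; only `--proteins`, `--outdir`, and `--prefix` are taken to be Prokka options.

```python
import shlex


def build_prokka_cmd(contigs, proteins, outdir="prokka_out", prefix="sample"):
    """Return the argv list for a Prokka run that consults a trusted annotation.

    All default names here are hypothetical placeholders.
    """
    return [
        "prokka",
        "--proteins", proteins,  # trusted annotation, e.g. a GenBank file from RefSeq
        "--outdir", outdir,      # output directory for the annotation files
        "--prefix", prefix,      # file name prefix for the outputs
        contigs,                 # the assembly to annotate goes last
    ]


cmd = build_prokka_cmd("contigs.fasta", "your_annotations.gbk")
print(shlex.join(cmd))
```

To actually run the command, pass `cmd` to `subprocess.run(cmd, check=True)` on a machine where Prokka is installed and on the `PATH`.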

There is a prokka-hamap_to_hmm tool included. But HAMAP keep changing their formats. Please realise that these databases don't change as much as you might think. Our understanding of new protein functions is quite slow and incremental.

I update the sprot database regularly. This usually captures 50-70% of all annotations. HAMAP is a secondary source. My HAMAP.hmm is dated Feb 19 2017. I have this targeted for updating. Thank you for the reminder.

tseemann commented 5 years ago

Also, if you want really high quality annotations that meet the NCBI RefSeq standard, you can use PGAP now:

https://github.com/ncbi/pgap/wiki

It's slower but does a really good job. I wrote Prokka originally because PGAP (formerly PGAAP) was not open source, but that version is now :-)

liubovch commented 5 years ago

I am sorry, I somehow overlooked your recent changes to the README about the genus DBs! Thanks a lot for your comments and suggestions!

tseemann/prokka, issue #426