Good afternoon Hisham,
Thank you for your kind words and use of MetaCerberus.
VOGDB recently updated everything. It had not been updated since 2017. Then, boom, updated!
We have been unable to find the version we used here, as it has been wiped or moved.
We will update VOGDB to the newest version shortly.
We take the best hit for the 'final annotation file,' but we provide all the individual outputs so you can decide which hit is best.
We leave that decision up to you, based on which database you think gives the best readout for your data.
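For readers who want to redo the "best hit" step themselves from the individual outputs, here is a minimal sketch. It is a simplified stand-in, not MetaCerberus's actual decision tree: it assumes each hit carries a bit score and an e-value, keeps the highest bit score per target, and breaks ties on the lower e-value. The `Hit` fields and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One HMM hit for a query protein (field names are hypothetical)."""
    target: str       # query protein ID
    database: str     # e.g. "KEGG", "VOG", "PFAM"
    hmm_id: str       # ID of the matching HMM
    score: float      # bit score (higher is better)
    evalue: float     # e-value (lower is better)
    description: str  # annotation text attached to hmm_id


def best_hit_per_target(hits):
    """Keep the highest-scoring hit per target; on equal scores, prefer the lower e-value."""
    best = {}
    for hit in hits:
        current = best.get(hit.target)
        # Tuple comparison: score first, then negated e-value as the tie-breaker.
        if current is None or (hit.score, -hit.evalue) > (current.score, -current.evalue):
            best[hit.target] = hit
    return best
```

Because every hit is kept in the per-database outputs, swapping in a different ranking rule only means changing the comparison inside the loop.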
From the example you gave us, it appears to have a DNA pol I domain, a helicase domain, and potentially a 3'-5' exonuclease domain. You can look at PFAM/TIGRFAMs/PGFams to check it. Of course, it is a virion non-structural protein: it's a DNA replicase. ;-)
Many proteins have multiple domains, especially DNA replicases. So we leave it up to you to decide what to call multidomain proteins.
We will leave this issue open for now and let you know when we replace VOGDB.
many thanks, RAW Lab
Hey, thank you for the prompt reply! It's good to know that there is indeed an issue with VOGDB. I look forward to the update, and thank you for your help with the decision-making :)
Cheers, Hisham
Good afternoon,
Thank you for using MetaCerberus and your kind words! Tell your friends. ;-)
VOG had a major update. See here. https://vogdb.org/
We had compared the older VOG database metadata against INPHARED in order to have fewer hypothetical proteins, as the last update prior to this one was back in 2017.
We are currently using the metadata list from VOG version 225.
We have updated MetaCerberus 1.4 with VOGDB 225. If you already have 1.4, you can get the new database by running:
conda activate metacerberus
metacerberus.py --update
This will update your databases only. Please upgrade to 1.4 if you haven't already; it brings lots of upgrades and faster processing times with HydraMPP.
many thanks, RAW Lab
Closing for now. Let us know if you need anything!
Hello!
First of all, MetaCerberus is a great and very convenient tool for scanning multiple databases for functional annotation! Kudos to you all! I've been using MetaCerberus to annotate geNomad-identified viral proteins from a metagenomic survey, and I have a few questions about the decision tree. We get hits from multiple databases, and after following the decision tree you assign one best hit per target in the final output file. It is entirely possible that this best hit lacks a proper description. For example, GVDB might have the best hit but its annotation could be "no annotation", while KEGG or other runner-up hits might assign an appropriate description. So, instead of taking the best hit, I would take the best hit with an appropriate description. I could, of course, do the same with the top5 files; I just want to make sure I am not missing something. Would this be a valid approach? I also plan to identify non-descriptive terms per database to try to automate this a bit.
Secondly, to find second- or third-best hits, I concatenated the outputs of all the database hits into a single file. More often than not, the VOG hits differed from the rest of the database hits.
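The "best hit with an appropriate description" idea could be automated roughly as below. This is a sketch under stated assumptions: the per-database lists of non-descriptive terms (`NON_DESCRIPTIVE`, `DEFAULT_NON_DESCRIPTIVE`) are hypothetical placeholders you would build from your own outputs, and hits are ranked by bit score only.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    target: str       # query protein ID
    database: str     # source database name
    score: float      # bit score (higher is better)
    description: str  # annotation text


# Hypothetical, lowercase lists of descriptions treated as non-descriptive.
DEFAULT_NON_DESCRIPTIVE = {"", "no annotation", "hypothetical protein"}
NON_DESCRIPTIVE = {
    "GVDB": {"no annotation"},
    "VOG": {"unknown function"},
}


def is_informative(hit):
    """True unless the description matches a default or database-specific blocked term."""
    text = hit.description.strip().lower()
    blocked = DEFAULT_NON_DESCRIPTIVE | NON_DESCRIPTIVE.get(hit.database, set())
    return text not in blocked


def best_informative_hit(hits):
    """Per target, return the highest-scoring informative hit, or None if there is none."""
    by_target = {}
    for hit in hits:
        by_target.setdefault(hit.target, []).append(hit)
    result = {}
    for target, group in by_target.items():
        informative = [h for h in group if is_informative(h)]
        result[target] = max(informative, key=lambda h: h.score) if informative else None
    return result
```

Returning `None` rather than silently falling back to the top hit makes it easy to count how many targets have no informative annotation in any database.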
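Concatenating the per-database outputs and then ranking within each target gives the second- and third-best hits directly. The sketch below assumes, hypothetically, that every table is tab-separated with `target`, `database`, `id` and `score` columns; the real MetaCerberus column names may differ and should be substituted. It takes the TSV contents as strings (for example `open(path).read()` for each file) so it stays self-contained.

```python
import csv
import io
from collections import defaultdict


def top_n_per_target(tsv_texts, n=3):
    """Merge several per-database TSV tables and return the n best rows per target.

    Assumes hypothetical 'target', 'database', 'id' and 'score' columns,
    with higher scores ranked first.
    """
    rows_by_target = defaultdict(list)
    for text in tsv_texts:
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            row["score"] = float(row["score"])
            rows_by_target[row["target"]].append(row)
    # Sort each target's pooled hits by score, best first, and keep the top n.
    return {
        target: sorted(rows, key=lambda r: r["score"], reverse=True)[:n]
        for target, rows in rows_by_target.items()
    }
```

Note that raw bit scores are not strictly comparable across HMM databases, so a cross-database ranking like this is a rough guide rather than a definitive ordering.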
There are plenty of VOG hits that are considered best and are in the final file. At this point, I am concerned that there is some error in connecting
I checked the VOGDB annotation summary TSV from their website, and the descriptions for the same IDs do not match between MetaCerberus and VOGDB.
Perhaps, there's an issue with the metadata file for VOGDB in MetaCerberus?
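A check like the one described above can be scripted so that every mismatching ID is listed at once. In this sketch the column names are passed in by the caller because the exact headers of the MetaCerberus metadata file and of the VOGDB annotation summary are assumptions here; check them against the real files before use. `load_descriptions` and `find_mismatches` are hypothetical helper names.

```python
import csv
import io


def load_descriptions(tsv_text, id_col, desc_col):
    """Read an ID -> description mapping from TSV text using caller-supplied column names."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_col]: row[desc_col].strip() for row in reader}


def find_mismatches(ours, reference):
    """Return {id: (ours, reference)} for IDs present in both maps whose descriptions differ."""
    return {
        vog_id: (ours[vog_id], reference[vog_id])
        for vog_id in ours.keys() & reference.keys()
        if ours[vog_id] != reference[vog_id]
    }
```

Only IDs shared by both files are compared; IDs missing from one side (for example, groups that were renumbered between releases) would need a separate set difference.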
Looking forward to hearing from you!
Cheers, Hisham