raw-lab / MetaCerberus

Python code for versatile Functional Ontology Assignments for Metagenomes searching via Hidden Markov Model (HMM) with environmental focus of shotgun metaomics data
BSD 3-Clause "New" or "Revised" License

VOG hmm and tsv mismatch? WARNING, query not in lookup #27

Open mtisza1 opened 2 weeks ago

mtisza1 commented 2 weeks ago

Hi raw-lab,

More of a nitpick than a bug.

I did a fresh install of metacerberus v1.4.0 and downloaded all databases with the default

metacerberus.py --download

I ran metacerberus to annotate a bunch of metagenomic virus contigs.

I got a couple hundred error lines similar to:

WARNING, query not in lookup: UHGV-1497742_14 VOG VOG05340

I looked into the VOG releases to see about these VOG IDs. All the IDs I checked are in older VOG releases (v221) but not in newer releases. This is consistent with metacerberus distributing the older .hmm and the newer .tsv for the VOG DB.
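One way to test this would be to compare the model names in the .hmm file against the IDs in the lookup .tsv. Below is a minimal sketch, not MetaCerberus's own code. It assumes the .hmm is in HMMER3 text format (one `NAME` line per model) and that the lookup tsv keeps the ID in its first column, which I have not verified for the distributed files. The function names are hypothetical.

```python
import csv
import io


def hmm_names(handle):
    """Collect model names from the NAME lines of a HMMER3 text file."""
    names = set()
    for line in handle:
        if line.startswith("NAME"):
            parts = line.split(None, 1)
            if len(parts) == 2:
                names.add(parts[1].strip())
    return names


def lookup_ids(handle, column=0):
    """Collect IDs from one column of a tab-separated lookup file.

    Assumption: comment/header lines start with '#'.
    """
    ids = set()
    for row in csv.reader(handle, delimiter="\t"):
        if row and not row[0].startswith("#") and len(row) > column:
            ids.add(row[column].strip())
    return ids


def missing_from_lookup(hmm_handle, tsv_handle, column=0):
    """Return HMM names that have no entry in the lookup table."""
    return sorted(hmm_names(hmm_handle) - lookup_ids(tsv_handle, column))


# Tiny in-memory demo; real use would pass open() file handles instead.
demo_hmm = io.StringIO("HMMER3/f [3.1b2]\nNAME  VOG05340\nLENG  10\n//\n"
                       "HMMER3/f [3.1b2]\nNAME  VOG00001\nLENG  12\n//\n")
demo_tsv = io.StringIO("#GroupName\tProteinCount\nVOG00001\t5\n")
print(missing_from_lookup(demo_hmm, demo_tsv))  # ['VOG05340']
```

If the .hmm and .tsv come from the same release, the returned list should be empty.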

Let me know if I'm onto something.

Best

Mike

raw-lab commented 2 weeks ago

Hi Mike,

So, we just updated to the new VOGdb. The one we had pulled before was their latest version from 2016, which we couldn't find again after their new update. They only had that version until recently.

We should be on v225 from August, with both the hmms and their tsv.

Hmm, it should be in the Lookup tsv, right? Do you think it's a v225 Lookup issue? They just released v226 about 3 days ago. We can double-check the parsing on our end. Can you send us a tsv of these errors so we can check whether they are in the tsv/Lookup?
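A tsv like that could be pulled out of the run log with something like the sketch below. It assumes every warning has the same shape as the line quoted above (`WARNING, query not in lookup: <query> <database> <id>`). The function names and column headers are made up for illustration and are not part of MetaCerberus.

```python
import csv
import io
import re

# Assumed warning format, taken from the single example line in this thread.
WARNING_RE = re.compile(
    r"WARNING, query not in lookup:\s+(\S+)\s+(\S+)\s+(\S+)"
)


def collect_warnings(lines):
    """Return (query, database, hmm_id) tuples for each matching warning line."""
    rows = []
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            rows.append(match.groups())
    return rows


def write_tsv(rows, handle):
    """Write the collected warnings as a tab-separated table with a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["query", "database", "hmm_id"])
    writer.writerows(rows)


# In-memory demo; real use would iterate over an open log file instead.
log = ["some other output",
       "WARNING, query not in lookup: UHGV-1497742_14 VOG VOG05340"]
out = io.StringIO()
write_tsv(collect_warnings(log), out)
print(out.getvalue(), end="")
```

The third column can then be checked directly against the IDs in the Lookup tsv.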

Thank you for noticing! Also, thank you for using MetaCerberus. We plan to include your hmms for portal, tail, and mcp shortly.

We have a cryptic phage that is complete, but we can't find the mcp or portal. We haven't tried your hmms on it yet, but we plan to.

Let us know whether you think it's a release issue or a parsing issue on our end.

Many thanks, RAW lab