microbiomedata / metaMAGs

Workflow for metagenome assembled genomes generation.

silent error during call-mbin_nmdc #19

Closed aclum closed 6 months ago

aclum commented 6 months ago

I noticed that none of our MAGs workflow results generate any bins, even for projects where we know the JGI processing generated MAGs (e.g., nmdc:wfmgan-11-585bp531.1; the JGI run of this data generated 27 medium- and high-quality MAGs, see IMG taxon OID 3300061644). I find this error in the call-mbin_nmdc stage: 'ERROR: Models must be parsed before identifying HMM hits.' A quick search suggests an issue with the CheckM installation: https://github.com/Ecogenomics/CheckM/issues/280

This issue is not caught because the last process of the task completes correctly, so the task as a whole reports success. We should also set pipefail to catch these types of problems going forward.
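To illustrate why the failure goes unnoticed, here is a minimal sketch of how a pipeline's exit code changes once pipefail is set. It assumes bash is available on the PATH; `pipeline_exit_code` is a hypothetical helper, and `false | true` only stands in for a failing step followed by a final step that succeeds. It is not the actual command run by call-mbin_nmdc.

```python
import subprocess


def pipeline_exit_code(command: str, pipefail: bool) -> int:
    """Run `command` in bash and return the exit code the shell reports."""
    # Hypothetical helper: optionally prepend `set -o pipefail` to the command.
    prefix = "set -o pipefail; " if pipefail else ""
    return subprocess.run(["bash", "-c", prefix + command]).returncode


if __name__ == "__main__":
    # `false` stands in for the failing step, `true` for the last step.
    print(pipeline_exit_code("false | true", pipefail=False))  # 0: failure is hidden
    print(pipeline_exit_code("false | true", pipefail=True))   # 1: failure is reported
```

Without pipefail, the shell reports only the exit status of the last command in the pipeline, so an earlier failure is masked. With pipefail, any failing command makes the whole pipeline fail.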

I can find 400 records on pscratch. This likely doesn't include deleted projects, and the issue dates back to at least early August 2023:

/pscratch/sd/n/nmdcda/cromwell-executions/nmdc_mags> grep 'ERROR: Models must be parsed before identifying HMM hits' */call-mbin_nmdc/execution/stdout | wc -l
600
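The same scan can be sketched in Python. This version is an assumption-laden illustration, not part of the workflow: `count_checkm_errors` is a hypothetical helper, and it assumes the directory layout matches the glob used in the grep command. Because `grep ... | wc -l` counts matching lines rather than runs, the sketch reports both the number of affected stdout files and the total number of matching lines.

```python
from pathlib import Path

ERROR_TEXT = "ERROR: Models must be parsed before identifying HMM hits"


def count_checkm_errors(root):
    """Return (files_with_error, matching_lines) under an executions directory."""
    files_with_error = 0
    matching_lines = 0
    # Same layout as the grep glob: <run>/call-mbin_nmdc/execution/stdout
    for stdout_file in Path(root).glob("*/call-mbin_nmdc/execution/stdout"):
        with stdout_file.open(errors="replace") as handle:
            hits = sum(1 for line in handle if ERROR_TEXT in line)
        if hits:
            files_with_error += 1
            matching_lines += hits
    return files_with_error, matching_lines
```

Calling `count_checkm_errors(".")` from the nmdc_mags executions directory would return the two counts as a tuple.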

cc @mbthornton-lbl

Michal-Babins commented 6 months ago

The only thing I can think of in this instance is to use the latest version of CheckM and update the Docker image.

aclum commented 6 months ago

What version is Neha using?

Michal-Babins commented 6 months ago

From what I got from her, it is 1.2.1. But again, that is based on a zip file of the files that I received. There is no repo and no WDL wrapper.

chienchi commented 6 months ago

The version is hard-coded in a Python script: https://github.com/microbiomedata/metaMAGs/blob/master/Docker/mbin_versions.py

versions = {
    "mbin.py": "0.5",
    "metabat2": "2.15",
    "checkm-genome": "1.2.1",
    "gtdb-tk": "2.1.1",
    "hmmer": "3.3.2",
    "prodigal": "2.6.3",
    "pplacer": "1.1.alpha19",
    "fasttree": "2.1.11",
    "fastANI": "1.33",
    "mash": "2.3",
    "sqlite": "3.39.2",
    "Python": "3.9.12",
}

Michal-Babins commented 6 months ago

Version 1.2.2 of CheckM seems to be functional, but that would take us out of sync with JGI.
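One way to make that drift explicit is to diff the pinned versions against a reference set before bumping anything. The sketch below is only an illustration: `version_differences` is a hypothetical helper, `current` is a subset of the pins quoted from mbin_versions.py, and `proposed` assumes the only change is the CheckM bump to 1.2.2 discussed here.

```python
def version_differences(ours, reference):
    """Map each tool whose pinned version differs to (ours, reference)."""
    tools = sorted(set(ours) | set(reference))
    return {
        tool: (ours.get(tool), reference.get(tool))
        for tool in tools
        if ours.get(tool) != reference.get(tool)
    }


# Subset of the pins quoted in this thread.
current = {"checkm-genome": "1.2.1", "hmmer": "3.3.2", "prodigal": "2.6.3"}
# Assumed proposal: bump CheckM only.
proposed = {**current, "checkm-genome": "1.2.2"}

print(version_differences(proposed, current))
# {'checkm-genome': ('1.2.2', '1.2.1')}
```

A tool missing from one side shows up with `None` for that side, so additions and removals are reported along with version changes.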