micom-dev / micom

Python package to study microbial communities using metabolic modeling.
https://micom-dev.github.io/micom
Apache License 2.0

"build" step stalling at 99% complete #165

Closed zoey-rw closed 2 months ago

zoey-rw commented 3 months ago

Problem description

After creating a custom database, the "build" command sometimes stalls at 99% when creating a manifest from 100+ samples and 16 taxa. If I open a new Python session and run the same build command, it occasionally creates the manifest successfully. I'm also encountering stalling at 99% complete with the fix_medium command (same dataset).

I'm using 28 cores so I don't think it's a lack of RAM, but would appreciate any ideas for debugging!

Code Sample

>>> build_database(tax, out_path="/projectnb/dietzelab/zrwerbin/N-cycle/data/MICOM/database", rank='genus', threads=28, compress=None, compresslevel=6, progress=True)
>>> soil_db = "/projectnb/dietzelab/zrwerbin/N-cycle/data/MICOM/database/"
>>> 
>>> manifest = build(tax, 
...                  out_folder="/projectnb/dietzelab/zrwerbin/N-cycle/data/MICOM/",
...                  model_db=soil_db, cutoff=0.00001, threads=28)
[12:00:26] WARNING  Found existing models for 102 samples. Will skip those. Delete the output folder if you would like me to rebuild them.     
[normal messages ...]
Set parameter TokenServer to value "sccsvc"
Set parameter GURO_PAR_SPECIAL
Set parameter TokenServer to value "sccsvc"
Read LP format model from file /scratch/5511857.1.geo-int/tmpcafsj2wj.lp
Reading time = 0.14 seconds
: 19665 rows, 44999 columns, 193299 nonzeros
Set parameter GURO_PAR_SPECIAL
Set parameter TokenServer to value "sccsvc"
Read LP format model from file /scratch/5511857.1.geo-int/tmptp3038rs.lp
Reading time = 0.12 seconds
: 17548 rows, 40257 columns, 173487 nonzeros
Read LP format model from file /scratch/5511857.1.geo-int/tmpjz3xnx_x.lp
Reading time = 0.14 seconds
: 19665 rows, 44999 columns, 193299 nonzeros
Set parameter GURO_PAR_SPECIAL
Set parameter TokenServer to value "sccsvc"
Read LP format model from file /scratch/5511857.1.geo-int/tmpmq1q5ngz.lp
Reading time = 0.13 seconds
: 19665 rows, 44999 columns, 193299 nonzeros
Read LP format model from file /scratch/5511857.1.geo-int/tmp5vkdrvvb.lp
Reading time = 0.13 seconds
: 19665 rows, 44999 columns, 193299 nonzeros
Running ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸  99% 0:00:01

Context

(micom)[zrwerbin@scc-ul2 ~]$ python -c "import micom; micom.show_versions()"
Package Information
-------------------
micom 0.33.2
Dependency Information
----------------------
cobra             0.29.0
highspy          missing
jinja2             3.1.3
osqp               0.6.3
scikit-learn 1.4.1.post1
scipy             1.12.0
symengine         0.11.0
Build Tools Information
-----------------------
pip          24.0
setuptools 69.1.1
wheel      0.42.0
Platform Information
--------------------
Linux   4.18.0-513.9.1.el8_9.x86_64-x86_64
CPython                             3.12.2

Other session info:

(micom)[zrwerbin@scc-ul2 ~]$ conda list
# packages in environment at /projectnb/talbot-lab-data/zrwerbin/.conda/envs/micom:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
annotated-types           0.6.0              pyhd8ed1ab_0    conda-forge
anyio                     4.3.0              pyhd8ed1ab_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
argon2-cffi               23.1.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0          py312h98912ed_4    conda-forge
arrow                     1.3.0              pyhd8ed1ab_0    conda-forge
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
async-lru                 2.0.4              pyhd8ed1ab_0    conda-forge
attrs                     23.2.0             pyh71513ae_0    conda-forge
babel                     2.14.0             pyhd8ed1ab_0    conda-forge
beautifulsoup4            4.12.3             pyha770c72_0    conda-forge
bleach                    6.1.0              pyhd8ed1ab_0    conda-forge
brotli-python             1.1.0           py312h30efb56_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
ca-certificates           2024.2.2             hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
cffi                      1.16.0          py312hf06ca03_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
cobra                     0.29.0             pyhd8ed1ab_0    conda-forge
comm                      0.2.1              pyhd8ed1ab_0    conda-forge
debugpy                   1.8.1           py312h30efb56_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
depinfo                   2.2.0              pyhd8ed1ab_0    conda-forge
diskcache                 5.6.3              pyhd8ed1ab_0    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
exceptiongroup            1.2.0              pyhd8ed1ab_2    conda-forge
executing                 2.0.1              pyhd8ed1ab_0    conda-forge
fqdn                      1.5.1              pyhd8ed1ab_0    conda-forge
future                    1.0.0              pyhd8ed1ab_0    conda-forge
gf2x                      1.3.0                ha476b99_2    conda-forge
glpk                      5.0                  h445213a_0    conda-forge
gmp                       6.3.0                h59595ed_0    conda-forge
gurobi                    11.0.1                  py312_0    gurobi
h11                       0.14.0             pyhd8ed1ab_0    conda-forge
h2                        4.1.0              pyhd8ed1ab_0    conda-forge
hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
httpcore                  1.0.4              pyhd8ed1ab_0    conda-forge
httpx                     0.27.0             pyhd8ed1ab_0    conda-forge
hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
idna                      3.6                pyhd8ed1ab_0    conda-forge
importlib-metadata        7.0.1              pyha770c72_0    conda-forge
importlib_metadata        7.0.1                hd8ed1ab_0    conda-forge
importlib_resources       6.1.2              pyhd8ed1ab_0    conda-forge
ipykernel                 6.29.3             pyhd33586a_0    conda-forge
ipython                   8.22.2             pyh707e725_0    conda-forge
ipywidgets                8.1.2              pyhd8ed1ab_0    conda-forge
isoduration               20.11.0            pyhd8ed1ab_0    conda-forge
jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
jinja2                    3.1.3              pyhd8ed1ab_0    conda-forge
joblib                    1.3.2              pyhd8ed1ab_0    conda-forge
json5                     0.9.21             pyhd8ed1ab_0    conda-forge
jsonpointer               2.4             py312h7900ff3_3    conda-forge
jsonschema                4.21.1             pyhd8ed1ab_0    conda-forge
jsonschema-specifications 2023.12.1          pyhd8ed1ab_0    conda-forge
jsonschema-with-format-nongpl 4.21.1             pyhd8ed1ab_0    conda-forge
jupyter                   1.0.0             pyhd8ed1ab_10    conda-forge
jupyter-lsp               2.2.4              pyhd8ed1ab_0    conda-forge
jupyter_client            8.6.0              pyhd8ed1ab_0    conda-forge
jupyter_console           6.6.3              pyhd8ed1ab_0    conda-forge
jupyter_core              5.7.1           py312h7900ff3_0    conda-forge
jupyter_events            0.9.0              pyhd8ed1ab_0    conda-forge
jupyter_server            2.13.0             pyhd8ed1ab_0    conda-forge
jupyter_server_terminals  0.5.2              pyhd8ed1ab_0    conda-forge
jupyterlab                4.1.3              pyhd8ed1ab_0    conda-forge
jupyterlab_pygments       0.3.0              pyhd8ed1ab_1    conda-forge
jupyterlab_server         2.25.3             pyhd8ed1ab_0    conda-forge
jupyterlab_widgets        3.0.10             pyhd8ed1ab_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libblas                   3.9.0           21_linux64_openblas    conda-forge
libcblas                  3.9.0           21_linux64_openblas    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libflint                  2.9.0           h2f819a4_ntl_100    conda-forge
libgcc-ng                 13.2.0               h807b86a_5    conda-forge
libgfortran-ng            13.2.0               h69a702a_5    conda-forge
libgfortran5              13.2.0               ha4646dd_5    conda-forge
libgomp                   13.2.0               h807b86a_5    conda-forge
liblapack                 3.9.0           21_linux64_openblas    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.26          pthreads_h413a1c8_0    conda-forge
libosqp                   0.6.3                h6a678d5_0  
libqdldl                  0.1.7                hcb278e6_0    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libsqlite                 3.45.1               h2797004_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_5    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markdown-it-py            3.0.0              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.5           py312h98912ed_0    conda-forge
matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
mdurl                     0.1.2              pyhd8ed1ab_0    conda-forge
micom                     0.33.2             pyhdfd78af_0    bioconda
mistune                   3.0.2              pyhd8ed1ab_0    conda-forge
mpc                       1.3.1                hfe3b2da_0    conda-forge
mpfr                      4.2.1                h9458935_0    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
nbclient                  0.8.0              pyhd8ed1ab_0    conda-forge
nbconvert                 7.16.2             pyhd8ed1ab_0    conda-forge
nbconvert-core            7.16.2             pyhd8ed1ab_0    conda-forge
nbconvert-pandoc          7.16.2             pyhd8ed1ab_0    conda-forge
nbformat                  5.9.2              pyhd8ed1ab_0    conda-forge
ncurses                   6.4                  h59595ed_2    conda-forge
nest-asyncio              1.6.0              pyhd8ed1ab_0    conda-forge
notebook                  7.1.1              pyhd8ed1ab_0    conda-forge
notebook-shim             0.2.4              pyhd8ed1ab_0    conda-forge
ntl                       11.4.3               hef3c4d3_1    conda-forge
numpy                     1.26.4          py312heda63a1_0    conda-forge
openssl                   3.2.1                hd590300_0    conda-forge
optlang                   1.8.1              pyhd8ed1ab_0    conda-forge
osqp                      0.6.3           py312hfb8ada1_2    conda-forge
overrides                 7.7.0              pyhd8ed1ab_0    conda-forge
packaging                 23.2               pyhd8ed1ab_0    conda-forge
pandas                    2.2.1           py312hfb8ada1_0    conda-forge
pandoc                    3.1.12.2             ha770c72_0    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
pexpect                   4.9.0              pyhd8ed1ab_0    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
platformdirs              4.2.0              pyhd8ed1ab_0    conda-forge
prometheus_client         0.20.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.42             pyha770c72_0    conda-forge
prompt_toolkit            3.0.42               hd8ed1ab_0    conda-forge
psutil                    5.9.8           py312h98912ed_0    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pydantic                  2.6.3              pyhd8ed1ab_0    conda-forge
pydantic-core             2.16.3          py312h4b3b743_0    conda-forge
pygments                  2.17.2             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.12.2          hab00c5b_0_cpython    conda-forge
python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
python-fastjsonschema     2.19.1             pyhd8ed1ab_0    conda-forge
python-json-logger        2.0.7              pyhd8ed1ab_0    conda-forge
python-libsbml            5.20.2          py312h30efb56_1    conda-forge
python-symengine          0.11.0          py312h83f29e1_1    conda-forge
python-tzdata             2024.1             pyhd8ed1ab_0    conda-forge
python_abi                3.12                    4_cp312    conda-forge
pytz                      2024.1             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0.1           py312h98912ed_1    conda-forge
pyzmq                     25.1.2          py312h886d080_0    conda-forge
qdldl-python              0.1.7.post0     py312hfb8ada1_1    conda-forge
qtconsole-base            5.5.1              pyha770c72_0    conda-forge
qtpy                      2.4.1              pyhd8ed1ab_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
referencing               0.33.0             pyhd8ed1ab_0    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
rfc3339-validator         0.1.4              pyhd8ed1ab_0    conda-forge
rfc3986-validator         0.1.1              pyh9f0ad1d_0    conda-forge
rich                      13.7.1             pyhd8ed1ab_0    conda-forge
rpds-py                   0.18.0          py312h4b3b743_0    conda-forge
ruamel.yaml               0.18.6          py312h98912ed_0    conda-forge
ruamel.yaml.clib          0.2.8           py312h98912ed_0    conda-forge
scikit-learn              1.4.1.post1     py312h394d371_0    conda-forge
scipy                     1.12.0          py312heda63a1_2    conda-forge
send2trash                1.8.2              pyh41d4057_0    conda-forge
setuptools                69.1.1             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sniffio                   1.3.1              pyhd8ed1ab_0    conda-forge
soupsieve                 2.5                pyhd8ed1ab_1    conda-forge
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
swiglpk                   5.0.10          py312h98912ed_0    conda-forge
symengine                 0.11.2               hb29318e_0    conda-forge
sympy                     1.12               pyh04b8f61_3    conda-forge
terminado                 0.18.0             pyh0d859eb_0    conda-forge
threadpoolctl             3.3.0              pyhc1e730c_0    conda-forge
tinycss2                  1.2.1              pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
tornado                   6.4             py312h98912ed_0    conda-forge
traitlets                 5.14.1             pyhd8ed1ab_0    conda-forge
types-python-dateutil     2.8.19.20240106    pyhd8ed1ab_0    conda-forge
typing-extensions         4.10.0               hd8ed1ab_0    conda-forge
typing_extensions         4.10.0             pyha770c72_0    conda-forge
typing_utils              0.1.0              pyhd8ed1ab_0    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
uri-template              1.3.0              pyhd8ed1ab_0    conda-forge
urllib3                   2.2.1              pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge
webcolors                 1.13               pyhd8ed1ab_0    conda-forge
webencodings              0.5.1              pyhd8ed1ab_2    conda-forge
websocket-client          1.7.0              pyhd8ed1ab_0    conda-forge
wheel                     0.42.0             pyhd8ed1ab_0    conda-forge
widgetsnbextension        4.0.10             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zeromq                    4.3.5                h59595ed_1    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge

The beginning of the tax file looks like this, with 1410 rows total:

>>> tax.head()
         genus       id                                               file     species  abundance                 sample_id    domain   kingdom          phylum                class            order            family
0  Azotobacter  iAA1300  /projectnb2/talbot-lab-data/metabolic_models/c...  vinelandii   0.011088  TOOL_002-O-20210804-COMP  Bacteria  Bacteria  Pseudomonadota  Gammaproteobacteria  Pseudomonadales  Pseudomonadaceae
1  Azotobacter  iAA1300  /projectnb2/talbot-lab-data/metabolic_models/c...  vinelandii   0.011444  TOOL_001-M-20170721-COMP  Bacteria  Bacteria  Pseudomonadota  Gammaproteobacteria  Pseudomonadales  Pseudomonadaceae
2  Azotobacter  iAA1300  /projectnb2/talbot-lab-data/metabolic_models/c...  vinelandii   0.019511  ORNL_029-O-20170621-COMP  Bacteria  Bacteria  Pseudomonadota  Gammaproteobacteria  Pseudomonadales  Pseudomonadaceae
3  Azotobacter  iAA1300  /projectnb2/talbot-lab-data/metabolic_models/c...  vinelandii   0.012908  HARV_001-M-20130709-COMP  Bacteria  Bacteria  Pseudomonadota  Gammaproteobacteria  Pseudomonadales  Pseudomonadaceae
4  Azotobacter  iAA1300  /projectnb2/talbot-lab-data/metabolic_models/c...  vinelandii   0.011425  ORNL_002-O-20170619-COMP  Bacteria  Bacteria  Pseudomonadota  Gammaproteobacteria  Pseudomonadales  Pseudomonadaceae
cdiener commented 3 months ago

Hi, I'm sorry you are experiencing this, it sure sounds frustrating. The multiprocessing module in Python can sometimes be a bit unstable. We sometimes see similar problems and they are often related to individual processes generating a lot of log output that sometimes makes the thread pool unresponsive. The following might help (I would try those in that order):

  1. Make sure that build gets called in the main method, similar to the first example here.
  2. Switch out the multiprocessing method to "forkserver" or "spawn" with

    import multiprocessing

    ...

    if __name__ == "__main__":
        multiprocessing.set_start_method("spawn")
        ...

  3. Run some of the later samples with threads=1 to see error messages that have been missed before.
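
Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration using only the standard library: `pick_start_method` is a hypothetical helper name, not part of micom, and the actual `build(...)` call is left as a comment.

```python
import multiprocessing


def pick_start_method(preferred=("forkserver", "spawn")):
    """Return the first preferred start method this platform supports."""
    available = multiprocessing.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    raise RuntimeError(f"none of {preferred} is available here")


if __name__ == "__main__":
    # Set the start method once, inside the main guard, before any pool exists.
    multiprocessing.set_start_method(pick_start_method(), force=True)
    print("start method:", multiprocessing.get_start_method())
    # manifest = build(...)  # the workflow call would go here
```

Keeping the call inside the `if __name__ == "__main__":` guard matters with "spawn" and "forkserver", because worker processes re-import the main script and would otherwise re-run the workflow themselves.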

For RAM you should plan for 1-2 GB per thread, so in your case (28 threads) I would start by allocating 56 GB and then reduce that based on actual use later.
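
The rule of thumb above works out as simple arithmetic. A small sketch, where `ram_budget_gb` is a hypothetical helper name and the 1-2 GB range is the figure quoted in this comment:

```python
def ram_budget_gb(threads, gb_per_thread=(1, 2)):
    """Return (low, high) memory estimates in GB for a given thread count."""
    low, high = gb_per_thread
    return threads * low, threads * high


print(ram_budget_gb(28))  # (28, 56)
```

Starting from the upper bound and lowering it after observing real usage follows the suggestion in the text.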

Hopefully one of those will help.
zoey-rw commented 2 months ago

Thanks for these solutions - the first 2 seemed to help for the 16 taxa/100 samples situation, but the build process is still crashing when I scale up to ~50 taxa/600 samples.

Even with the Jupyter notebook flag set to "--NotebookApp.iopub_msg_rate_limit=1.0e10", or running from command line, the logging messages are still too frequent to see any progress bar updates. This is the type of warning that seems to be generating most of the log output:

WARNING  Reaction UF03564_E__pseudogymnoascus seems to be an exchange reaction but its ID does not start with 'EX_'...                            community.py:323

Is there a way to safely turn off logging for that warning, rather than changing all the reaction IDs? One of the following commands suppresses them when the parallel method is not "spawn", but the warnings show up anyway once multiprocessing.set_start_method("spawn", force=True) is set:

import logging
logging.getLogger("micom.Community").setLevel(logging.ERROR)
logging.getLogger("micom").setLevel(logging.ERROR)
logging.getLogger("micom.logger").setLevel(logging.ERROR)

Nesting underneath the multiprocessing call doesn't suppress the warnings either:

if __name__ == '__main__':
    multiprocessing.set_start_method("spawn", force=True) 
    logging.getLogger("micom.Community").setLevel(logging.ERROR)
    logging.getLogger("micom").setLevel(logging.ERROR)
    logging.getLogger("micom.logger").setLevel(logging.ERROR)
    manifest = build(...)
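
One plausible reason for this behaviour: with "spawn", each worker is a fresh interpreter and does not inherit logger levels that were set at runtime in the parent. The sketch below illustrates that with the standard library only. It uses `subprocess` as a stand-in for a spawned worker (an assumption made for the demo), and "micom" is used purely as a logger name; micom itself is not imported.

```python
import logging
import subprocess
import sys

# Configure the logger in the parent process only.
logging.getLogger("micom").setLevel(logging.ERROR)

# A fresh interpreter (stand-in for a "spawn" worker) reports its own level.
child_code = "import logging; print(logging.getLogger('micom').level)"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True, text=True, check=True,
)

parent_level = logging.getLogger("micom").level
child_level = int(result.stdout.strip())
print(parent_level, child_level)  # 40 0
```

When a script is run with "spawn", its module-level code is re-executed in each worker while code under the `if __name__ == '__main__':` guard is not, so where the `setLevel` calls are placed can make a difference.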

The following approach worked for suppressing FutureWarnings from pandas, but I'm not sure if it can apply to the micom warnings:

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import pandas as pd
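
For context, `warnings` filters and `logging` levels are separate mechanisms. The message quoted above carries a `community.py:323` source suffix, which suggests it is a log record, so a `warnings` filter would probably not affect it. A self-contained sketch of the difference, using a hypothetical logger name "demo" rather than micom's logger:

```python
import logging
import warnings


class ListHandler(logging.Handler):
    """Collect log messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("demo")  # hypothetical logger name
logger.propagate = False            # keep the demo output off the root logger
handler = ListHandler()
logger.addHandler(handler)

# A warnings filter does not touch log records...
warnings.simplefilter(action="ignore", category=FutureWarning)
logger.warning("still logged")

# ...but raising the logger level does.
logger.setLevel(logging.ERROR)
logger.warning("dropped")

print(handler.messages)  # ['still logged']
```
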
zoey-rw commented 2 months ago

Okay, I think this works for reducing the logging while preserving the multiprocessing:

import logging
import multiprocessing
logging.getLogger("micom").setLevel(logging.ERROR)
logging.getLogger("micom.logger").setLevel(logging.ERROR)

if __name__ == '__main__':
    multiprocessing.set_start_method("spawn", force=True)
    logger = multiprocessing.log_to_stderr()
    logger.setLevel(logging.ERROR)
    manifest = build(...)

I will mark this as closed!

cdiener commented 2 months ago

Nice, thanks for investigating. Spawn will become the default method soon. I tested it and there is no notable performance hit. For the logging it should usually work with something like:

from micom.logger import logger
import logging

logger.setLevel(logging.ERROR)

All other modules just import that one logger. But I think this particular warning can be converted to a DEBUG message. The exchange reaction inference in cobrapy is sophisticated enough by now that this is probably not that serious. I will line that up for the next release.
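
Until that change lands, a narrower option than raising the level of the whole logger might be a `logging.Filter` that drops only this one message. The sketch below is an illustration under stated assumptions: it assumes the package logs through a logger named "micom", it matches on a substring taken from the warning quoted earlier in this thread, and `DropExchangeIdWarning` is a hypothetical class name. With "spawn", the filter would also need to be installed in each worker process, not just in the parent.

```python
import logging


class DropExchangeIdWarning(logging.Filter):
    """Drop records whose message contains the exchange-reaction notice."""

    NEEDLE = "seems to be an exchange reaction"

    def filter(self, record):
        # Returning False discards the record; everything else passes through.
        return self.NEEDLE not in record.getMessage()


# Assumption: the package logs through a logger named "micom".
logger = logging.getLogger("micom")
logger.addFilter(DropExchangeIdWarning())
```

A filter attached to a logger only sees records logged directly on that logger, which fits the single shared logger described above. Other warnings and errors still come through.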