opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0

SBML parsing fails on invalid GPRs #1394

Closed tamascogustavo closed 5 months ago

tamascogustavo commented 5 months ago

Problem description

What: I was trying to load an SBML model from this paper: https://academic.oup.com/plphys/article/165/3/1380/6113226?login=false. The data is in Supp. 2.

How: I first tried to load the model directly. Since I was getting an error, I decided to run the validator.

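from cobra.io.sbml import validate_sbml_model

# model_path is assumed to be a dict mapping model names to SBML file paths.
model, errors = validate_sbml_model(model_path.get("AraGem2015"))

# errors is a dict of message lists keyed by category, so check the lists.
if any(errors.values()):
    for category, messages in errors.items():
        for message in messages:
            print(message)
else:
    print("The model is valid.")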

This is the captured error:

Uppercase AND/OR found in rule '2*(ATCG00020 AND ATCG00680 AND ATCG00280 AND ATCG00270 AND ATCG00580 AND ATCG00570 AND ATCG00710 AND ATCG00080 AND ATCG00550 AND ATCG00070 AND ATCG00560 AND ATCG00220 AND ATCG00700 AND (AT5G66570 OR AT3G50820) AND AT1G06680 AND (AT4G21280 OR AT4G05180) AND AT1G79040 AND AT1G44575 AND ATCG00690 AND AT3G21055 AND AT2G30570 AND AT2G06520 AND AT1G67740 AND ATCG00300)'.

I also tried validating the SBML file, and no errors were detected.

Solution: I downgraded to cobrapy version 0.22.1 and was able to load the model without any issues.

Code sample

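import cobra
from cobra.io.sbml import validate_sbml_model

# model_path is assumed to be a dict mapping model names to SBML file paths.
model, errors = validate_sbml_model(model_path.get("AraGem2015"))

# errors is a dict of message lists keyed by category, so check the lists.
if any(errors.values()):
    for category, messages in errors.items():
        for message in messages:
            print(message)
else:
    print("The model is valid.")

# Load the model directly.
at_model = cobra.io.read_sbml_model(model_path.get("Arnold2014"))
at_model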

Arnold2014.xml.zip

Environment

name: GSMR channels: - conda-forge - bioconda - defaults dependencies: - absl-py=2.1.0=pyhd8ed1ab_0 - ampl-mp=3.1.0=h2beb688_1006 - appnope=0.1.4=pyhd8ed1ab_0 - arviz=0.17.1=py39hecd8cb5_0 - asttokens=2.4.1=pyhd8ed1ab_0 - atk-1.0=2.38.0=h4bec284_2 - aws-c-auth=0.7.22=h26aba2d_2 - aws-c-cal=0.6.14=hb0e519c_1 - aws-c-common=0.9.19=hfdf4475_0 - aws-c-compression=0.2.18=hb0e519c_6 - aws-c-event-stream=0.4.2=hc5e814a_12 - aws-c-http=0.8.1=ha6e9f73_17 - aws-c-io=0.14.8=hf69683f_5 - aws-c-mqtt=0.10.4=h76e2169_4 - aws-c-s3=0.5.9=hd10324c_3 - aws-c-sdkutils=0.1.16=hb0e519c_2 - aws-checksums=0.1.18=hb0e519c_6 - aws-crt-cpp=0.26.9=h473fab1_0 - aws-sdk-cpp=1.11.329=h6b2b1af_3 - backcall=0.2.0=pyh9f0ad1d_0 - blackjax=1.2.1=pyhd8ed1ab_0 - brotli=1.1.0=h0dc2134_1 - brotli-bin=1.1.0=h0dc2134_1 - bzip2=1.0.8=h10d778d_5 - c-ares=1.28.1=h10d778d_0 - ca-certificates=2024.6.2=h8857fd0_0 - cached-property=1.5.2=hd8ed1ab_1 - cached_property=1.5.2=pyha770c72_1 - cachetools=5.3.3=pyhd8ed1ab_0 - cairo=1.18.0=h99e66fa_0 - certifi=2024.2.2=pyhd8ed1ab_0 - chex=0.1.86=pyhd8ed1ab_0 - click=8.1.7=unix_pyh707e725_0 - cloudpickle=3.0.0=pyhd8ed1ab_0 - coin-or-cbc=2.10.10=he49632c_0 - coin-or-cgl=0.60.7=ha3c4b8c_0 - coin-or-clp=1.17.8=hf0ee74e_0 - coin-or-osi=0.108.10=h13a241d_0 - coin-or-utils=2.11.11=h86ddba1_0 - coincbc=2.10.10=0_metapackage - colorama=0.4.6=pyhd8ed1ab_0 - comm=0.2.2=pyhd8ed1ab_0 - cons=0.4.6=pyhd8ed1ab_0 - contourpy=1.2.1=py39h0ca7971_0 - cppad=20240000.4=h73e2aa4_0 - cycler=0.12.1=pyhd8ed1ab_0 - debugpy=1.8.1=py39hd253f6c_0 - decorator=5.1.1=pyhd8ed1ab_0 - etuples=0.3.9=pyhd8ed1ab_0 - executing=2.0.1=pyhd8ed1ab_0 - expat=2.6.2=h73e2aa4_0 - fastprogress=1.0.3=pyhd8ed1ab_0 - filelock=3.14.0=pyhd8ed1ab_0 - font-ttf-dejavu-sans-mono=2.37=hab24e00_0 - font-ttf-inconsolata=3.000=h77eed37_0 - font-ttf-source-code-pro=2.038=h77eed37_0 - font-ttf-ubuntu=0.83=h77eed37_2 - fontconfig=2.14.2=h5bb23bf_0 - fonts-conda-ecosystem=1=0 - fonts-conda-forge=1=0 - fonttools=4.51.0=py39ha09f3b3_0 - 
freetype=2.12.1=h60636b9_2 - fribidi=1.0.10=hbcb3906_0 - gdk-pixbuf=2.42.11=ha9f1606_0 - gettext=0.22.5=h5ff76d1_2 - gettext-tools=0.22.5=h5ff76d1_2 - gflags=2.2.2=hb1e8313_1004 - giflib=5.2.2=h10d778d_0 - glog=0.7.0=h31b1b29_0 - gmp=6.3.0=h73e2aa4_1 - graphite2=1.3.13=h73e2aa4_1003 - graphviz=2.50.0=h8671558_4 - gtk2=2.24.33=h8ca4665_4 - gts=0.7.6=h53e17e3_4 - h5netcdf=1.3.0=pyhd8ed1ab_0 - h5py=3.11.0=nompi_py39hac72f59_101 - harfbuzz=8.4.0=h72fa137_0 - hdf5=1.14.3=nompi_h687a608_103 - icu=73.2=hf5e326d_0 - importlib-metadata=7.1.0=pyha770c72_0 - importlib-resources=6.4.0=pyhd8ed1ab_0 - importlib_metadata=7.1.0=hd8ed1ab_0 - importlib_resources=6.4.0=pyhd8ed1ab_0 - ipopt=3.14.16=h024ff17_2 - ipykernel=6.29.3=pyh3cd1d5f_0 - ipython=8.12.0=pyhd1c38e8_0 - jax=0.4.27=pyhd8ed1ab_0 - jaxlib=0.4.23=cpu_py39hf5b6a1b_2 - jaxopt=0.8.3=pyhd8ed1ab_0 - jedi=0.19.1=pyhd8ed1ab_0 - joblib=1.4.2=pyhd8ed1ab_0 - jupyter_core=4.12.0=py39h6e9494a_0 - kiwisolver=1.4.5=py39h8ee36c8_1 - krb5=1.21.2=hb884880_0 - lcms2=2.16=ha2f27b4_0 - lerc=4.0.0=hb486fe8_0 - libabseil=20240116.2=cxx17_hc1bcbd7_0 - libaec=1.1.3=h73e2aa4_0 - libarrow=16.1.0=h0870315_6_cpu - libarrow-acero=16.1.0=hf036a51_6_cpu - libarrow-dataset=16.1.0=hf036a51_6_cpu - libarrow-substrait=16.1.0=h85bc590_6_cpu - libasprintf=0.22.5=h5ff76d1_2 - libasprintf-devel=0.22.5=h5ff76d1_2 - libblas=3.9.0=22_osx64_openblas - libbrotlicommon=1.1.0=h0dc2134_1 - libbrotlidec=1.1.0=h0dc2134_1 - libbrotlienc=1.1.0=h0dc2134_1 - libcblas=3.9.0=22_osx64_openblas - libcrc32c=1.1.2=he49afe7_0 - libcurl=8.8.0=hf9fcc65_0 - libcxx=17.0.6=h88467a6_0 - libdeflate=1.20=h49d49c5_0 - libedit=3.1.20191231=h0678c8f_2 - libev=4.33=h10d778d_2 - libevent=2.1.12=ha90c15b_1 - libexpat=2.6.2=h73e2aa4_0 - libffi=3.4.2=h0d85af4_5 - libgd=2.3.3=h0dceb68_9 - libgettextpo=0.22.5=h5ff76d1_2 - libgettextpo-devel=0.22.5=h5ff76d1_2 - libgfortran=5.0.0=13_2_0_h97931a8_3 - libgfortran5=13.2.0=h2873a65_3 - libglib=2.80.2=h0f68cf7_0 - libgoogle-cloud=2.24.0=h721cda5_0 - 
libgoogle-cloud-storage=2.24.0=ha1c69e0_0 - libgrpc=1.62.2=h384b2fc_0 - libhwloc=2.10.0=default_h1321489_1000 - libiconv=1.17=hd75f5a5_2 - libintl=0.22.5=h5ff76d1_2 - libintl-devel=0.22.5=h5ff76d1_2 - libjpeg-turbo=3.0.0=h0dc2134_1 - liblapack=3.9.0=22_osx64_openblas - liblapacke=3.9.0=22_osx64_openblas - libllvm14=14.0.6=hc8e404f_4 - libnghttp2=1.58.0=h64cf6d3_1 - libopenblas=0.3.27=openmp_hfef2a42_0 - libparquet=16.1.0=h904a336_6_cpu - libpng=1.6.43=h92b6c6a_0 - libprotobuf=4.25.3=h4e4d658_0 - libre2-11=2023.09.01=h81f5012_2 - librsvg=2.58.0=h7b06fc5_1 - libscotch=7.0.4=hc2ac6e5_1 - libsodium=1.0.18=hbcb3906_1 - libsqlite=3.45.3=h92b6c6a_0 - libssh2=1.11.0=hd019ec5_0 - libthrift=0.19.0=h064b379_1 - libtiff=4.6.0=h129831d_3 - libutf8proc=2.8.0=hb7f2c08_0 - libwebp=1.4.0=hc207709_0 - libwebp-base=1.4.0=h10d778d_0 - libxcb=1.15=hb7f2c08_0 - libxml2=2.12.7=h3e169fe_0 - libzlib=1.2.13=h8a1eda9_5 - llvm-openmp=18.1.5=h39e0ece_0 - logical-unification=0.4.6=pyhd8ed1ab_0 - lz4-c=1.9.4=hf0c8a7f_0 - matplotlib-base=3.8.4=py39h7070ae8_0 - matplotlib-inline=0.1.7=pyhd8ed1ab_0 - matplotlibhelper=0.0.10=py_1 - metis=5.1.0=he965462_1007 - minikanren=1.0.3=pyhd8ed1ab_0 - ml_dtypes=0.4.0=py39hbb604f3_1 - mpfr=4.2.1=h4f6b447_1 - mumps-include=5.7.1=h694c41f_0 - mumps-seq=5.7.1=hce5b6e2_0 - munkres=1.1.4=pyh9f0ad1d_0 - ncurses=6.5=h5846eda_0 - nest-asyncio=1.6.0=pyhd8ed1ab_0 - numba=0.59.1=py39hb7f44fa_0 - numpy=1.26.4=py39h28c39a1_0 - numpyro=0.15.0=pyhd8ed1ab_0 - nutpie=0.10.0=py39hf01fb67_0 - openjpeg=2.5.2=h7310d3a_0 - openssl=3.3.0=h87427d6_3 - opt-einsum=3.3.0=hd8ed1ab_2 - opt_einsum=3.3.0=pyhc1e730c_2 - optax=0.2.2=pyhd8ed1ab_0 - orc=2.0.1=hf43e91b_1 - packaging=24.0=pyhd8ed1ab_0 - pandas=2.2.2=py39haf03413_0 - panflute=2.1.3=pyhd8ed1ab_1 - pango=1.52.2=h7f2093b_0 - parso=0.8.4=pyhd8ed1ab_0 - pcre2=10.43=h0ad2156_0 - pexpect=4.9.0=pyhd8ed1ab_0 - pickleshare=0.7.5=py_1003 - pillow=10.3.0=py39h9dabb2a_0 - pip=24.0=pyhd8ed1ab_0 - pixman=0.43.4=h73e2aa4_0 - 
prompt-toolkit=3.0.42=pyha770c72_0 - prompt_toolkit=3.0.42=hd8ed1ab_0 - psutil=5.9.8=py39ha09f3b3_0 - pthread-stubs=0.4=hc929b4f_1001 - ptyprocess=0.7.0=pyhd3deb0d_0 - pure_eval=0.2.2=pyhd8ed1ab_0 - pyarrow=16.1.0=py39hbd905a8_1 - pyarrow-core=16.1.0=py39h8665caa_1_cpu - pygments=2.18.0=pyhd8ed1ab_0 - pygraphviz=1.9=py39h6c40b1e_1 - pymc=5.6.1=py39h9dd2307_0 - pynndescent=0.5.12=pyhca7485f_0 - pyparsing=3.1.2=pyhd8ed1ab_0 - pyscipopt=5.0.1=py39hd253f6c_0 - python=3.9.19=h7a9c478_0_cpython - python-tzdata=2024.1=pyhd8ed1ab_0 - python_abi=3.9=4_cp39 - pytz=2024.1=pyhd8ed1ab_0 - pyyaml=6.0.1=py39hdc70f33_1 - pyzmq=26.0.3=py39h304b177_0 - re2=2023.09.01=hb168e87_2 - readline=8.2=h9e318b2_1 - scip=9.0.1=hc288958_0 - scotch=7.0.4=h52a132a_1 - setuptools=69.5.1=pyhd8ed1ab_0 - six=1.16.0=pyh6c4a22f_0 - snappy=1.2.0=h6dc393e_1 - stack_data=0.6.2=pyhd8ed1ab_0 - sugartex=0.1.16=py_0 - tbb=2021.12.0=h7728843_0 - threadpoolctl=3.5.0=pyhc1e730c_0 - tk=8.6.13=h1abcd95_1 - toolz=0.12.1=pyhd8ed1ab_0 - tornado=6.4=py39ha09f3b3_0 - tqdm=4.66.4=pyhd8ed1ab_0 - traitlets=5.14.3=pyhd8ed1ab_0 - typing-extensions=4.12.1=hd8ed1ab_0 - typing_extensions=4.12.1=pyha770c72_0 - tzdata=2024a=h0c530f3_0 - unicodedata2=15.1.0=py39hdc70f33_0 - unixodbc=2.3.12=he8a5cf4_0 - wcwidth=0.2.13=pyhd8ed1ab_0 - wheel=0.43.0=pyhd8ed1ab_1 - xarray=2024.5.0=pyhd8ed1ab_0 - xarray-einstats=0.7.0=pyhd8ed1ab_0 - xorg-libxau=1.0.11=h0dc2134_0 - xorg-libxdmcp=1.1.3=h35c211d_0 - xz=5.2.6=h775f41a_0 - yaml=0.2.5=h0d85af4_2 - zeromq=4.3.5=h8d87b8b_3 - zlib=1.2.13=h8a1eda9_5 - zstd=1.5.6=h915ae27_0 - pip: - alembic==1.13.1 - annotated-types==0.6.0 - anyio==4.3.0 - appdirs==1.4.4 - argon2-cffi==23.1.0 - argon2-cffi-bindings==21.2.0 - atomicwrites==1.4.1 - attrs==23.2.0 - beautifulsoup4==4.12.3 - bleach==6.1.0 - bokeh==3.4.1 - catboost==1.2.5 - cffi==1.16.0 - charset-normalizer==3.3.2 - cobra==0.29.0 - colorcet==3.1.0 - colorlog==6.8.2 - dask==2024.5.1 - datashader==0.16.2 - datatable==1.1.0 - defusedxml==0.7.1 - 
depinfo==2.2.0 - diskcache==5.6.3 - entrypoints==0.4 - escher==1.7.3 - et-xmlfile==1.1.0 - exceptiongroup==1.2.1 - fastjsonschema==2.19.1 - fsspec==2024.5.0 - future==1.0.0 - greenlet==3.0.3 - h11==0.14.0 - holoviews==1.18.3 - httpcore==1.0.5 - httpx==0.27.0 - idna==3.7 - imageio==2.34.1 - ipython-genutils==0.2.0 - ipywidgets==7.8.1 - jinja2==2.11.3 - jsonpickle==3.0.4 - jsonschema==3.2.0 - jupyter-client==7.4.9 - jupyter-events==0.6.3 - jupyter-server==2.10.0 - jupyter-server-terminals==0.5.3 - jupyterlab-pygments==0.3.0 - jupyterlab-widgets==1.1.7 - lazy-loader==0.4 - lightgbm==4.3.0 - linkify-it-py==2.0.3 - llvmlite==0.42.0 - locket==1.0.0 - mako==1.3.5 - markdown==3.6 - markdown-it-py==3.0.0 - markupsafe==2.0.0 - mdit-py-plugins==0.4.1 - mdurl==0.1.2 - mistune==0.8.4 - more-itertools==10.2.0 - mpmath==1.3.0 - multipledispatch==1.0.0 - nbclassic==1.0.0 - nbclient==0.5.13 - nbconvert==6.4.5 - nbformat==5.10.4 - networkx==3.2.1 - notebook==6.5.7 - notebook-shim==0.2.4 - openpyxl==3.1.2 - optlang==1.8.1 - optuna==3.6.1 - overrides==7.7.0 - pandocfilters==1.5.1 - panel==1.4.3 - param==2.1.0 - partd==1.4.2 - patsy==0.5.6 - plotly==5.22.0 - pluggy==0.13.1 - prometheus-client==0.20.0 - py==1.11.0 - pycparser==2.22 - pyct==0.5.0 - pydantic==2.7.1 - pydantic-core==2.18.2 - pyrsistent==0.20.0 - pytensor==2.19.0 - pytest==4.6.11 - python-dateutil==2.9.0.post0 - python-graphviz==0.20.3 - python-json-logger==2.0.7 - python-libsbml==5.20.2 - python-louvain==0.16 - pyvis==0.3.2 - pyviz-comms==3.0.2 - requests==2.32.3 - rfc3339-validator==0.1.4 - rfc3986-validator==0.1.1 - rich==13.7.1 - ruamel-yaml==0.18.6 - ruamel-yaml-clib==0.2.8 - sammi==0.1.7 - scikit-image==0.22.0 - scikit-learn==1.4.2 - scipy==1.13.0 - seaborn==0.13.2 - send2trash==1.8.3 - sniffio==1.3.1 - soupsieve==2.5 - sqlalchemy==2.0.30 - statsmodels==0.14.2 - swiglpk==5.0.10 - sympy==1.12 - tenacity==8.3.0 - terminado==0.18.1 - testpath==0.6.0 - tifffile==2024.5.22 - uc-micro-py==1.0.3 - umap==0.1.1 - 
umap-learn==0.5.6 - urllib3==2.2.1 - webencodings==0.5.1 - websocket-client==1.8.0 - widgetsnbextension==3.6.6 - xgboost==2.0.3 - xyzservices==2024.4.0 - zipp==3.18.1 -
depinfo --markdown cobra

### Package Information

| Package | Version |
|:--------|--------:|
| cobra   |  0.29.0 |

### Dependency Information

| Package             |     Version |
|:--------------------|------------:|
| appdirs             |       1.4.4 |
| black               | **missing** |
| bumpversion         | **missing** |
| depinfo             |       2.2.0 |
| diskcache           |       5.6.3 |
| future              |       1.0.0 |
| httpx               |      0.27.0 |
| importlib-resources |       6.4.0 |
| isort               | **missing** |
| numpy               |      1.26.4 |
| optlang             |       1.8.1 |
| pandas              |       2.2.2 |
| pydantic            |       2.7.1 |
| python-libsbml      |      5.20.2 |
| rich                |      13.7.1 |
| ruamel.yaml         |      0.18.6 |
| scipy               |      1.13.1 |
| swiglpk             |      5.0.10 |
| tox                 | **missing** |

### Build Tools Information

| Package    | Version |
|:-----------|--------:|
| pip        |    24.0 |
| setuptools |  69.5.1 |
| wheel      |  0.43.0 |

### Platform Information

|         |               |
|:--------|--------------:|
| Darwin  | 23.5.0-x86_64 |
| CPython |        3.9.19 |

Anything else?

No response

cdiener commented 5 months ago

Hi, this is because the model contains invalid GPRs. The model is really old, so the GPRs are stored in the notes, which other tools usually do not parse at all. cobrapy recognizes the legacy format and at least tries to read them. This is why other validators report no errors. But expressions like 2*Gene are not allowed in SBML GPRs, so parsing them fails here.
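To see which rules are affected before loading, a rough pre-check along these lines may help. It is only a sketch and not how cobrapy validates GPRs internally: the helper name `is_plain_boolean_gpr` is made up for illustration, and it assumes gene IDs are valid Python identifiers (as the AT... IDs in the error message are). It accepts a rule only if it consists of gene names joined by and/or, so anything with a multiplier like `2*(...)` is flagged.

```python
import ast
import re

# Node types that a purely boolean GPR (names joined by and/or) may contain.
ALLOWED_NODES = (ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.Name, ast.Load)


def is_plain_boolean_gpr(rule: str) -> bool:
    """Return True if `rule` is only gene names combined with and/or.

    Hypothetical helper for illustration; assumes gene IDs are valid
    Python identifiers.
    """
    # Legacy models often use uppercase operators, so normalize them first.
    normalized = re.sub(r"\bAND\b", "and", rule)
    normalized = re.sub(r"\bOR\b", "or", normalized)
    try:
        tree = ast.parse(normalized, mode="eval")
    except SyntaxError:
        return False
    # Any arithmetic (e.g. 2*Gene) introduces BinOp/Constant nodes and fails.
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))


print(is_plain_boolean_gpr("(AT5G66570 OR AT3G50820) AND AT1G06680"))
print(is_plain_boolean_gpr("2*(ATCG00020 AND ATCG00680)"))
```

Running this over the GENE_ASSOCIATION strings extracted from the notes would give a list of rules that need manual attention.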

Those GPRs would also have led to errors with cobrapy 0.22, just not when reading the model. They would have surfaced later, when you use the GPRs (in gene deletions, for instance), with a similar error message.

There is not much we can do here because the model itself contains those broken GPRs. You could remove the notes fields by hand or simply rename the GENE_ASSOCIATION entry to something else so it does not get picked up by the legacy parser.
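The renaming can be done on the raw file text. The following is a minimal sketch under a few assumptions: the notes store the rule as a `GENE_ASSOCIATION: ...` (or `GENE ASSOCIATION: ...`) entry, the legacy parser looks for that exact key, and `ORIGINAL_GPR` is just a placeholder name chosen here. The function name is hypothetical as well.

```python
import re

# Matches the legacy key only where it is used as "KEY: value" in the notes.
LEGACY_KEY = re.compile(r"GENE[_ ]ASSOCIATION(?=\s*:)")


def rename_legacy_gpr_key(sbml_text: str, new_key: str = "ORIGINAL_GPR") -> str:
    """Rename the legacy GPR key in SBML text (hypothetical helper).

    The rule text itself is left untouched, so the information stays
    in the notes under a different name.
    """
    return LEGACY_KEY.sub(new_key, sbml_text)


sample = "<html:p>GENE_ASSOCIATION: 2*(ATCG00020 AND ATCG00680)</html:p>"
print(rename_legacy_gpr_key(sample))
```

In practice you would read the XML file into a string, pass it through the function, and write the result to a new file before loading it with cobrapy. The resulting model will then have no GPRs from the notes at all, so any valid rules would need to be added back separately.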

tamascogustavo commented 5 months ago

Hi @cdiener !

Thanks so much for the explanation, and for the feedback on how to deal with the GPRs.

Since it's not a bug, I am closing this issue.

Best,

Tamasco