opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0

problems reading certain AGORA2 models #1326

Open eugenibc opened 1 year ago

eugenibc commented 1 year ago

Hi, I'm trying to build a MICOM database with AGORA2 models (micom.workflows.build_database), and I'm facing issues with 1746 of the 7302 models. This is one of the reconstructions that failed to be read: https://www.vmh.life/files/reconstructions/AGORA2/version2.01/sbml_files/individual_reconstructions/Acinetobacter_pittii_ANC_4050.xml

And this is the error I get when I run cobra.io.sbml.validate_sbml_model on the file:

cobra.io.sbml.validate_sbml_model("/home/ebelda/AGORA2/sbmlfiles/Acinetobacter_pittii_ANC_4050.xml")
(None, {'SBML_FATAL': [], 'SBML_ERROR': ['E0 (Error): XML content (core, L100894); Badly formed XML; XML content is not well-formed.\n', 'E1 (Error): General SBML conformance (core, L3); No model definition found; An SBML document must contain a <model> element. The <model> element is optional in L3V2 and beyond.\nReference: L3V1 Section 4.1\n'], 'SBML_SCHEMA_ERROR': [], 'SBML_WARNING': [], 'COBRA_FATAL': [], 'COBRA_ERROR': ['No SBML model detected in file.'], 'COBRA_WARNING': [], 'COBRA_CHECK': []})
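With more than a thousand failing files, it can help to triage them in bulk. validate_sbml_model returns a (model, errors) tuple, and the errors dict has the category keys shown in the output above. The sketch below relies only on that dict shape. The helper names summarize_validation and is_readable, the choice of which categories count as "blocking", and the "sbmlfiles" directory in the usage comment are all assumptions for illustration.

```python
from typing import Dict, List

# Categories from the dict returned by cobra.io.sbml.validate_sbml_model that we
# ASSUME make a file unusable (warnings and checks are ignored here).
BLOCKING_KEYS = ("SBML_FATAL", "SBML_ERROR", "COBRA_FATAL", "COBRA_ERROR")


def summarize_validation(errors: Dict[str, List[str]]) -> Dict[str, int]:
    """Count the messages in every non-empty category (hypothetical helper)."""
    return {key: len(messages) for key, messages in errors.items() if messages}


def is_readable(errors: Dict[str, List[str]]) -> bool:
    """Return False if any blocking category contains at least one message."""
    return not any(errors.get(key) for key in BLOCKING_KEYS)


# Example use over a directory of AGORA2 files (requires cobrapy;
# the "sbmlfiles" directory name is hypothetical):
# import cobra, pathlib
# for path in sorted(pathlib.Path("sbmlfiles").glob("*.xml")):
#     _, errors = cobra.io.sbml.validate_sbml_model(str(path))
#     if not is_readable(errors):
#         print(path.name, summarize_validation(errors))
```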

I would like to know if there is anything that could be done to solve this issue, or if anyone has a suggestion about how to proceed. Thanks in advance, Eugeni Belda

cdiener commented 1 year ago

Hi, actually I have a lot of info on that, since I had to prepare them for the MICOM database release. Unfortunately, it looks like something went wrong when the authors prepared the SBML models. A large fraction of them have slight issues. In particular, I found the following cases:

  1. In some cases the models are actually MATLAB models whose file extension was simply renamed to .xml.
  2. An even larger fraction of models declare UTF-8 encoding in the XML header but are actually encoded in an ISO (ISO-8859) format, which raises an error in most XML parsers.
  3. In some cases the model names in the SBML files contain spelling mistakes, so they can't be matched to the manifest.
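Cases 1 and 2 can usually be told apart from the raw bytes of a file. The sketch below is stdlib-only and rests on two assumptions: that the renamed MATLAB files are level 5 or 7.3 MAT-files (both start with a text header beginning with "MATLAB"), and that the mis-encoded XML is really ISO-8859-1 (Latin-1). The function names sniff_model_file and reencode_as_utf8 are hypothetical.

```python
from pathlib import Path


def sniff_model_file(path) -> str:
    """Classify a '.xml' model file by its bytes (hypothetical helper).

    Returns 'matlab' for MAT-files (level 5 and 7.3 headers start with
    b'MATLAB'), 'xml' if the file decodes as UTF-8, else 'xml-non-utf8'.
    """
    data = Path(path).read_bytes()
    if data.startswith(b"MATLAB"):
        return "matlab"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "xml-non-utf8"
    return "xml"


def reencode_as_utf8(path, source_encoding: str = "latin-1") -> None:
    """Rewrite a file in place as UTF-8, ASSUMING it is really `source_encoding`.

    Works on bytes so line endings are left untouched. The XML declaration
    already says UTF-8, so after this step declaration and content agree.
    """
    p = Path(path)
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))
```

Latin-1 decoding never fails (every byte maps to a character), so if a file is in another ISO or Windows code page the result will be readable but may contain wrong characters; it is worth spot-checking names with accents afterwards.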

It looks like you hit case 1. For now there is not much one can do about the files themselves. For MICOM we got around that by converting all the .mat models to SBML with cobrapy.

However, the MATLAB models work just fine, and you can read them with cobra.io.load_matlab_model.
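The conversion described above can be sketched with cobrapy's cobra.io.load_matlab_model and cobra.io.write_sbml_model. This is not the MICOM team's actual script; the helper names mat_to_sbml and sbml_path_for are hypothetical, and it assumes the .mat files are in a format scipy can load (cobrapy reads MAT-files through scipy.io.loadmat, which does not support v7.3/HDF5 files).

```python
from pathlib import Path


def sbml_path_for(mat_path, out_dir) -> Path:
    """Map 'X.mat' (or a mis-named 'X.xml') to '<out_dir>/X.xml'."""
    return Path(out_dir) / (Path(mat_path).stem + ".xml")


def mat_to_sbml(mat_path, out_dir) -> Path:
    """Convert one MATLAB model to SBML with cobrapy (sketch, hypothetical helper)."""
    import cobra  # imported lazily so sbml_path_for works without cobrapy installed

    out_path = sbml_path_for(mat_path, out_dir)
    # Read the whole model first, so writing into the same directory is safe.
    model = cobra.io.load_matlab_model(str(mat_path))
    cobra.io.write_sbml_model(model, str(out_path))
    return out_path
```

Writing to a separate output directory keeps the original downloads intact, which makes it easy to re-run the conversion if something goes wrong.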

eugenibc commented 1 year ago

Thanks a lot for your reply and explanations, and congratulations by the way for the great job you do around MICOM!

Waschina commented 1 year ago

Hi! I can confirm the issue with several AGORA2 models (version 2.01).

In addition to the three points mentioned by @cdiener, some SBML parsers may fail to read specific models that contain parentheses "(" or ")" in the chemical formulas of metabolites, e.g. the SBML model Bacteroides_ovatus_CL03T12C18.xml:

Encountered '(' when expecting a capital letter. The chemicalFormula 'C10H14N5O7P(C5H8O5PR)n(C5H8O5PR)n' has incorrect syntax.
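Polymer formulas such as 'C10H14N5O7P(C5H8O5PR)n(C5H8O5PR)n' cannot be expanded into a fixed element count, so one possible workaround is to drop those formulas before parsing. The stdlib-only sketch below assumes the formulas are stored in an fbc:chemicalFormula attribute and approximates the allowed syntax as element symbols (a capital letter plus optional lowercase letters), each with an optional integer count. The regexes and the helper names are assumptions, not part of cobrapy or libSBML.

```python
import re

# ASSUMED approximation of the fbc chemicalFormula grammar:
# element symbols (capital + optional lowercase) with optional integer counts.
VALID_FORMULA = re.compile(r"(?:[A-Z][a-z]*\d*)*")
# ASSUMES the attribute uses the 'fbc:' namespace prefix and double quotes.
FORMULA_ATTR = re.compile(r'\s+fbc:chemicalFormula="([^"]*)"')


def is_valid_formula(formula: str) -> bool:
    """True if the whole formula matches the simplified grammar above."""
    return VALID_FORMULA.fullmatch(formula) is not None


def drop_invalid_formulas(xml_text: str):
    """Remove chemicalFormula attributes that fail the check (hypothetical helper).

    Returns the edited XML text and the list of formulas that were dropped.
    """
    dropped = []

    def _replace(match):
        formula = match.group(1)
        if is_valid_formula(formula):
            return match.group(0)  # keep the attribute unchanged
        dropped.append(formula)
        return ""  # the attribute is optional, so removing it keeps the element valid

    return FORMULA_ATTR.sub(_replace, xml_text), dropped
```

The returned list of dropped formulas makes it possible to record which metabolites lost their formula, since mass-balance checks will no longer cover them.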