opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0

import sbml file, AttributeError #694

Closed: XiangZhangSC closed this issue 6 years ago

XiangZhangSC commented 6 years ago

Problem description

I tried to import an SBML model downloaded from https://bmcsystbiol.biomedcentral.com/articles/10.1186/1752-0509-5-180, but the import fails with the error below.

Code Sample

import cobra
model = cobra.io.read_legacy_sbml("AdipocyteModel.xml")

Actual Output

~/venv/lib/python3.6/site-packages/cobra/io/sbml.py in create_cobra_model_from_sbml_file(sbml_filename, old_sbml, legacy_metabolite, print_time, use_hyphens)
    104         raise Exception("Conversion of SBML+fbc to COBRA failed")
    105     sbml_model = model_doc.getModel()
--> 106     sbml_model_id = sbml_model.getId()
    107     sbml_species = sbml_model.getListOfSpecies()
    108     sbml_reactions = sbml_model.getListOfReactions()

AttributeError: 'NoneType' object has no attribute 'getId'

Expected Output

Output of cobra.show_versions()

System Information
==================
OS          Darwin
OS-release  17.5.0
Python      3.6.4

Package Versions
================
pip             9.0.1
setuptools      38.5.1
cobra           0.11.3
future          0.16.0
swiglpk         1.4.4
optlang         1.3.0
ruamel.yaml     0.14.12
pandas          0.22.0
numpy           1.14.2
tabulate        0.8.2
python-libsbml  5.16.0

Midnighter commented 6 years ago

Could you try again with the function cobra.io.read_sbml_model, please?

XiangZhangSC commented 6 years ago

Hi Midnighter, sorry, I didn't mention that I had also tried what you suggested. The error message is below.

model = cobra.io.read_sbml_model("AdipocyteModel.xml")

ParseError                                Traceback (most recent call last)
~/venv/lib/python3.6/site-packages/cobra/io/sbml3.py in parse_stream(filename)
    151         else:
--> 152             return parse(filename)
    153     except ParseError as e:

~/venv/lib/python3.6/xml/etree/ElementTree.py in parse(source, parser)
   1195     tree = ElementTree()
-> 1196     tree.parse(source, parser)
   1197     return tree

~/venv/lib/python3.6/xml/etree/ElementTree.py in parse(self, source, parser)
    596                 # it with chunks.
--> 597             self._root = parser._parse_whole(source)
    598             return self._root

ParseError: not well-formed (invalid token): line 9947, column 171

During handling of the above exception, another exception occurred:

CobraSBMLError                            Traceback (most recent call last)
in ()
----> 1 model = cobra.io.read_sbml_model("AdipocyteModel.xml")

~/venv/lib/python3.6/site-packages/cobra/io/sbml3.py in read_sbml_model(filename, number, **kwargs)
    566     if not _with_lxml:
    567         warn("Install lxml for faster SBML I/O", ImportWarning)
--> 568     xmlfile = parse_stream(filename)
    569     xml = xmlfile.getroot()
    570     # use libsbml if not l3v1 with fbc v2

~/venv/lib/python3.6/site-packages/cobra/io/sbml3.py in parse_stream(filename)
    152         return parse(filename)
    153     except ParseError as e:
--> 154         raise CobraSBMLError("Malformed XML file: " + str(e))
    155 
    156 

CobraSBMLError: Malformed XML file: not well-formed (invalid token): line 9947, column 171

cdiener commented 6 years ago

It looks like this is not valid SBML. Could you pass your model through http://sbml.org/validator/? If that also tells you that the model is invalid, there is very little we can do on our side :(

XiangZhangSC commented 6 years ago

Indeed, the validator at http://sbml.org/validator reports "This document is not valid SBML!". But I don't get much information from the error message either:

1 Error

Error Line 9947 Column 141: (XML Error #1006) XML content is not well-formed.

GENE_ASSOCIATION: (NM_005956.2 or NM_015440.3)

GENE_LIST: NM_005956.2 NM_015440.3

SUBSYSTEM: Vitamins & Cofactor Biosynthesis

cdiener commented 6 years ago

It does tell you the line in the document (9947). Is there maybe something odd on that line?
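To look at the spot the parser complains about without scrolling through a file of nearly 10,000 lines, a short standard-library sketch like the one below may help. It is not part of cobrapy; `report_xml_error` is a hypothetical helper name, and it assumes the file is UTF-8 encoded. It relies on `xml.etree.ElementTree.ParseError.position`, which holds the same (line, column) pair that appears in the error message.

```python
import xml.etree.ElementTree as ET


def report_xml_error(path, context=40):
    """Parse an XML/SBML file and describe the first well-formedness error.

    Hypothetical helper (not a cobrapy function). Returns None if the file
    parses, otherwise the parser message, the offending line (clipped to
    `context` characters on each side of the error) and a caret under the
    reported column. Assumes the file is UTF-8 encoded.
    """
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line_no, col = err.position  # 1-based line, 0-based column (expat)
        line = ""
        with open(path, encoding="utf-8") as handle:
            for i, text in enumerate(handle, start=1):
                if i == line_no:
                    line = text.rstrip("\n")
                    break
        start = max(col - context, 0)
        caret = " " * (col - start) + "^"
        return f"{err}\n{line[start:col + context]}\n{caret}"
    return None
```

For this issue, `print(report_xml_error("AdipocyteModel.xml"))` should print the message for line 9947 followed by the surrounding text of that line, which makes a stray character easy to spot.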

XiangZhangSC commented 6 years ago

Hi cdiener, I finally found out that it is the character "&". The SBML parser cannot deal with it and returns the error message. When I changed it to "and", cobra.io.read_sbml_model works. However, I now get a very long warning about the GPR (gene-protein-reaction rule). In this SBML file, when a reaction can be catalysed by multiple protein complexes, the model creator built each complex with AND and then combined the individual complexes with OR, which results in "(" and ")" in the GPR. Do you have any suggestions on how to handle that?

cobra/core/reaction.py:394 UserWarning: malformed gene_reaction_rule '(((NM_022745.2 or NM_001042546) and NM_145691.3 and MT4509 and MT4508 and NM_001688.4 and (NM_005176.5 or NM_001002031.2) and (NM_001003785.1 or NM_006356.2) and NM_007100.2 and (NM_004889.2 or NM_001039178.1 or NM_001003714.1 or NM_001003713.1) and (NM_001003703.1 or NM_001003696.1 or NM_001003697.1 or NM_001003701.1 or NM_001685.4) and NM_006476.4 and (NM_015684.2 or NM_001003805.1 or NM_001003803.1) and NM_001697.2 and (NM_004046.4 or NM_001001937.1) and NM_001686.3 and (NM_001001975.1 or NM_001687.4) and (NM_001001977.1 or NM_006886.2) and (NM_005174.2 or NM_001001973.1)) or ((NM_022745.2 or NM_001042546) and NM_145691.3 and MT4509 and MT4508 and NM_001688.4 and (NM_001002027.1 or NM_005175.2) and (NM_001003785.1 or NM_006356.2) and NM_007100.2 and (NM_004889.2 or NM_001039178.1 or NM_001003714.1 or NM_001003713.1) and (NM_001003703.1 or NM_001003696....' for <Reaction ATPS4m at 0x10d5804a8>

ChristianLieven commented 6 years ago

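@XiangZhangSC In XML, the ampersand character & has a special meaning: it starts an entity or character reference such as &amp; or &#38;. A bare & that is not part of such a reference makes the document not well-formed, which is the error you saw. SBML is built on XML, so the ampersand is subject to the same restriction; a literal ampersand has to be written as &amp;.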

Take a look at http://xml.silmaril.ie/specials.html for more background information (e.g. on other special characters).
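Instead of replacing "&" with "and" by hand, bare ampersands can be escaped as the entity "&amp;", which keeps names such as "Vitamins & Cofactor Biosynthesis" intact once the file is parsed. The sketch below is a plain text substitution with Python's `re` module, not a cobrapy feature; `escape_bare_ampersands` and `fix_file` are hypothetical names, and it assumes a UTF-8 file without CDATA sections (inside CDATA a literal "&" is legal and would be changed needlessly).

```python
import re

# Matches an "&" that does NOT already start an entity or character
# reference such as &amp;, &lt;, &#38; or &#x26;.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace every bare "&" with the XML entity "&amp;" (hypothetical helper)."""
    return _BARE_AMP.sub("&amp;", text)


def fix_file(src, dst):
    """Write an escaped copy of the UTF-8 file `src` to `dst` (hypothetical helper)."""
    with open(src, encoding="utf-8") as handle:
        fixed = escape_bare_ampersands(handle.read())
    with open(dst, "w", encoding="utf-8") as handle:
        handle.write(fixed)
```

Writing to a new file rather than overwriting the download keeps the original available for comparison; the fixed copy can then be passed to `cobra.io.read_sbml_model`.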


The warning you're getting from cobrapy is caused by the gene association (and the gene list) in the SBML file being truncated, as indicated by the sequence of periods (....) at the end.

The entry in the SBML file reads:

<reaction id="R_ATPS4m" name="ATP synthase, mitochondrial" reversible="false">
        <notes>
          <html xmlns="http://www.w3.org/1999/xhtml"><p>GENE_ASSOCIATION: (((NM_022745.2 or NM_001042546) and NM_145691.3 and MT4509 and MT4508 and NM_001688.4 and (NM_005176.5 or NM_001002031.2) and (NM_001003785.1 or NM_006356.2) and NM_007100.2 and (NM_004889.2 or NM_001039178.1 or NM_001003714.1 or NM_001003713.1) and (NM_001003703.1 or NM_001003696.1 or NM_001003697.1 or NM_001003701.1 or NM_001685.4) and NM_006476.4 and (NM_015684.2 or NM_001003805.1 or NM_001003803.1) and NM_001697.2 and (NM_004046.4 or NM_001001937.1) and NM_001686.3 and (NM_001001975.1 or NM_001687.4) and (NM_001001977.1 or NM_006886.2) and (NM_005174.2 or NM_001001973.1)) or ((NM_022745.2 or NM_001042546) and NM_145691.3 and MT4509 and MT4508 and NM_001688.4 and (NM_001002027.1 or NM_005175.2) and (NM_001003785.1 or NM_006356.2) and NM_007100.2 and (NM_004889.2 or NM_001039178.1 or NM_001003714.1 or NM_001003713.1) and (NM_001003703.1 or NM_001003696....</p><p>GENE_LIST: MT4508 MT4509 NM_001001937.1 NM_001001973.1 NM_001001975.1 NM_001001977.1 NM_001002027.1 NM_001002031.2 NM_001003696.... NM_001003696.1 NM_001003697.1 NM_001003701.1 NM_001003703.1 NM_001003713.1 NM_001003714.1 NM_001003785.1 NM_001003803.1 NM_001003805.1 NM_001039178.1 NM_001042546 NM_001685.4 NM_001686.3 NM_001687.4 NM_001688.4 NM_001697.2 NM_004046.4 NM_004889.2 NM_005174.2 NM_005175.2 NM_005176.5 NM_006356.2 NM_006476.4 NM_006886.2 NM_007100.2 NM_015684.2 NM_022745.2 NM_145691.3</p><p>SUBSYSTEM: Oxidative Phosphorylation</p></html>
        </notes>
        <listOfReactants>
          <speciesReference species="M_h_c" stoichiometry="4"/>
          <speciesReference species="M_adp_m"/>
          <speciesReference species="M_pi_m"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="M_atp_m"/>
          <speciesReference species="M_h_m" stoichiometry="3"/>
          <speciesReference species="M_h2o_m"/>
        </listOfProducts>
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <ci> FLUX_VALUE </ci>
          </math>
          <listOfParameters>
            <parameter id="LOWER_BOUND" value="0" units="mmol_per_gDW_per_hr"/>
            <parameter id="UPPER_BOUND" value="1000" units="mmol_per_gDW_per_hr"/>
            <parameter id="OBJECTIVE_COEFFICIENT" value="0"/>
            <parameter id="FLUX_VALUE" value="500" units="mmol_per_gDW_per_hr"/>
          </listOfParameters>
        </kineticLaw>
      </reaction>

The warning can actually be ignored if you don't plan to carry out in silico gene knockout studies. The cobrapy parser expects parentheses to be properly closed, and because the rule is cut off it cannot find some of the closing parentheses here. You might want to get in touch with the authors of the model to get a complete export, as this one is clearly missing information.
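To find every reaction affected by this, the two symptoms described above (unclosed parentheses and a trailing run of periods) can be checked directly on the rule strings. A minimal sketch in plain Python; `gpr_problems` is a hypothetical helper, and it deliberately checks only these two symptoms rather than fully parsing the Boolean rule.

```python
def gpr_problems(rule):
    """List reasons why a gene_reaction_rule string looks broken.

    Hypothetical helper: checks only parenthesis balance and a trailing
    "..." (the truncation marker seen in this model's notes).
    """
    problems = []
    depth = 0
    for char in rule:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:  # a ")" with no matching "("
                problems.append("unmatched ')'")
                depth = 0
    if depth > 0:
        problems.append(f"{depth} unclosed '('")
    if rule.rstrip().endswith("..."):
        problems.append("ends with '...', probably truncated")
    return problems
```

With a model loaded by `cobra.io.read_sbml_model`, the same function could be applied to the `gene_reaction_rule` string of each reaction in `model.reactions`, assuming the malformed string is still stored on the reaction after the warning.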

XiangZhangSC commented 6 years ago

Thanks a lot. I really appreciate your explanation.