opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0

Loss of relation elements in annotation when writing an SBML model again. #937

Open Hemant27031999 opened 4 years ago

Hemant27031999 commented 4 years ago

Problem Description

If we read an SBML model from its XML file and write it back, any component whose annotation had more than one relation element (such as "is" and "hasTaxon") loses them: the newly written model contains only a single relation element, with all resources listed under it.

Code Sample

I took the e_coli_core.xml model from the data directory and read it using

import cobra

model = cobra.io.read_sbml_model("e_coli_core.xml")

and when I wrote it back using

cobra.io.write_sbml_model(model, "e_coli_model.xml")

the annotation changed as follows:

Annotation in the original model:

...
    <annotation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqmodel="http://biomodels.net/model-qualifiers/" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
        <rdf:Description rdf:about="#e_coli_core">
          <bqbiol:hasTaxon>
            <rdf:Bag>
              <rdf:li rdf:resource="http://identifiers.org/taxonomy/511145" />
            </rdf:Bag>
          </bqbiol:hasTaxon>
          <bqmodel:is>
            <rdf:Bag>
              <rdf:li rdf:resource="http://identifiers.org/bigg.model/e_coli_core" />
            </rdf:Bag>
          </bqmodel:is>
          <bqmodel:isDescribedBy>
            <rdf:Bag>
              <rdf:li rdf:resource="http://identifiers.org/doi/10.1128/ecosalplus.10.2.1" />
            </rdf:Bag>
          </bqmodel:isDescribedBy>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
...

Annotation after writing the model:

...
    <annotation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
        <rdf:Description rdf:about="#meta_e_coli_core">
          <bqbiol:is>
            <rdf:Bag>
              <rdf:li rdf:resource="https://identifiers.org/taxonomy/511145"/>
              <rdf:li rdf:resource="https://identifiers.org/bigg.model/e_coli_core"/>
              <rdf:li rdf:resource="https://identifiers.org/doi/10.1128/ecosalplus.10.2.1"/>
            </rdf:Bag>
          </bqbiol:is>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
...

Expected output:

Resources listed under different relation elements should stay under those elements. The written annotation should keep the same structure as the original.
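
The difference can be checked without cobrapy by grouping resources by their qualifier in each annotation. The following is only a minimal sketch using the standard library; the helper name `qualifiers_to_resources` and the two embedded strings (shortened copies of the annotations above) are illustrative assumptions, not part of cobrapy.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespace URI -> conventional prefix, as used in the annotations above.
QUALIFIER_PREFIXES = {
    "http://biomodels.net/biology-qualifiers/": "bqbiol",
    "http://biomodels.net/model-qualifiers/": "bqmodel",
}

# Shortened copy of the annotation in the original model.
ORIGINAL = """<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:bqmodel="http://biomodels.net/model-qualifiers/"
           xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
    <rdf:Description rdf:about="#e_coli_core">
      <bqbiol:hasTaxon><rdf:Bag>
        <rdf:li rdf:resource="http://identifiers.org/taxonomy/511145"/>
      </rdf:Bag></bqbiol:hasTaxon>
      <bqmodel:is><rdf:Bag>
        <rdf:li rdf:resource="http://identifiers.org/bigg.model/e_coli_core"/>
      </rdf:Bag></bqmodel:is>
      <bqmodel:isDescribedBy><rdf:Bag>
        <rdf:li rdf:resource="http://identifiers.org/doi/10.1128/ecosalplus.10.2.1"/>
      </rdf:Bag></bqmodel:isDescribedBy>
    </rdf:Description>
  </rdf:RDF>
</annotation>"""

# Shortened copy of the annotation after writing the model.
WRITTEN = """<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:bqbiol="http://biomodels.net/biology-qualifiers/"
           xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
    <rdf:Description rdf:about="#meta_e_coli_core">
      <bqbiol:is><rdf:Bag>
        <rdf:li rdf:resource="https://identifiers.org/taxonomy/511145"/>
        <rdf:li rdf:resource="https://identifiers.org/bigg.model/e_coli_core"/>
        <rdf:li rdf:resource="https://identifiers.org/doi/10.1128/ecosalplus.10.2.1"/>
      </rdf:Bag></bqbiol:is>
    </rdf:Description>
  </rdf:RDF>
</annotation>"""


def qualifiers_to_resources(annotation_xml):
    """Return {'prefix:qualifier': [resource URIs]} for one annotation."""
    root = ET.fromstring(annotation_xml)
    result = {}
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        for qualifier in description:
            # Element tags look like '{namespace-uri}localname'.
            namespace, _, local = qualifier.tag[1:].partition("}")
            if namespace not in QUALIFIER_PREFIXES:
                continue
            key = f"{QUALIFIER_PREFIXES[namespace]}:{local}"
            for item in qualifier.iter(f"{{{RDF_NS}}}li"):
                result.setdefault(key, []).append(item.get(f"{{{RDF_NS}}}resource"))
    return result


before = qualifiers_to_resources(ORIGINAL)
after = qualifiers_to_resources(WRITTEN)
print("qualifiers before:", sorted(before))
print("qualifiers after: ", sorted(after))
```

If the round trip preserved the relations, both lines would list the same qualifiers. With the annotations shown here, the written model has a single qualifier that holds all three resources.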

Output of python -c "from cobra import show_versions; show_versions()":

System Information
==================
OS          Linux
OS-release  4.15.0-88-generic
Python      3.7.3

Package Versions
================
cobra                        0.17.1
depinfo                      1.5.3
future                       0.18.2
numpy                        1.18.1
optlang                      1.4.4
pandas                       1.0.0
pip                          20.0.2
python-libsbml-experimental  5.18.0
ruamel.yaml                  0.16.7
setuptools                   45.1.0
six                          1.14.0
swiglpk                      4.65.1
wheel                        0.34.2

matthiaskoenig commented 4 years ago

Yes, this is a known problem with the annotations. The main cause is that the way cobrapy stores annotations is currently much too restrictive (this comes mainly from the JSON model exchange format, which only allows a simple list of annotation terms). The issue is discussed in https://github.com/opencobra/cobrapy/issues/684. We basically need a much richer internal annotation format in cobrapy that supports all the relationships.
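
To illustrate why a flat store cannot round-trip the relations, here is a rough sketch. It is not cobrapy's actual data structure or API: the `rich` layout (qualifier mapped to resource URIs), the `to_flat` helper, and the assumed flat form (provider mapped to identifiers) are all hypothetical and only serve to show where the qualifier gets dropped.

```python
# Hypothetical "rich" layout: each qualifier keeps its own resources.
rich = {
    "bqbiol:hasTaxon": ["https://identifiers.org/taxonomy/511145"],
    "bqmodel:is": ["https://identifiers.org/bigg.model/e_coli_core"],
    "bqmodel:isDescribedBy": [
        "https://identifiers.org/doi/10.1128/ecosalplus.10.2.1"
    ],
}


def to_flat(rich_annotation):
    """Collapse to an assumed flat form {provider: [identifiers]}.

    The qualifier is discarded here, which is the information loss
    described in this issue.
    """
    flat = {}
    for resources in rich_annotation.values():
        for uri in resources:
            # 'https://identifiers.org/<provider>/<identifier>'
            path = uri.split("identifiers.org/", 1)[1]
            provider, _, identifier = path.partition("/")
            flat.setdefault(provider, []).append(identifier)
    return flat


flat = to_flat(rich)
print(flat)
```

Once the data is in the flat form there is nothing left to say whether `511145` was attached with `hasTaxon` or `is`, so a writer has to pick one qualifier for everything.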