opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0

fix: inconsistency in the gene name storage in SBML files #950

Open BenjaSanchez opened 4 years ago

BenjaSanchez commented 4 years ago

Problem description

As noted in https://github.com/SysBioChalmers/yeast-GEM/pull/216, cobratoolbox and cobrapy currently store gene names inconsistently in the SBML (XML) file: cobratoolbox writes the gene name to the fbc:label attribute, whereas cobrapy writes it to fbc:name. cobratoolbox instead uses fbc:name to store protein information.

Code Sample

cobrapy:

import cobra

# `model` is assumed to be an already loaded cobra.Model
model.reactions[0].gene_reaction_rule = "b3845"
model.genes.get_by_id("b3845").name = "fadA"
cobra.io.write_sbml_model(model, "test.xml")

cobratoolbox:

model = addGenes(model, {'b3845'}, 'proteins', {'P21151'}, 'geneNames', {'fadA'});
writeCbModel(model,"test.xml")

Actual Output

cobrapy:

<fbc:geneProduct fbc:id="G_b3845" fbc:name="fadA" fbc:label="G_b3845"/>

cobratoolbox:

<fbc:geneProduct metaid="G_b3845" fbc:id="G_b3845" fbc:name="P21151" fbc:label="fadA"/>
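
To make the difference concrete, here is a minimal sketch that parses the two geneProduct snippets above with the Python standard library and prints which attribute holds what. The helper name `gene_product_fields` and the wrapping `<root>` element are only for illustration (the snippets carry no namespace declaration on their own), and I am assuming the FBC version 2 namespace URI.

```python
import xml.etree.ElementTree as ET

# Assumption: both files use the SBML Level 3 FBC version 2 namespace.
FBC_NS = "http://www.sbml.org/sbml/level3/version1/fbc/version2"

COBRAPY_XML = '<fbc:geneProduct fbc:id="G_b3845" fbc:name="fadA" fbc:label="G_b3845"/>'
COBRATOOLBOX_XML = (
    '<fbc:geneProduct metaid="G_b3845" fbc:id="G_b3845" '
    'fbc:name="P21151" fbc:label="fadA"/>'
)


def gene_product_fields(snippet):
    """Return fbc:id, fbc:name and fbc:label of one geneProduct snippet.

    Hypothetical helper: wraps the snippet in a dummy root element that
    declares the fbc prefix so that it can be parsed in isolation.
    """
    wrapped = f'<root xmlns:fbc="{FBC_NS}">{snippet}</root>'
    element = ET.fromstring(wrapped)[0]
    return {key: element.get(f"{{{FBC_NS}}}{key}") for key in ("id", "name", "label")}


cobrapy_fields = gene_product_fields(COBRAPY_XML)
toolbox_fields = gene_product_fields(COBRATOOLBOX_XML)
print("cobrapy:     ", cobrapy_fields)
print("cobratoolbox:", toolbox_fields)
```

The gene name "fadA" ends up under `name` for cobrapy and under `label` for cobratoolbox, so a reader that only looks at one attribute gets different information depending on which package wrote the file.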

Expected Output

Ideally both packages should save the model in the same way, so that an xml file can be opened with either one without losing any info. I'm not sure whether it makes more sense to change the cobratoolbox or the cobrapy convention: name sounds more reasonable for a gene name, but as @Midnighter pointed out, label is required by SBML whereas name is not.
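
Until the conventions are aligned, one possible stopgap on the reading side is a heuristic that accepts either layout. The sketch below is not how either package works; `guess_gene_name` is a hypothetical helper, and it assumes that a label which merely repeats the identifier (with or without the `G_` prefix) carries no gene name.

```python
def guess_gene_name(fbc_id, fbc_name, fbc_label):
    """Heuristically pick the gene name from one geneProduct's attributes.

    Hypothetical helper, not part of cobrapy or cobratoolbox.
    """
    bare_id = fbc_id[2:] if fbc_id.startswith("G_") else fbc_id
    # cobratoolbox-style: the label holds something other than the identifier.
    if fbc_label and fbc_label not in (fbc_id, bare_id):
        return fbc_label
    # cobrapy-style: the label repeats the identifier, so fall back to the name.
    return fbc_name


# Attribute values taken from the two outputs above.
from_cobrapy = guess_gene_name("G_b3845", "fadA", "G_b3845")
from_cobratoolbox = guess_gene_name("G_b3845", "P21151", "fadA")
print(from_cobrapy, from_cobratoolbox)
```

This only papers over the problem when reading: it cannot tell a protein identifier from a gene name, so agreeing on one convention would still be the real fix.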

Dependency Information

System Information
==================
OS          Windows
OS-release  10
Python      3.7.7

Package Versions
================
cobra                        0.17.1
depinfo                      1.5.3
future                       0.18.2
numpy                        1.18.3
optlang                      1.4.4
pandas                       1.0.3
pip                          20.0.2
python-libsbml-experimental  5.18.0
ruamel.yaml                  0.16.10
setuptools                   46.1.3.post20200330
six                          1.14.0
swiglpk                      4.65.1
wheel                        0.34.2

cdiener commented 4 years ago

Pinging @matthiaskoenig and @draeger for guidance. Thanks!