Genome scale metabolic models in SBML format
Example of the metabolic map from the Ralstonia eutropha genome scale model.
The model was previously published in: Park, J. M., Kim, T. Y., & Lee, S. Y. BMC Systems Biology, 5(1), 101. 2011, Genome-scale reconstruction and in silico analysis of the Ralstonia eutropha H16 for polyhydroxyalkanoate synthesis, lithoautotrophic growth, and 2-methyl citric acid production.
The original model (PDF) was parsed and converted to SBML standard by Peyraud R., Cottret L., Marmiesse L., Gouzy J., Genin S. PLoS Pathogens, 12(10), 2016. A Resource Allocation Trade-Off between Virulence and Proliferation Drives Metabolic Versatility in the Plant Pathogen Ralstonia solanacearum.
The Ralstonia_eutropha
folder contains the following folders:
data/
-- folder containing the reference 'model' files with Bigg annotation (*.json
)escher/
-- folder with map layout and model for escher visualization (*.json
)memote
-- folder containing the latest memote reports as *.html
documentssbml/
-- folder containing the original and improved SBML model (*.xml
)simulations/
-- folder with simulation results tables (*.csv
)The Ralstonia_eutropha
folder contains the following scripts:
upgrade_model.py
-- functions to import original model and convert to most recent SBML standard. Add reactions and modify erroneous reactions. Add Bigg and uniprot annotation. Test FBA with COBRApyupgrade_main.py
-- main script for execution of model upgradeget_model_reactions.py
-- function to retrieve and export reaction dataflux_analysis.py
-- wrapper functions for FBA, FVA and flux sampling with COBRApyessentiality analysis.py
-- find essential gene set predicted by the modeltest_simulations.py
-- script to run many simulations at once using COBRApyplot_simulations.R
-- R script to plot results from FBA, FVA, or other simulationsThe following changes correct errors, remove unnecessary reactions, or add new reactions. The original model, for example, showed flux through artificial energy generating cycles (Fritzemeier et al., PLOS Comp Bio, 2017). After identification and removal of the following issues, no activity of such cycles was found anymore using FBA.
P5CD4 --> PROD4
/P5CD5 --> PTO4H --> P5CD4
. The cycle generates 2 NADH per turn. The NADH/NAD cofactor stoichiometry for reaction P5CD5
/PROD4
was reversed and was corrected.PROD4
(also same gene). The reaction was removed.NADHDH
, an NADH dehydrogenase that converts UQ spontaneously to UQH2, a reaction that requires NADH canonically (see BiGG reaction: NADHDH). Cofactors were added.SUCOAS
is importantly not set to 'reversible' as it's supposed to be, and therefore constrained to wrong direction regarding canonical flow of TCA. Constraints were changed to allow reversible flux.P5CD2
, LEUD3
, LEUD4
, 4AMBUAT
, ACOADH2
were removed for teh following reasons. Reaction P5CD2
seems to have wrong reactants. There is no info on a direct conversion from L-glu to L-glu-5-semialdehyde. All Bigg pathways to L-glu-5-semialdehyde go via GLU5K
and G5SD
, the canonical pathway. Reactions LEUD3/4
seem to be wrong as well. They are annotated as L-leucine dehydrogenase, but convert 3-Methyl-2-oxobutanoate or 3-Methyl-2-oxopentanoate to L-isoleucin or L-Valin. There are however canonical AA synthesis reactions for this purpose, while these reactions participate in artificial cycles. Reaction 4AMBUAT
is annotated as 4-aminobutyrate aminotransferase, but catalyzes transfer of an aminogroup from malonyl semialdehyde to beta-alanine. It forms an artificial cycle with APAtr
and does not seem to exist in nature. Reaction ACOADH2
oxidizes praopanoyl-CoA to propenoyl-CoA and is the reverse reaction of PCNO
(forms artificial cycle).MICITL
is the last step of the methyl-citrate cycle, an alternative route through TCA from oaa + propcoa --> succ + pyr
. It carries artificially high flux, so flux of the final reaction was constrained.PYK
is allowed in the model to go in reverse direction (pyr + atp --> pep + adp
) but this is highly unlikely under physiological conditions (see e.g. wikipedia, or BiGG database). Standard E. coli models also exclude the reverse reaction.PYK
(PYK1
, PYK2
, PYK3
) that carry most likely very little or no flux in R. eutropha, were silenced.PYC
should only run in direction from pyr --> oaa
, but not reverse (see E. coli reference models in BiGG).PPC
has correct bounds but one H+ reactant too much. The reaction was corrected.pydx5p
). Correcting this error prevents an infeasible cycle of reactions where the same enzyme (H16_A2802
) shows different reaction directionalities for the same metabolites (only difference being phosphorylated or not).PYR5OXX
, PYR5OXM
) were set to irreversible, as they were taking part in an artificial O2 generating cycle, and are thermodynamically unlikely to go reverse.asp_c
and aspsa_c
, are labelled with the name of aspartate in the model and take part in different reactions. However aspsa_c
is in reality L-Aspartate 4-semialdehyde (source: BiGG). This was renamed in the model. The reactions are correct.CABPS
, carbamoylphosphate synthase) require hco3
instead of co2
as substrate for phosphorylation. However, a co2 <=> hco3
equilibration reaction is missing (see BiGG reaction HCO3E
). The reaction was added.fru --> f6p
. It is not clear if the alternative PEP-PTS dependent fructose uptake and phosphorylation exists in R. eutropha. Therefore the PEP-PTS reaction
was silenced (more details, see Kaddor & Steinbuechel, 2011).PRUK
(cbbP2, H16_B1389; cbbPp, PHG421) catalyzing phosphorylation of Ribulose-5-phosphate: atp_c + rl5p_c --> adp_c + h_c + rb15bp_c
. And 2) Ribulose-1,5-bisphosphate carboxylase RBPC
(cbbS2, H16_B1394; cbbL2, H16_B1395; cbbSp, PHG426, cbbLp, PHG427) catalyzing the addition of CO2: co2_c + h2o_c + rb15bp_c --> 2.0 h_c + 2.0 3pg_c
. The metabolite Ribulose-1,5-bisphosphate was added and the original reaction CBBCYC
silenced.MISRXN
, named in the model as 'unclear reaction'. This reaction was removed and replaced by a correct thiazole phosphate synthesis reaction (see Bigg reference reaction THZPSN
)HYD1
in model) was improved by adding the hoxKGZ
subunits, see Cramm, 2008. The second and missing reaction, soluble hydrogenase (SH) was added as HYDS
including gene annotation with hoxFUYH
. An H2 transporter H2td
(diffusion) and exchange reaction EX_h2_e
was added (required new metabolite external H2, h2_e
).Other changes regarding to annotation:
cobrapy-bigg-client
cobrapy-bigg-client
cobrapy-bigg-client
cobrapy-bigg-client
bioservices.uniprot
and bioservices.kegg
ugpQ
, H16_A2326gdpD
, unknown
, H16_A024
, H16_A2911plsC1
, plsC2
, spontaneous
) The Memote web service was used to test the capabilities of the model and identify problems. A comparison was made between the original published model (Memote_RehMBEL1391_sbml_L2V1
), and the upgraded version of the model with the changes listed above (Memote_RehMBEL1391_sbml_L3V1
). The memote score improved from 28% to 76% owing to the addition of metadata, and correcting many errors. The reports for the original and improved model can be found here and here.
The genome scale metabolic models in this repository are encoded according to SBML standard and saved as human-readable *.xml
or *.json
files.
The models can have well over 1000 reactions, it is therefore recommended to work with these models using a frame work such as COBRApy. To install COBRApy follow the instructions on its github page. Installation of additional python 3 dependencies might be necessary for full functionality, such as libsbml
, numpy
,
scipy
, or pandas
.
# for linux, run the following line in terminal
sudo apt install python3-pip
pip install cobra
To work with a model using COBRApy, we can import it in a python session. We can look at the number of reactions, metabolites and genes associated with the model's reactions.
# load libraries
import numpy as np
import pandas as pd
import tabulate
import cobra
import os
from os.path import join
# set the path to the model's directory
data_dir = '~/genome-scale-models/Ralstonia_eutropha/'
model = cobra.io.read_sbml_model(join(data_dir, "sbml/RehMBEL1391_sbml_L3V1.xml"))
# summary of the imported model
print('%i reactions' % len(model.reactions))
print('%i metabolites' % len(model.metabolites))
print('%i genes' % len(model.genes))
The *.xml
files containing the model definition have three (four) major slots:
nadh_c
for NADH or fru_c
for fructose. PYK
: adp_c + pep_c --> atp_c + pyr_c
)Genome scale models are constrained primarily by two things:
-Inf
to +Inf
for reversible reactions. Irreversible reactions would be bounded by 0 < f < +Inf
, so that the reaction is only allowed to proceed in one direction.In COBRApy, we can look at the bounds of e.g. all exchange reactions (the ones supplying metabolites from the environment), and set them to a different value.
# inspect exchange reactions
model.exchanges.list_attr("bounds")
# we can also change bounds for reactions
for reaction in model.exchanges:
reaction.lower_bound = 0.0
reaction.upper_bound = 1000.0
In contrast to other types of models such as simple resource allocation models, genome scale models usually don't include cellular processes for production of macromolecules. In other words, transcription, translation, and DNA replication are not explicitly included in the model but only appear as abstract, lumped reactions.
The objective function of the model is a variable similar to other variables that are optimized by the solver. However, when solving the model the prime target of the algorithm is to maximize this variable, often the Biomass
reaction. Units of all reactions are by default in mmol per gDCW per h, but since the biomass reaction is per definition formulated such that 1 mmol biomass equals 1 g biomass, it also represents the specific growth rate μ (g biomass per gDCW per hour, biomass term can be eliminated).
In COBRApy we can tell the solver to optimize (usually: maximize) the flux through any reaction of choice. To maximize growth rate that would be the Biomass equation.
# set objective function
model.objective = {model.reactions.Biomass: 1}
Before we can pass the model to the solver and find the optimal flux distribution towards our goal, we have to define a growth medium (a set of exchange fluxes that represent the nutrients available in the outer environment of the cell).
model.medium = {
'EX_mg2_e': 10.0,
'EX_pi_e': 100.0,
'EX_cobalt2_e': 10.0,
'EX_cl_e': 10.0,
'EX_k_e': 10.0,
'EX_fe3_e': 10.0,
'EX_so4_e': 10.0,
'EX_fru_e': 5.0,
'EX_nh4_e': 10.0,
'EX_na_e': 10.0,
'EX_o2_e': 18.5,
'EX_mobd_e': 10.0,
'EX_h2o_e': 1000.0,
'EX_h_e': 100.0
}
The solver will then analyze the network and find the optimal steady state flux from our input metabolites to biomass.
# run FBA analysis
solution = model.optimize()
# print solution summary, the status from the linear programming solver
print([solution, "status: ", solution.status])
# print top 10 forward and backward flux
fluxes = solution.fluxes.sort_values()
print(fluxes[0:10])
print(fluxes[len(fluxes)-10:len(fluxes)])
# quick summary of FBA analysis
print(model.summary())
# summary of energy balance
print(model.metabolites.atp_c.summary())
# summary of redox balance
print(model.metabolites.nadh_c.summary())
To visualize simulation results such as those from FBA, it is extremely informative to overlay reaction (flux) data on top of a familiar metabolic map. The open source tool Escher can be used for this purpose. It can be used in a python session but just as well as an online tool. Three files are required that can be obtained from a standard SBML model.
.json
file. Simply use the COBRAPy command
cobra.io.save_json_model(my_model, 'my_path/my_model.json')
.csv
table with the first column being reaction IDs corresponding to the map, and the second (optionally third) column being flux values.