Chalmers GEM Yeast Model -- No Solutions

tylerautera commented 1 year ago

Hi Strain Design Team,

I am working with the Chalmers Yeast Model, and I have been unable to create a scenario where there are solutions found despite several attempts on heterologous pathways/targets and native targets. I am wondering if I am not setting up the model correctly, or if there is something else going on.

Here is a link to the model: Chalmer's Yeast GEM

As a simple example, I was attempting to get SUCP solutions for ethanol, here are the modules I used:

module_suppress = sd.SDModule(model,sd.names.SUPPRESS,constraints='r_1761 + 0.1 r_1714 <= 0')
module_protect  = sd.SDModule(model,sd.names.PROTECT, constraints='r_2111>=0.1')

As for the other parameters for compute_strain_design:

time_limit = 300
max_cost = 10
max_solutions = 5
solutions_approach = 'any'

It would be great to get some guidance on what might be driving the issue here.

VonAlphaBisZulu commented 1 year ago

Hi Tyler,

The Yeast GEM is one of the largest models out there, with some very complex gene rules. Honestly, strain design will be challenging, but in my opinion possible for ethanol or other small alcohols.

For guidance, check out the genome scale strain design chapter.

Here my advice:

You will need Gurobi or CPLEX for this task. SCIP and GPLK cannot handle problems of that size and complexity
YeastGEM has plenty of exchange reactions that can be, but in practice aren't used (such as taurocholate export). You want to close these, if possible, and only keep potentially real fermentation products open, like lactate, acetate etc. If you wish to be extra careful, you can open up the exchanges for pyruvate and other primary metabolites again, once the first computation runs smoothly.
Verify your strain design goals. Generate plots of your suppressed and protected region. Until you found a first, arbitrary solution, you want to relax the search criteria as much as possible. In your case, you demanded a possible growth rate of 0.1, but the maximal growth rate in the default yeastGEM is only 0.082... so you'll have to change that.
For your first computation, stick with reaction-based intervention strategies. If you eventually proceed with gene-based strain designs and face problems, reduce the number and complexity of the GPR (gene-protein-reaction) rules of your model. You'll find examples in the manual (same chapter as before). Again, pick the most relaxed settings for your first run. This means high max costs, only a single solution to return and plenty of time, e.g. 24h. You can tighten the constraints and parameters, once your first computation works.

Here is a script that I scribbled down and that for me. It took about 25 minutes to compute on my computer:

import straindesign as sd
import cobra
import matplotlib.pyplot as plt
import logging
logging.basicConfig(level=logging.INFO)

cobra.Configuration().solver = 'cplex'
# 1) Load model and, if available, set the solver to cplex or gurobi. This will increase the chances
# of sucess enormously
yeastGEM = cobra.io.read_sbml_model('C:/Users/phili/Downloads/yeast-GEM.xml')
yeastGEM.solver = 'cplex'

# r_1761: ethanol
# r_1714: glucose
# r_2111: growth

## 2) Block unnecessary exchange reactions
exchange_reacs = [r for r in yeastGEM.reactions if all(s<0 for s in r.metabolites.values())]
# protect main fermentation products
fermentation_products = {
"(S)-lactate exchange",
"(R)-lactate exchange",
"carbon dioxide exchange",
"acetate exchange",
"ethanol exchange",
"formate exchange",
"H+ exchange",
"water exchange",
"methanol exchange",
"oxygen exchange",
"succinate exchange",
"diphosphate exchange",
"growth",
"glycolaldehyde exchange"}
# shut all exchange fluxes
for r in exchange_reacs:
    if r.name not in fermentation_products:
        r.upper_bound = 0.0

# 3) Verify that the pathways are operational
sol = sd.fba(yeastGEM,obj='r_1761',obj_sense='max')
print(f"Maximum possible ethanol synthesis rate: {sol.fluxes['r_1761']}.")
print(f"Glucose uptake rate: {sol.fluxes['r_1714']}.")
sol = sd.fba(yeastGEM,obj='r_2111',obj_sense='max')
print(f"Maximum possible growth rate: {sol.fluxes['r_2111']}.")

min_etoh_yield = 0.25
min_growth = 0.01

# Plot suppressed and protected regions
module_suppress = sd.SDModule(yeastGEM,sd.names.SUPPRESS,constraints=f'r_1761 + {min_etoh_yield} r_1714 <= 0')
module_protect  = sd.SDModule(yeastGEM,sd.names.PROTECT, constraints=f'r_2111 >= {min_growth}')

# Wild-type plot
datapoints, triang, plot1 = sd.plot_flux_space(yeastGEM,
                                               ('r_2111',('r_1761','-r_1714')),
                                               show=False);
_,          _,      plot2 = sd.plot_flux_space(yeastGEM,
                                               (f'r_2111',('r_1761','-r_1714')),
                                               constraints=f'r_2111 >= {min_growth}',
                                               show=False);
plot2.set_facecolor('#70AD47')
plot2.set_edgecolor('#70AD47')
# pGCP design plot
_,          _,      plot3 = sd.plot_flux_space(yeastGEM,
                                               ('r_2111',('r_1761','-r_1714')),
                                               # The sign of the glucose exchange reaction is flipped since
                                               # reaction is defined in the direction of secretion.
                                               constraints=f'r_1761 + {min_etoh_yield} r_1714 <= 0',
                                               show=False);
plot3.set_facecolor('#ED7D31')
plot3.set_edgecolor('#ED7D31')
# adjust axes limits and show plot
plot3.axes.set_xlim(0, 1.05*max([a[0] for a in datapoints]))
plot3.axes.set_ylim(0, 1.05*max([a[1] for a in datapoints]))
plt.show()

# 4) Compute reaction-based strain designs with relaxed settings
# possible knockout of O2
ko_cost = {r.id : 1.0 for r in yeastGEM.reactions if r.genes}
ko_cost.update({'r_1992': 1.0})
# Compute strain designs
sols = sd.compute_strain_designs(yeastGEM,
                                 sd_modules = [module_suppress,module_protect],
                                 max_solutions = 1,
                                 max_cost = 25,
                                 ko_cost = ko_cost,
                                 time_limit = 86400,
                                 solution_approach = sd.names.ANY)

# 5) Print results
print("Knockouts")
print({yeastGEM.reactions.get_by_id(n).name for n in {k for k in sols.get_reaction_sd()[0].keys()}})

# Wild-type plot
datapoints, triang, plot1 = sd.plot_flux_space(yeastGEM,
                                               ('r_2111',('r_1761','-r_1714')),
                                               show=False);
_,          _,      plot2 = sd.plot_flux_space(yeastGEM,
                                               ('r_2111',('r_1761','-r_1714')),
                                               constraints=f'r_2111>={min_growth}',
                                               show=False);
plot2.set_facecolor('#70AD47')
plot2.set_edgecolor('#70AD47')
# pGCP design plot
_,          _,      plot3 = sd.plot_flux_space(yeastGEM,
                                               ('r_2111',('r_1761','-r_1714')),
                                               # The sign of the glucose exchange reaction is flipped since
                                               # reaction is defined in the direction of secretion.
                                               constraints=f'r_1761 + {min_etoh_yield} r_1714 <= 0',
                                               show=False);
plot3.set_facecolor('#ED7D31')
plot3.set_edgecolor('#ED7D31')
# plotting designed strain
interventions = [[{s:1.0},'=',0.0] for s,v in sols.reaction_sd[0].items() if v < 1]
_,          _,      plot4 = sd.plot_flux_space(yeastGEM,
                                               ('r_2111',('r_1761','-r_1714')),
                                               # The sign of the glucose exchange reaction is flipped since
                                               # reaction is defined in the direction of secretion.
                                               constraints=interventions,
                                               show=False);
plot4.set_facecolor('#FFC000')
plot4.set_edgecolor('#FFC000')
# adjust axes limits and show plot
plot4.axes.set_xlim(0, 1.05*max([a[0] for a in datapoints]))
plot4.axes.set_ylim(0, 1.05*max([a[1] for a in datapoints]))
plt.show()

Keep in mind that there can be particularities in the yeastGEM that make strain design with that model difficult, and that I am not sure were fixed. For instance, growth was impossible under anaerobic conditions. You might get strange results that circumvent knocking out oxygen uptake by introducing a number of surrogate knockouts, e.g. at the ATP synthase or similar. A model must contain all stoichiometric possibilities, in order for StrainDesign to work, and your computation will only be as good as your input model. If the model says growth without O2 is not possible, it won't return O2 as a knockout.

In my case, the script above found the knockout set: {'aldehyde dehydrogenase (acetaldehyde, NADP)', 'pyruvate dehydrogenase', 'ATP synthase', 'acetaldehyde dehydrogenase', 'aldehyde dehydrogenase (acetylaldehyde, NAD)'}. Not the most exciting one, I've seen in my life, but it makes sense, given the model limitations.

Axis are y: ethanol yield, x: growth rate

Good luck with your computations

tylerautera commented 1 year ago

@VonAlphaBisZulu thank you for this, I will give this a whirl.

tylerautera commented 1 year ago

@VonAlphaBisZulu I was able to replicate the example above, so thank you for your suggestions. I have tried to incorporate your approach with how we are trying to model it under anaerobic growth conditions – but when I make those additional model modifications, Strain Design does not yield any results – despite the flux space plots looking as expected – and FBA checks working.

I was hoping you might have some insights here. Essentially, I am modifying the model with your suggested changes around the exchanges, then, running our updates to the model which allow for anaerobic growth (opening back up some exchanges), also verifying that glucose uptake, and growth bounds are set appropriately.

Below are the modifications we make to the model to enable anaerobic growth, these changes are made after your recommended changes:

def update_model_for_anaerobic_growth(model:Model):

## ENABLING CONDITIONS FOR ANAEROBIC GROWTH
rxn = model.reactions.get_by_id('r_1992')
rxn.knock_out()
# Oxygen uptake (r_1992) was blocked. Additionally, all reactions consuming oxygen
# (s_1275 in the cytoplasm, s_1276 in the endoplasmic reticulum, s_1277 in the extracellular
# media, s_1278 in the mitochondrion, s_1279 in the peroxisome and s_2817 in the
# endoplasmic reticulum membrane) were blocked as well.

blockList = ('s_1275', 's_1276', 's_1277', 's_1278', 's_1279', 's_2817')
for met in blockList:
    M = model.metabolites.get_by_id(met)
    M.remove_from_model

# 2. The following exchange reactions were unblocked, in order to properly simulate anaerobic
# media (with fatty acids and sterols): ergosterol uptake (r_1757), lanosterol uptake (r_1915),
# zymosterol uptake (r_2106), 14-demethyllanosterol uptake (r_2134)
# ergosta5,7,22,24(28)-tetraen-3beta-ol uptake (r_2137), oleate uptake (r_2189) and palmitoleate
# uptake (r_1994). 

unblockList = ('r_1757', 'r_1915', 'r_2106', 'r_2134', 'r_2137', 'r_2189', 'r_1994', 'r_2005')

for rxn in unblockList:
    R = model.reactions.get_by_id(rxn)
    R.upper_bound = 0
    R.lower_bound = -1000
#     print(R.name, "\t upper|lower:", R.upper_bound, "|", R.lower_bound)

# 3. Heme a (s_3714) was removed from the biomass pseudo-reaction (r_4041), given that is
# needed for respiration, which is not active in anaerobic conditions, and 3 reactions involved
# in its synthesis require oxygen (r_0304, r_0942 and r_0530).

coFactorPseudo = model.reactions.get_by_id('r_4598')

# # remove heme from Pseudoreaction  
# for metabolite, stoich in biomassPseudo.metabolites.items():
#     if metabolite.id == 's_3714':
#         biomassPseudo.subtract_metabolites({metabolite.id: stoich})

# for metabolite , stoich in biomassPseudo.metabolites.items():
#     print(metabolite.name, "\t", metabolite.id, "\t:", stoich)

# remove heme  from Pseudoreaction  
for metabolite, stoich in coFactorPseudo.metabolites.items():
    if metabolite.id == 's_3714' or metabolite.id == 's_1198' or metabolite.id == 's_1203' or metabolite.id == 's_1207' or metabolite.id == 's_1212' or metabolite.id == 's_0529':
        coFactorPseudo.subtract_metabolites({metabolite.id: stoich})

for metabolite , stoich in coFactorPseudo.metabolites.items():
    print(metabolite.name, "\t", metabolite.id, "\t:", stoich)

# 4. Glycerol production should be present in anaerobic growth of yeast, acting as a NADH sink.
# For this to occur, the directionality of some reactions that were consuming NADH was
# fixed, including:
# o Malate dehydrogenase was fixed to not operate backwards. This was done both in
# the mitochondrion (r_0713_REV) and cytosol (r_0714_REV).

revBlock = ('r_0713', 'r_0714')

for rxn in revBlock:
    R = model.reactions.get_by_id(rxn)
    R.upper_bound = 1000
    R.lower_bound = 0

# o Glycerol dehydrogenase (r_0487) was fixed to not operate forward.
# o Glutamate synthase (r_0472) was also fixed to not operate forward.

forBlock = ('r_0487', 'r_0472')

for rxn in forBlock:
    R = model.reactions.get_by_id(rxn)
    R.upper_bound = 0
    R.lower_bound = -1000

# 5. Finally and as already stated in section 2.5, the growth associated ATP maintenance (GAM)
# was set to 16 mmol/gDW, and the non-growth associated ATP maintenance (NGAM) was
# set at 0 mmol/gDWh. 

# %1st change: Refit GAM and NGAM to exp. data, change biomass composition
# GAM   = 30.49;  %Data from Nissen et al. 1997
# P     = 0.461;  %Data from Nissen et al. 1997  ## STILL NEED TO FIGURE THIS SCALING FACTOR

NGAM = model.reactions.get_by_id('r_4046')
NGAM.lower_bound = 0
NGAM.upper_bound = 0

BiomassPseudo = model.reactions.get_by_id('r_4041')
rescaleP = ('s_0394','s_0794', 's_1322' )
rescaleN = ('s_0434','s_0803' )

for metabolite, stoich in BiomassPseudo.metabolites.items():
    for met in rescaleP:

        if metabolite.id == met:
            BiomassPseudo.subtract_metabolites({metabolite.id: 24.81})
#             BiomassPseudo.add_metabolites({metabolite.id: 30.49})

    for met in rescaleN:

        if metabolite.id == met:
            BiomassPseudo.subtract_metabolites({metabolite.id: -24.81})

for metabolite , stoich in BiomassPseudo.metabolites.items():
    print(metabolite.name, "\t", metabolite.id, "\t:", stoich)

# The growth associated maintenance (GAM), previously fitted for yeast to 59.276
# mmol/gDW10, comprises to a large extent polymerization costs, both for polymerizing
# aminoacids into proteins (16.965 mmol/gDW) and monosaccharides into polysaccharides
# (5.210 mmol/gDW). Because the composition of these two groups is changing in our model,
# the polymerization cost should change accordingly. Therefore, we fragment the model's
# GAM value into three amounts: one depending on the aminoacid composition, one
# depending on the carbohydrate composition and a third one to account for everything else,
# fitted manually for both aerobic and anaerobic conditions:
# GAMBb = 16.965 ∙ P5B: + 5.210 ∙ C5B: + GAM5_I [Eq. S26]
# After the fitting procedure, GAMfitted was equal to 31 mmol/gDW for aerobic conditions
# and 16 mmol/gDW for anaerobic conditions.
# • Finally, the non-growth associated maintenance (NGAM) is set constant at 0.7
# mmol/gDWh for aerobic conditions and 0 mmol/gDWh for anaerobic conditions. 

return model

`

VonAlphaBisZulu commented 1 year ago

In the code block you sent, it looks like oxygen uptake is knocked out permanently. So, you only search for anaerobic solutions, or you want O2-KO only as an option? If it should be only an option, you should keep the reaction open.

Could you generate plots of the model's flux spaces and the modules right before you launch the strain design computation (so on the model that you use for the computation)? My guess is that by knocking out O2 uptake, the feasible growth rate drops further. Perhaps, the protected region is too tight and does not allow for any additional knockouts. Beside plots, FVA is also a great tool for checking what's going on in the model and if there are new exchange reactions that became essential. Perhaps you find an answer there.

VonAlphaBisZulu commented 1 year ago

Since NGAM is now set to 0, it could be possible that all-0 now became part of the model's solution space. In this case - as mentionned in the StrainDesign manual - the 0-vector (all rates=0) must be excluded explicitly from the suppress-module, by an additional constraint. Such a constraint could be "-r1714 >= 1".

e.g.: module_suppress = sd.SDModule(model,sd.names.SUPPRESS,constraints=['r_1761 + 0.1 r_1714 <= 0','-r1714 >= 1'])

This means that substrate-uptake coupling is only enforced when uptake is above 1. You can lower this value, but I would advise so, only if you're unhappy with the returned solutions. Often the found solutions are identical, but the computation is slower and less numerically stable.

Exluding the 0-vector has purely mathematical reasons that arise from Farkas' Lemma, which is used here (without going more into detail).

tylerautera commented 1 year ago

Thank you @VonAlphaBisZulu for the feedback -- we will give this suggestions a try.

klamt-lab / straindesign

Chalmers GEM Yeast Model -- No Solutions #8