opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0
460 stars 216 forks source link

gapfilling optimization failed #781

Closed kellybotero closed 3 years ago

kellybotero commented 5 years ago

I am trying to do gapfilling of a plant genome scale metabolic model, which has the IDs and reactions with the kegg nomenclature. To fill the reactions in my model, I generated two models, one with all reactions of kegg and the other one with all the reactions of Arabidopsis reported in this database. When I ran the algorithm, the result I got for different objective functions is: Infeasible: gapfilling optimization failed (infeasible) I don´t know why this happens. Can anybody help me to understand this problem? Is it an error in the model or on the database? or simply the algorithm doesn´t find a solution?

Code Sample

solution = gapfill(model_test, model_DB, demand_reactions=False)

Actual Output

Traceback (most recent call last):

File "", line 1, in solution1 = gapfill(model_test, model_DB, demand_reactions=False)

File "/usr/local/lib/python3.6/dist-packages/cobra/flux_analysis/gapfilling.py", line 299, in gapfill return gapfiller.fill(iterations=iterations)

File "/usr/local/lib/python3.6/dist-packages/cobra/flux_analysis/gapfilling.py", line 223, in fill message='gapfilling optimization failed')

File "/usr/local/lib/python3.6/dist-packages/cobra/core/model.py", line 848, in slim_optimize assert_optimal(self, message)

File "/usr/local/lib/python3.6/dist-packages/cobra/util/solver.py", line 433, in assert_optimal raise exception_cls("{} ({})".format(message, status))

Infeasible: gapfilling optimization failed (infeasible)

Dependency Information

I am use python 3.6.6

gregmedlock commented 5 years ago

Hi Kelly, thanks for posting an informative problem description! There could be a number of reasons that attempting to gapfill returns an infeasible solution, but your intuition is correct--the algorithm cannot find a solution, even among all the reactions in your model_DB.

Thinking about the issues that might lead to an infeasible solution, here are some common ones:

  1. Generation of a metabolite is coupled to meeting the constraint on your former objective function, but the metabolite cannot leave the system. In other words, there might be a metabolite that is produced in a reaction essential for biomass production (assuming biomass was your objective), but there is no path for the metabolite to leave the system [e.g. path for transport -> boundary reaction]. You can debug this issue by setting demand_reactions = True; this will add a demand reaction for all metabolites, such that if the metabolite is blocked from leaving the system as I described, flux will be allowed through a demand reaction that carries the metabolite from whatever compartment it is in to the boundary (e.g. "Met_A --> "). If a feasible solution is found with this strategy, you can identify the specific demand reactions that needed to be added and investigate further (you may just need to add a transporter for a metabolite if there was no transport reaction in the model_DB). If you end up adding a transport reaction, make sure you also add an exchange reaction for that metabolite if there isn't one already. The only other comment I'll make on this issue is that exchange_reactions = True can also help by directly telling you whether the issue was exchange reaction absence (rather than something being blocked in an intracellular compartment), but you should be able to catch all these cases after using demand_reactions = True.
  2. If the strategies in issue 1 fail to find a feasible solution, It might not be possible to meet the gapfilling constraints with your model and model_DB (e.g. no matter what reactions you add, including demand and exchange reactions, some metabolite in your biomass reaction can't be produced). Debugging this issue would involve identifying the specific biomass precursor metabolites that can't be produced. If you are using the entirety of KEGG, and your model contains only metabolites and reactions from KEGG, I think this issue is unlikely. However, if you are manipulating IDs in your model in any way, things can go awry (e.g. if you modified the ID of a metabolite in your biomass function, that metabolite isn't in KEGG and therefore you might never be able to produce it).

I would try repeating the gapfilling with demand_reactions = True before going much further. Good luck!

kellybotero commented 5 years ago

Greg thanks for your answer. I took your suggestions into account. Unfortunately, the strategy 1 demand_reactions = True, fail to find a feasible solution. This happens using as objective function both the biomass reaction, as well as, simple reactions of synthesis of interest metabolites. To second strategy, I verified that all metabolites of my model are present in the model_DB. So, I think the error should be something else. To some objectives functions my model has not flux. Thus, is evident that my model has gaps. Is there something more that can I review to resolve my problem? Is there any option to identify the blocked reactions in the optimization? In case that gapfilling algorithm fails and not find a feasible solution.

Thanks.

gregmedlock commented 5 years ago

Did you give exchange_reactions = True a try? (if you haven't, do this with demand_reactions = True set as well)

I would recommend identifying the metabolites that can't be synthesized using the entirety of KEGG, then use search the Arabidopsis literature to see which reactions might be missing for particular functions. There are many ways to do this, but a straight-forward example would be:

1) adding the entire KEGG model to your Arabidopsis model, then 2) create a demand reaction for each biomass precursor in your arabidopsis biomass reaction (e.g. 'cpd01923_c -> 0', lb = 0, ub = 1000), then 3) iteratively set each demand reaction as the objective and try optimizing

Any demand reaction that returns 0 or infeasible is blocked and you will need to "hand-craft" a reaction that unblocks it, or add a boundary reaction that directly imports the metabolite (I wouldn't recommend this because things can go very wrong if the model is changed such that the metabolite serves as a precursor for other reactions).

You could also add these kinds of reactions to the gapfiller, heavily penalize them and attempt finding a feasible gapfill solution (but you are still left with finding the reaction missing from the database and creating it yourself).

If production of more than a handful of metabolites can't be done with the entirety of KEGG added and the exchange/demand issues considered above, I would suspect there is a technical issue with your model construction and/or manipulation that would require code/model sharing to resolve.

jclachance commented 4 years ago

I think all the explanations provided above are definitely key to make gapfilling work. Still, I tried using the gapfill function out of the box from COBRApy and while it worked for the example given in the documentation (deleting all reactions associated with f6p_c), it didn't work when deleting the reactions associated with another metabolite, say g3p_c.

I found a quick fix in the tutorial notebook 3 of the new paper by Norsigian et al.: A workflow for generating multi-strain genome-scale metabolic models of prokaryotes. It simply requires changing the tolerance feasibility of the model itself. I think this is what the integer threshold should be doing but with the current implementation: 1- users can't access it with the gapfill function, 2- it isn't applied to the model. The best option right now I think is to change it before running gapfill:

model.solver.configuration.tolerances.feasibility = 1e-9
cdiener commented 3 years ago

This should have improved with https://github.com/opencobra/optlang/pull/231 :crossed_fingers: