opencobra / optlang

optlang - sympy based mathematical programming language
http://optlang.readthedocs.org/
Apache License 2.0

Optimisation is not deterministic even though the order of variables and constraints in the model is the same. #251

Open alcaravia opened 1 year ago

alcaravia commented 1 year ago

Problem description

I am working with metabolic models using cobrapy. I want to be able to recreate a specific optimal solution after reloading the metabolic model in a new script. I have managed to isolate one example: if I optimise the model with a set of bounds "bounds0" and then set different bounds "bounds1" and optimise again, the solution differs from the one I get when I load the model fresh and set "bounds1" directly. This makes it impossible to recreate the first solution in a new script.

I am not 100% sure this is an optlang issue, but the order of constraints and variables is the same in both optlang models, which according to this issue should lead to identical solutions. That is not the case with either the glpk or the gurobi solver, so the problem does not seem to lie with the solvers themselves.

The optlang version in my environment is 1.6.1.

If the problem does not lie with optlang but with me, I apologise, any input is appreciated.

Code Sample

The two files needed to reproduce this are bounds.txt, attached here, and the metabolic model "Bacillus_cereus_AH187_F4810_72.xml", which can be found here (.xml files cannot be attached).

The outputs are the optimal values of the two optimisations using identical bounds. They should be the same, but are not.

import os

import cobra

cobra.Configuration().solver = "glpk"

# creating bounds0 and bounds1 dictionaries from file
bounds = open("bounds.txt").readlines()
bounds0 = {}
bounds1 = {}
for line in bounds:
    spl = line.strip().split(";")
    bounds0[spl[0]] = [float(spl[1]), float(spl[2])]
    bounds1[spl[0]] = [float(spl[3]), float(spl[4])]

# loading the model the first time
model0 = cobra.io.read_sbml_model("Bacillus_cereus_AH187_F4810_72.xml")

# setting the bounds according to bounds0 and optimising
for react in model0.reactions:
    react.lower_bound = bounds0[react.id][0]
    react.upper_bound = bounds0[react.id][1]
solution = model0.optimize()
# setting the bounds according to bounds1 and optimising
for react in model0.reactions:
    react.lower_bound = bounds1[react.id][0]
    react.upper_bound = bounds1[react.id][1]
solution = model0.optimize()

print(solution.objective_value)

# loading the model the second time
model0 = cobra.io.read_sbml_model(os.path.join("models_work_copy", "Bacillus_cereus_AH187_F4810_72.xml"))

# directly setting bounds according to bounds1 and optimising
for react in model0.reactions:
    react.lower_bound = bounds1[react.id][0]
    react.upper_bound = bounds1[react.id][1]
solution = model0.optimize()

# tada, different optimal value. The fluxes are completely whack, too.
print(solution.objective_value)

cdiener commented 1 year ago

The objective should definitely be unique within solver tolerances. With your example it is actually deterministic for me:

In [9]: import cobra
   ...: cobra.Configuration().solver = "glpk"
   ...: 
   ...: # creating bounds0 and bounds1 dictionaries from file
   ...: bounds = open("bounds.txt").readlines()
   ...: bounds0 = {}
   ...: bounds1 = {}
   ...: for line in bounds:
   ...:     spl = line.strip().split(";")
   ...:     bounds0[spl[0]] = [float(spl[1]), float(spl[2])]
   ...:     bounds1[spl[0]] = [float(spl[3]), float(spl[4])]
   ...: 
   ...: # loading the model the first time
   ...: model0 = cobra.io.read_sbml_model("Bacillus_cereus_AH187_F4810_72.xml")
   ...: 
   ...: # setting the bounds according to bounds0 and optimising
   ...: for react in model0.reactions:
   ...:     react.lower_bound = bounds0[react.id][0]
   ...:     react.upper_bound = bounds0[react.id][1]
   ...: solution = model0.optimize()
   ...: # setting the bounds according to bounds1 and optimising
   ...: for react in model0.reactions:
   ...:     react.lower_bound = bounds1[react.id][0]
   ...:     react.upper_bound = bounds1[react.id][1]
   ...: solution = model0.optimize()
   ...: 
   ...: print(solution.objective_value)
   ...: 
   ...: # loading the model the second time
   ...: model0 = cobra.io.read_sbml_model("Bacillus_cereus_AH187_F4810_72.xml")
   ...: 
   ...: # directly setting bounds according to bounds1 and optimising
   ...: for react in model0.reactions:
   ...:     react.lower_bound = bounds1[react.id][0]
   ...:     react.upper_bound = bounds1[react.id][1]
   ...: solution1 = model0.optimize()
   ...: 
   ...: # tada, different optimal value. The fluxes are completely whack, too.
   ...: print(solution1.objective_value)
6.622276006620644
6.622276006620505

Those two objective values are within the default solver tolerance. The fluxes differ a bit, but this makes sense: there are many flux distributions that may yield the maximum objective, and there are generally no guarantees as to which one will be returned. You can use pFBA if you want more uniquely determined fluxes.
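For illustration, a minimal sketch of what the pFBA suggestion could look like here (assuming model0 from the snippet above, with the bounds1 bounds already applied):

```python
from cobra.flux_analysis import pfba

# pFBA fixes the objective at its optimum and then minimises the total sum of
# absolute fluxes, which removes much of the degeneracy among alternative optima.
pfba_solution = pfba(model0)

# The parsimonious flux distribution and its total absolute flux.
print(pfba_solution.fluxes.abs().sum())
```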

Also, is there a reason that you are reading the model from two different locations?

alcaravia commented 1 year ago

Thank you for your quick reply! :)

To address the model location: I am reading the model from the same location, the folder "models_work_copy"; I just forgot to remove the path prefix from the second load when I copied the code here.

I'm not entirely satisfied with the explanation that the two objective values are within a tolerance, because that is not the same as the optimisation being deterministic. I can load the model a hundred times, set the same bounds, and always get the EXACT same objective value and flux values. On top of that, judging from the objective values you are getting, your machine produces the EXACT same solution as mine. So the glpk algorithm is deterministic beyond this precision, which means there must be a "tangible" reason why it returns different values if the model has been optimised with different bounds before.

To add some context: I am simulating microbial communities in a chemostat until they reach an equilibrium, and in many cases, if the reconstructed solution is not exactly the same, the communities completely change and fall into a different equilibrium. This makes a lot of analyses and further simulations impossible, both for my model and presumably many other forms of dFBA.

I might try to use pFBA to see if that produces identical solutions, but given this issue with regular FBA, I am not confident it will solve my problem.

Midnighter commented 1 year ago

Hey,

Just as an idea, you might also look at an FVA with a fraction of 0.9999 or so of the optimum. That will give you an idea of the possible flux ranges when operating very close to the optimum.
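For illustration, a minimal sketch of that FVA call (assuming model0 from the example above, with the bounds1 bounds applied):

```python
from cobra.flux_analysis import flux_variability_analysis

# Flux ranges attainable while keeping the objective at >= 99.99 % of its optimum;
# wide [minimum, maximum] intervals flag reactions whose flux is not pinned down
# by the optimum alone.
fva = flux_variability_analysis(model0, fraction_of_optimum=0.9999)
print(fva.head())
```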

alcaravia commented 1 year ago

I can do that, but I'm not sure I see how that fixes my problem. I'd know more about the solution space but still wouldn't know why I don't always get the same solution back. That's what a deterministic algorithm is supposed to do, no?

KristianJensen commented 1 year ago

Hi

What you are seeing is not a problem with determinism. As you say, if you rerun the same code you will get the same result, so the solver is deterministic. I think the "problem" here is that the model has state. The solver's algorithm, while deterministic, depends on its starting state. If you have already optimized the model, the solver will try to use the first solution to warm-start the search for the new solution. That means that you might reach a different solution than you would if you optimized "from scratch", although as mentioned above, these solutions are equally good. I don't know why your use case requires the second solution to be identical, but I hope this will at least explain what you are seeing :)

matthiaskoenig commented 1 year ago

@alcaravia The algorithm does not have to be deterministic. E.g. there could be a randomized selection of an initial simplex node for optimization. Or the last solution basis/simplex node is reused to start the next optimization (then the solution depends on what your last optimization problem was and the previous state of the solver). If you have multiple optimal solutions (i.e. a Pareto surface), it depends on the strategy the solver uses to optimize the simplex problem. In no way does the simplex algorithm guarantee that you get the same solution from a Pareto front.

alcaravia commented 1 year ago

@KristianJensen It does clarify some things, yes, thank you! Is there no way to force a solution "from scratch" instead of referring to a previous solution? The reason I want to be able to do this is what I stated above:

To add some context: I am simulating microbial communities in a chemostat until they reach an equilibrium, and in many cases, if the reconstructed solution is not exactly the same, the communities completely change and fall into a different equilibrium. This makes a lot of analyses and further simulations impossible, both for my model and presumably many other forms of dFBA.

@matthiaskoenig I understand that an optimisation algorithm does not have to be deterministic and that the optimal solution does not have to be unique, but as far as I could find out, the glpk solver specifically (as opposed to some gurobi methods) is designed to be deterministic, i.e., that there is no random component in the algorithm, which I was/am hoping would make an exact reproduction possible.

KristianJensen commented 1 year ago

It sounds like your system is underdetermined, and analysis/interpretation based on a particular equilibrium would probably be misleading if an unknown number of other equally valid but very different equilibria exist?

For ways to circumvent the issue, I think you could disable the warm-starting and/or "reset" the model by discarding any previously found solutions. These operations are not supported natively by optlang, so you would need to do it by manipulating the underlying solver instances.

E.g. Gurobi seems to have a reset method: https://www.gurobi.com/documentation/9.5/refman/cpp_model_reset.html. Depending on how large your model is, it might be easier to just clone the optlang Model instance, which should also reset the solver, although it is not a very elegant approach.
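To make that concrete, a rough sketch of the two workarounds (assuming model0 is the cobra model from the example above and, for the first option, that Gurobi is the configured solver; neither goes through a public optlang API, so treat this as an illustration only):

```python
# 1) Gurobi only: model0.solver.problem should be the underlying gurobipy model,
#    whose reset() discards previously computed solution information.
model0.solver.problem.reset()

# 2) Solver-agnostic: clone the optlang model, which rebuilds the solver from
#    scratch and should therefore drop any warm-start state.
fresh_solver = model0.solver.interface.Model.clone(model0.solver)
```

A heavier but simpler alternative is to re-read or copy the whole cobra model before each optimisation, which is what reproduced the "from scratch" solution in the first place.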