opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0
467 stars 218 forks source link

Add CHRR method to sampling #669

Open Midnighter opened 6 years ago

Midnighter commented 6 years ago

@HAHerrmann a new issue seemed more fitting.

I have been using the ACHR sampler using cobra both in Matlab and Python. In Python when plotting the flux distributions it is obvious that the sampler gets stuck in a certain value which appears much more frequently than the others. This is not the case in Matlab. Do we know why this is?

Also, I really like the CHRR algorithm in Matlab and was hoping to use it using cobrapy. Will this algorithm ever be available in cobrapy also?

Thank you!

cdiener commented 6 years ago

Sorry for the delay. Could you provide the model you are sampling so I can check what is going on? CHRR has been asked for before and I have no problem adding but would need some more details concerning the method that was not in the paper. I will check if the Cobra toolbox has added the sources now.

HAHerrmann commented 6 years ago

Having the CHRR algorithm in cobrapy would be very much appreciated! Were you able to find the sources?

The model I was using for sampling was just a basic test network I created myself which sort of matches the structure of the larger network I was hoping to use:

Defining a simple test model

model = Model()
model.add_metabolites([Metabolite(i) for i in "ABCDEFGHIJK"])
model.add_reactions([Reaction(i) for i in ["ip", "R1", "R2", "R3", "R4", "R5", 
                                           "R6", "R7", "R8", "R9", "R10", 
                                           "R11", "R12", "R13","op"]])
model.reactions.ip.add_metabolites({"A": 1})
model.reactions.op.add_metabolites({"K": -1})
model.reactions.R1.add_metabolites({"A": -1, "B": 1})
model.reactions.R2.add_metabolites({"A": -1, "D": 1})
model.reactions.R3.add_metabolites({"B": -1, "D": -1, "C": 1})
model.reactions.R4.add_metabolites({"B": -1, "I": 1})
model.reactions.R5.add_metabolites({"C": -1, "E": 1})
model.reactions.R6.add_metabolites({"D": -1, "J": 1})
model.reactions.R7.add_metabolites({"E": -1, "F": 1})
model.reactions.R8.add_metabolites({"E": -1, "H": 1})
model.reactions.R9.add_metabolites({"F": -1, "G": 1})
model.reactions.R10.add_metabolites({"H": -1, "G": 1})
model.reactions.R11.add_metabolites({"J": -1, "E": -1, "I": -1, "G": -1, "K": 1})
model.reactions.R12.add_metabolites({"F": -1, "I": 1})
model.reactions.R13.add_metabolites({"H": -1, "J": 1})

Setting all reactions to be reversible

for r in ["ip","R1","R2","R3","R4","R5","R6","R7","R8","R9",
          "R10", "R11", "R12", "R13", "op"]:
    model.reactions.get_by_id(r).lower_bound = 0.
    model.reactions.get_by_id(r).upper_bound = 500.

Specifying the output function as the model objective

model.objective = 'op'

Setting "biological" input constraints

model.reactions.get_by_id('ip').lower_bound = 0.
model.reactions.get_by_id('ip').upper_bound = 75.
cdiener commented 6 years ago

Hi,

I think the problem you were seeing was related to some issues I just fixed. This affected models with only irreversible reactions and would lead to samples that were basically just the maximum values for some or all variables. I sampled your model with the fixes and it look pretty normal. For instance the op variable has the following distribution:

dist_op

Which more or less spans the entire range for that variable ([0, 12.5]).

Regarding the CHRR method the Cobra Toolbox team did not make the code for the polytope rounding public (which is the main point of that algorithm) and there seem to be no immediate plans to do so. When that changes I will definitely reconsider it.

Edit: This was incorrect, the code is public, see below.

Midnighter commented 6 years ago

@HAHerrmann I've just released cobrapy 0.11.3 including @cdiener's latest changes. Please give it a try.

HAHerrmann commented 6 years ago

Thank you! Much appreciated!

Just to clarify @cdiener , since the MATLAB cobra toolbox is open source, how can bits of the algorithm not be public?

Thanks again for your help, Helena

cdiener commented 6 years ago

@HAHerrman that is something you would have to ask them :smile: basically CHRR has to steps it first finds a "rounding" transformation for the sampling space and then samples from that transformed space. Only the second part is currently provided in the toolbox AFAIK. The polytope rounding is not and you need to pass the rounded polytope to the sampling step as an argument.

rmtfleming commented 6 years ago

The main aims of the paper on sampling genome-scale metabolic models with the CHRR algorithm were to uniformly sample with an algorithm guaranteed to sample uniformly, to accelerate the sampling procedure, and to enable uniform sampling of higher dimensional models than is feasible with the ACHR algorithm. The COBRA interface to the CHRR algorithm is provided here: https://github.com/opencobra/cobratoolbox/blob/master/src/analysis/sampling/CHRR/chrrSampler.m The CHRR algorithm itself is included in the COBRA toolbox via a submodule, and the rounding algorithm is here: https://github.com/Bounciness/Volume-and-Sampling/blob/1c7adfb46c2c01037e625db76ff00e73616441d4/preprocess.m Everything is open source and has been publicly available since 2017 when the paper was published. https://academic.oup.com/bioinformatics/article/33/11/1741/2964731

cdiener commented 6 years ago

Thank you Ronan. I apologize for the incorrect statement and have corrected it above.

Midnighter commented 6 years ago

@cdiener you mentioned in the past that you might still need sampling at your new job. Is CHRR something you would still like to implement or should we put this on hold?

cdiener commented 6 years ago

Hi sorry, I think I wasn't clear. I will still be using cobrapy but not sampling. However, I will still continue to maintain the existing sampling code.

The polytope rounding is pretty complex and would take a while to implement. Also it would add more computation to the init step which is slow as it is. I am also not convinced if it would really give a boost to sampling. It does converge faster for a fixed set of iterations but taking the slower init your amortized efficiency might not be that much higher. Creating many samples is pretty cheap so having faster convergence with less samples is maybe not that meaningful? So I would not take that on right now and definitely not alone...

TylerBackman commented 6 years ago

I just wanted to throw this out there- I have a strong interest in adding CHRR to cobrapy, and we plan to work on implementing this soon (perhaps within the next year). If anyone is interested in collaborating on this please let me know.

Midnighter commented 6 years ago

Okay, then I'll keep this issue open. I was doing a bit of house cleaning :smiley: