opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0
461 stars 216 forks source link

design / implement better management of growth media #303

Closed mmundy42 closed 7 years ago

mmundy42 commented 7 years ago

I'd like to start a discussion on how to implement management of growth media. I can think of two approaches:

  1. Add a new method to cobra.Model that takes either a dictionary or a json file with the exchange reaction IDs and lower bound for the reaction. The method updates the exchange reactions in the model as specified in the input.
  2. Add a new cobra.Media object with methods to add and remove metabolites and set bound for metabolite. Add a new method to cobra.Model that takes a cobra.Media object that updates or adds exchange reactions in the model.

In either approach exchange reactions that are not in the input would have the bounds set to 0.

I also think there should be a way to save media to a file so they can be shared with others.

Thoughts? Other ideas?

cdiener commented 7 years ago

Uh nice. I have two things close to my heart ;)

  1. Most experimentalists and even theoreticians don't think about growth media in terms of reactions but rather in terms of metabolites. The formulation into reactions is convenient for us but I think we could go a step further and allow defining media in terms of metabolites together with bounds on the import of that metabolite. This would than be mapped to the respective boundary reactions automatically. Problems could arise with coupled transporters. We could ignore those imports for now.
  2. An analysis I have seen in a publication was to calculate the minimal medium for a model and I thought this was pretty neat. Basically there are two methods, both depending on a minimum growth threshold: use pFBA with a weighted (molecular weight?) sum of import fluxes and keep the ones larger than zero or define a MILP with an indicator for each import and minimize the sum of indicator to get the medium with the fewest components.
pstjohn commented 7 years ago

One of the ways I've been thinking about implementing this was overwriting Model.media to be a property where we could set the getters and setters.

In the get method, I would imagine calling a function we write in cobra.manipulation which looks for boundary reactions that are not knocked out. We'll have to make some decisions on directionality... for instance, if a model has a reaction CO2 --> with bounds (0, 1000), is that part of the media? If it is, we'll have big dictionaries to specify simple media, if its not, we'll have a tough time distinguishing between media where certain exchanges are 'knocked out'.

In the set method, I'd imagine just the reverse of the getter. Iterate over the dict-like object, maybe something like

{'EX_glc_e': (0, 1000),
 'EX_h_e': (-1000, 1000),
}

and then set the bounds for each reaction as appropriate. In terms of storing media along with the model, I would think the best place to do this would be in the model.notes dictionary. Just have a key for media, give each media dict a name, and then we could have model.media's setter also accept string keys which would pull from the notes. so,

model.meda = 'glucose'
model.optimize()
model.media = 'xylose'
model.optimize()

would work, if model.notes looked like

{'glucose': {'EX_glc_e': ...},
 'xylose': {'EX_xyl_e': ...}}

@cdiener,

  1. My only concern here is that when experimentalists think of metabolites over reactions, they're also thinking of concentrations over fluxes. Its a moot point for the bigg models where each of the exchange reactions are independent, (which I think should be encouraged... in SBML you draw the boundaries on metabolites correct? Does anyone have a good idea how that gets converted?). So yeah. I have mixed feels only because I don't want to mislead, but if its easy enough to implement I'm fine with labeling by metabolite over reaction.
  2. I think that's a great idea! But it should probably live somewhere in cobra.flux_analysis? I'm not sure that needs to be a top-level model function unless it sees a lot of use. I could see it being really useful in model building / validation, for instance.
cdiener commented 7 years ago

@pstjohn

  1. yeah I agree that's what I meant by coupled imports. I have no problems with media being stored as reactions. However, there should be a way to generate a putative set from just metabolites.
  2. I have no preference where that should live :) cobra.flux_analysis seems fine.
mmundy42 commented 7 years ago

A few more considerations:

  1. Some models from Thiele's lab have reactions that are boundary reactions in cobrapy but are not exchange reactions. For example, the B. theta model has 6 reactions that have only one metabolite but are not exchange reactions (DM_4HBA, DM_5DRIB, DM_AMOB, sink_chols, sink_hpyr, sink_s).
  2. I think there needs to be a way to store a media independent of a model. For example, we are running models for a large number of organisms from the human gut and need to apply the same media conditions to all of them.
  3. Media need to be independent of a compartment. For example, once we have a single species model working, we build a community model and need to apply the media to the community model which uses a different compartment at the boundary.
  4. There should be simple methods to add and remove a media to a model (even if under the covers it is stored in the model's notes).
cdiener commented 7 years ago

Lot's of good points here.

@mmundy42

  1. Those could be potential problem but only if the metabolite they use is part of the medium.
  2. and 3. We are also working with microbiome data, so I definitely understand your requirements. I think it will be hard to define media across models since different models may have different reactions. How would you propose that case should be handled. Point 3 is a pretty specific requirement for microbial community modeling. Maybe that should be in its own package.
  3. Definitely agree.

@pstjohn

There is already a media_compositions attribute in the Model class. Shouldn't we rather use that? For the the CO2 example I think we should not ignore the model bounds when assigning media. If the model specifies that a metabolite can not be imported it should not be imported in my book. For the API I would think of media as something that is applied temporarily to a model with a context manager and reset afterwards, like

with model.medium("LB"):
    model.optimize()

with model.knockout("EX_glc_e").medium("SB"):                 # that would be even cooler
    model.optimize()
pstjohn commented 7 years ago

Agreed that compartment handling might want to be its own package. I just don't see a clear path towards making that general enough for all cobra models.

On context managers, I'm 100% on board. One proposed solution was the TimeMachine class from cameo, but I do wonder if there's a better way. (No offense to the cameo developers intended! Love that epic docstring 😄 ) My main complaint is that it just doesn't seem as straightforward as your mocked up examples.

I wonder if we could make cobra models copy-on-write... Then we could spin up new models whenever we want without too much concern. i.e., cobra.reaction.knock_out() or similar could return a copy of the model with the reaction knocked out.

Otherwise we could define context manager functions like you're describing, where we implement something similar to TimeMachine under the hood. (__enter__ changes the bounds and remembers the previous state, __exit__ recalls the old state).

My point with CO2 was that if the media also needs to include bounds on secreted species, we'll need a term for every boundary species in every media. Unless we define a 'default' media, and subsequent media are just diffs from that? Might be tough to keep everything straight, I'd want the default to be tied to the bounds as defined when the model was loaded / saved.

aebrahim commented 7 years ago

Boundary metabolites can easily lead to bugs. They really don't make sense in a flux based model and shouldn't be used.

They are a big part of this issue: http://dx.doi.org/10.15252/msb.20156157

On Oct 28, 2016 9:04 AM, "Peter St. John" notifications@github.com wrote:

Agreed that compartment handling might want to be its own package. I just don't see a clear path towards making that general enough for all cobra models.

On context managers, I'm 100% on board. One proposed solution was the TimeMachine https://github.com/biosustain/cameo/blob/e4337c69b7bddf24148fbd5db5b579d301acf0a0/cameo/util.py#L301 class from cameo, but I do wonder if there's a better way. (No offense to the cameo developers intended! Love that epic docstring 😄 ) My main complaint is that it just doesn't seem as straightforward as your mocked up examples.

I wonder if we could make cobra models copy-on-write... Then we could spin up new models whenever we want without too much concern. i.e., cobra.reaction.knock_out() or similar could return a copy of the model with the reaction knocked out.

Otherwise we could define context manager functions like you're describing, where we implement something similar to TimeMachine under the hood. (enter changes the bounds and remembers the previous state, exit recalls the old state).

My point with CO2 was that if the media also needs to include bounds on secreted species, we'll need a term for every boundary species in every media. Unless we define a 'default' media, and subsequent media are just diffs from that? Might be tough to keep everything straight, I'd want the default to be tied to the bounds as defined when the model was loaded / saved.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/opencobra/cobrapy/issues/303#issuecomment-256960437, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUPSKxT_XdQWfyy-rPUUDSSPjMC1v1Zks5q4h0LgaJpZM4KhuoX .

pstjohn commented 7 years ago

Great point, I'm fine with leaving it as reactions then

cdiener commented 7 years ago

I think nobody want to use boundary metabolites. Cobra does not read them and should not. The question is whether we want a helper function to identify the media reactions based on a list of metabolites. Otherwise if I have a culture medium from the literature I would have to:

  1. Map the ingredients to the metabolites in the model
  2. Pick all import reactions for each metabolite by hand.
  3. Define the bounds.

I think at least step 2 should have an option to be automatized, otherwise picking reaction out of a list of several thousand would be large source of error and bring us back to the reproducibility problem.

So I am lobbying for a:

model.media = medium_from_metabolites(metabs, bound=1000)

or something similar.

@pstjohn

Your __enter__ and __exit__ definitions are what I was thinking about. I also like the model.copy version but it might be performance hit if you call that kind of stuff in a loop like:

for r in many_reactions: knock_out(model, r).optimize()

I think I also prefer the if the context managers would be methods of the Model class since that would allow pandas style piping, e.g.

medium(knock_out(knock_out(model, "r1"), "r2"), "LB").optimize()

is uglier than

model.knock_out("r1").knock_out("r2").medium("LB").optimize()

IMHO. However, that would not work with context managers...

pstjohn commented 7 years ago

exactly, we couldn't rely on the existing copy machinery. I was thinking something along the lines of what was done in this library: https://github.com/diffoperator/pycow, but I dont know how well that would work with the nested structures that build up into a model. I.e., the copied (proxy) model would have to have its reactions point back to the base, but then when somethings changed it would have to spin up a new reaction. Or a proxy of the original reaction.

Anyways, it would be a cool way of forking models without much memory or computational overhead, which I think is the root of the issue we're looking at.

For enter and exit methods, I agree on avoiding the nested functional calls. An alternative could be to just define enter / exit methods on the model itself? I'm not 100% clear on if this would work..

with model:
    model.medium = 'glucose'
    model.reactions.my_favorite_reaction.knock_out()
    stored_solution_somewhere = model.optimize()

where __enter__ remembers the current state, and __exit__ calls it back. (Or, starts to remember operations, and __exit__ undoes them)...

cdiener commented 7 years ago

Exactly, something like pycow would be necessary to manage that kind of context manager. The easiest thing to temporarily set the model self reference to a copy but that does not work because you can set attributes of self but not self itself (wow that sounds funny). So we would really have to make copies of all attributes...

Maybe a "dumb" TimeMachine aka

from contextlib import contextmanager

@contextmanager
def tm(model):
    yield model.copy()

with tm(model) as m:
    m.reactions.mfr.knock_out()
    m.medium = "glucose"

And add only some specific ones for media and multiple knockouts as


with model.medium("glucose"):
    model.bla_bla()

with model.knock_out(("r1", "r2", ...)):
    model.bla_bla()
cdiener commented 7 years ago

Hi, @hredestig, do you think it would be good idea to open a feature branch for that and add @pstjohn and @mmundy42 with write access to the branch so we can play around with some options?

pstjohn commented 7 years ago

Just add @mmundy42 to cobrapy core. Either of us could open up the branch.

@cdiener, on the time-machine methods, I would guess that you could initialize some type of history-tracker and attach it to self on __enter__. We'd have to re-write the bound manipulation functions to check if a model has an active history tracker, and if so, add the old bound and new bound to it. Then __exit__ could walk through the history, undoing each action.

Are reaction bounds the only thing we're tracking? If so it could be as easy as just doing an @property on reaction.lower_bound, and make the setter check reaction._model.{{ history }} to see if we're in an active context...


On more careful thought the copy-on-write would be difficult to implement with the current organization of Models, Reactions, and Metabolites. The reason being is that not only to objects point down the hierarchy: Model -> Reaction -> Metabolite, but they also point up: metabolite.model keeps track of what model its a part of, and same with Reactions.

If we switched to a top-down only model, implemented perhaps with something like networkx, then we could do really fast shallow copies of a model's graph when calling model.copy. The problem here though is that you'd loose the ability to use methods like metabolite.reactions, since metabolites wouldn't know which reactions they're a part of, and reactions wouldn't know what models they're a part of. You could get around this partly by passing models as arguments, and partly by implementing methods at the next level up. (model.get_reactions(metabolite_id))

This is obviously a much bigger restructuring than just media... Has anyone thought about this type of organization flow?

cdiener commented 7 years ago

@pstjohn I think in order to make the behavior understandable we would need to track all attributes. This would also include the solver references that will get introduced by optlang. It seems that this feature would generare a lot of boiler plate code and would also break everytime you add new attributes to Model. So I would think that this should only get implemented if there is a lot of demand. For now, the model.copy seems much simpler to me and does not really generate a lot of overhead (200 ms for very large models). I agree that this does not fix the forking problem, but maybe we can just implement some context managers for the most common workflows (like a bound resetter for knock_outs and media).

hredestig commented 7 years ago

Very much agree that media should somewhere be a list of metabolites as that is what it actually is and media reactions something computed from the current model and this list of media constituents..

Could we possibly create a different issue with the purpose of discussing context managers, propose different alternatives for reversible changes to the model as it is not closely related to the media discussion? While the time-machine might not be perfect in all regards (also like method chaining and operating primarily on the model..), however, for refactoring methods from cameo to cobra there is quite some effort to be saved with at least having a compatible implementation since cameo makes heavy use of that class.

On new branch, indeed, you can just open branches as you like and @mmundy42 would you please request to join https://github.com/orgs/opencobra/teams/cobrapy-core if you like to contribute directly.

mmundy42 commented 7 years ago

@hredestig, when I try the link I get a 404 error from github.com. Is that the right link or am I doing something wrong? Maybe I need to be invited to the organization first? https://help.github.com/articles/inviting-users-to-join-your-organization/

pstjohn commented 7 years ago

Closing with #350, lets open a new one for further discussions