Closed mmundy42 closed 7 years ago
Uh nice. I have two things close to my heart ;)
One of the ways I've been thinking about implementing this was overwriting Model.media to be a property where we could set the getters and setters.
In the get method, I would imagine calling a function we write in cobra.manipulation which looks for boundary reactions that are not knocked out. We'll have to make some decisions on directionality... for instance, if a model has a reaction CO2 -->
with bounds (0, 1000), is that part of the media? If it is, we'll have big dictionaries to specify simple media, if its not, we'll have a tough time distinguishing between media where certain exchanges are 'knocked out'.
In the set method, I'd imagine just the reverse of the getter. Iterate over the dict-like object, maybe something like
{'EX_glc_e': (0, 1000),
'EX_h_e': (-1000, 1000),
}
and then set the bounds for each reaction as appropriate. In terms of storing media along with the model, I would think the best place to do this would be in the model.notes dictionary. Just have a key for media
, give each media dict a name, and then we could have model.media
's setter also accept string keys which would pull from the notes. so,
model.meda = 'glucose'
model.optimize()
model.media = 'xylose'
model.optimize()
would work, if model.notes
looked like
{'glucose': {'EX_glc_e': ...},
'xylose': {'EX_xyl_e': ...}}
@cdiener,
cobra.flux_analysis
? I'm not sure that needs to be a top-level model function unless it sees a lot of use. I could see it being really useful in model building / validation, for instance.@pstjohn
cobra.flux_analysis
seems fine.A few more considerations:
Lot's of good points here.
@mmundy42
@pstjohn
There is already a media_compositions
attribute in the Model class. Shouldn't we rather use that? For the the CO2
example I think we should not ignore the model bounds when assigning media. If the model specifies that a metabolite can not be imported it should not be imported in my book. For the API I would think of media as something that is applied temporarily to a model with a context manager and reset afterwards, like
with model.medium("LB"):
model.optimize()
with model.knockout("EX_glc_e").medium("SB"): # that would be even cooler
model.optimize()
Agreed that compartment handling might want to be its own package. I just don't see a clear path towards making that general enough for all cobra models.
On context managers, I'm 100% on board. One proposed solution was the TimeMachine
class from cameo, but I do wonder if there's a better way. (No offense to the cameo developers intended! Love that epic docstring 😄 ) My main complaint is that it just doesn't seem as straightforward as your mocked up examples.
I wonder if we could make cobra models copy-on-write... Then we could spin up new models whenever we want without too much concern. i.e., cobra.reaction.knock_out()
or similar could return a copy of the model with the reaction knocked out.
Otherwise we could define context manager functions like you're describing, where we implement something similar to TimeMachine under the hood. (__enter__
changes the bounds and remembers the previous state, __exit__
recalls the old state).
My point with CO2 was that if the media also needs to include bounds on secreted species, we'll need a term for every boundary species in every media. Unless we define a 'default' media, and subsequent media are just diffs from that? Might be tough to keep everything straight, I'd want the default to be tied to the bounds as defined when the model was loaded / saved.
Boundary metabolites can easily lead to bugs. They really don't make sense in a flux based model and shouldn't be used.
They are a big part of this issue: http://dx.doi.org/10.15252/msb.20156157
On Oct 28, 2016 9:04 AM, "Peter St. John" notifications@github.com wrote:
Agreed that compartment handling might want to be its own package. I just don't see a clear path towards making that general enough for all cobra models.
On context managers, I'm 100% on board. One proposed solution was the TimeMachine https://github.com/biosustain/cameo/blob/e4337c69b7bddf24148fbd5db5b579d301acf0a0/cameo/util.py#L301 class from cameo, but I do wonder if there's a better way. (No offense to the cameo developers intended! Love that epic docstring 😄 ) My main complaint is that it just doesn't seem as straightforward as your mocked up examples.
I wonder if we could make cobra models copy-on-write... Then we could spin up new models whenever we want without too much concern. i.e., cobra.reaction.knock_out() or similar could return a copy of the model with the reaction knocked out.
Otherwise we could define context manager functions like you're describing, where we implement something similar to TimeMachine under the hood. (enter changes the bounds and remembers the previous state, exit recalls the old state).
My point with CO2 was that if the media also needs to include bounds on secreted species, we'll need a term for every boundary species in every media. Unless we define a 'default' media, and subsequent media are just diffs from that? Might be tough to keep everything straight, I'd want the default to be tied to the bounds as defined when the model was loaded / saved.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/opencobra/cobrapy/issues/303#issuecomment-256960437, or mute the thread https://github.com/notifications/unsubscribe-auth/ABUPSKxT_XdQWfyy-rPUUDSSPjMC1v1Zks5q4h0LgaJpZM4KhuoX .
Great point, I'm fine with leaving it as reactions then
I think nobody want to use boundary metabolites. Cobra does not read them and should not. The question is whether we want a helper function to identify the media reactions based on a list of metabolites. Otherwise if I have a culture medium from the literature I would have to:
I think at least step 2 should have an option to be automatized, otherwise picking reaction out of a list of several thousand would be large source of error and bring us back to the reproducibility problem.
So I am lobbying for a:
model.media = medium_from_metabolites(metabs, bound=1000)
or something similar.
@pstjohn
Your __enter__
and __exit__
definitions are what I was thinking about. I also like the model.copy version but it might be performance hit if you call that kind of stuff in a loop like:
for r in many_reactions: knock_out(model, r).optimize()
I think I also prefer the if the context managers would be methods of the Model class since that would allow pandas style piping, e.g.
medium(knock_out(knock_out(model, "r1"), "r2"), "LB").optimize()
is uglier than
model.knock_out("r1").knock_out("r2").medium("LB").optimize()
IMHO. However, that would not work with context managers...
exactly, we couldn't rely on the existing copy machinery. I was thinking something along the lines of what was done in this library: https://github.com/diffoperator/pycow, but I dont know how well that would work with the nested structures that build up into a model. I.e., the copied (proxy) model would have to have its reactions point back to the base, but then when somethings changed it would have to spin up a new reaction. Or a proxy of the original reaction.
Anyways, it would be a cool way of forking models without much memory or computational overhead, which I think is the root of the issue we're looking at.
For enter and exit methods, I agree on avoiding the nested functional calls. An alternative could be to just define enter / exit methods on the model itself? I'm not 100% clear on if this would work..
with model:
model.medium = 'glucose'
model.reactions.my_favorite_reaction.knock_out()
stored_solution_somewhere = model.optimize()
where __enter__
remembers the current state, and __exit__
calls it back. (Or, starts to remember operations, and __exit__
undoes them)...
Exactly, something like pycow would be necessary to manage that kind of context manager.
The easiest thing to temporarily set the model self
reference to a copy but that does not work because you can set attributes of self
but not self
itself (wow that sounds funny). So we would really have to make copies of all attributes...
Maybe a "dumb" TimeMachine aka
from contextlib import contextmanager
@contextmanager
def tm(model):
yield model.copy()
with tm(model) as m:
m.reactions.mfr.knock_out()
m.medium = "glucose"
And add only some specific ones for media and multiple knockouts as
with model.medium("glucose"):
model.bla_bla()
with model.knock_out(("r1", "r2", ...)):
model.bla_bla()
Hi, @hredestig, do you think it would be good idea to open a feature branch for that and add @pstjohn and @mmundy42 with write access to the branch so we can play around with some options?
Just add @mmundy42 to cobrapy core. Either of us could open up the branch.
@cdiener, on the time-machine methods, I would guess that you could initialize some type of history-tracker and attach it to self on __enter__
. We'd have to re-write the bound manipulation functions to check if a model has an active history tracker, and if so, add the old bound and new bound to it. Then __exit__
could walk through the history, undoing each action.
Are reaction bounds the only thing we're tracking? If so it could be as easy as just doing an @property
on reaction.lower_bound
, and make the setter check reaction._model.{{ history }}
to see if we're in an active context...
On more careful thought the copy-on-write would be difficult to implement with the current organization of Models, Reactions, and Metabolites. The reason being is that not only to objects point down the hierarchy: Model -> Reaction -> Metabolite, but they also point up: metabolite.model keeps track of what model its a part of, and same with Reactions.
If we switched to a top-down only model, implemented perhaps with something like networkx, then we could do really fast shallow copies of a model's graph when calling model.copy. The problem here though is that you'd loose the ability to use methods like metabolite.reactions, since metabolites wouldn't know which reactions they're a part of, and reactions wouldn't know what models they're a part of. You could get around this partly by passing models as arguments, and partly by implementing methods at the next level up. (model.get_reactions(metabolite_id))
This is obviously a much bigger restructuring than just media... Has anyone thought about this type of organization flow?
@pstjohn I think in order to make the behavior understandable we would need to track all attributes. This would also include the solver references that will get introduced by optlang. It seems that this feature would generare a lot of boiler plate code and would also break everytime you add new attributes to Model
. So I would think that this should only get implemented if there is a lot of demand. For now, the model.copy
seems much simpler to me and does not really generate a lot of overhead (200 ms for very large models). I agree that this does not fix the forking problem, but maybe we can just implement some context managers for the most common workflows (like a bound resetter for knock_outs and media).
Very much agree that media should somewhere be a list of metabolites as that is what it actually is and media reactions something computed from the current model and this list of media constituents..
Could we possibly create a different issue with the purpose of discussing context managers, propose different alternatives for reversible changes to the model as it is not closely related to the media discussion? While the time-machine might not be perfect in all regards (also like method chaining and operating primarily on the model..), however, for refactoring methods from cameo to cobra there is quite some effort to be saved with at least having a compatible implementation since cameo makes heavy use of that class.
On new branch, indeed, you can just open branches as you like and @mmundy42 would you please request to join https://github.com/orgs/opencobra/teams/cobrapy-core if you like to contribute directly.
@hredestig, when I try the link I get a 404 error from github.com. Is that the right link or am I doing something wrong? Maybe I need to be invited to the organization first? https://help.github.com/articles/inviting-users-to-join-your-organization/
Closing with #350, lets open a new one for further discussions
I'd like to start a discussion on how to implement management of growth media. I can think of two approaches:
In either approach exchange reactions that are not in the input would have the bounds set to 0.
I also think there should be a way to save media to a file so they can be shared with others.
Thoughts? Other ideas?