opencobra / schema

xml/rdf schemas for annotating cobra models
Apache License 2.0

SBML -> General COBRA Constraints #10

Open tpfau opened 7 years ago

tpfau commented 7 years ago

This is a spin-off from #6 for the addition of general linear constraints of the form: C * v relation d, where C is a matrix with columns corresponding to model reactions and rows corresponding to individual constraints, v is the solution flux vector, d holds the right-hand-side values, and each relation is one of lower than, equal to, or larger than.
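To make the form concrete, here is a minimal Python sketch that checks a flux vector against constraints written as C * v relation d. The example values, the relation strings ("<=", "=", ">=") and the helper name `satisfies` are assumptions made up for illustration; they are not part of any proposed encoding.

```python
import operator

# Hypothetical example: three reactions, two constraints.
# Rows of C are constraints, columns are reactions.
C = [
    [1.0, -2.0, 0.0],  # v0 - 2*v1 <= 0
    [0.0, 1.0, 1.0],   # v1 + v2 = 10
]
relations = ["<=", "="]
d = [0.0, 10.0]

# Map each relation string to a comparison; equality uses a small tolerance.
_OPS = {
    "<=": operator.le,
    "=": lambda lhs, rhs: abs(lhs - rhs) <= 1e-9,
    ">=": operator.ge,
}


def satisfies(C, relations, d, v):
    """Return True if every row of C * v stands in its relation to d."""
    for row, rel, rhs in zip(C, relations, d):
        lhs = sum(c * x for c, x in zip(row, v))
        if not _OPS[rel](lhs, rhs):
            return False
    return True


print(satisfies(C, relations, d, [4.0, 3.0, 7.0]))
```

Whatever encoding is chosen would need to carry exactly these three pieces per constraint: the coefficients, the relation and the right-hand side.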

This is a complex topic, as it might also include the addition of non-reaction variables (e.g. protein usage), the indication of variable types (continuous (LP) vs. discrete (MILP) vs. quadratic (QP)), and general right-hand sides for the common S*v = 0 constraint (e.g. enforced production), along with relation indicators for normal metabolites.

Current suggestions:
A: Using the Constraint field from SBML.
B: Implementing COBRA-specific fields in the fbc package to match these requirements.

Midnighter commented 7 years ago

@coltonlloyd have you thought about how to best encode the ME-model formulation already? Do you have an opinion here?

Same for @BenjaSanchez regarding GECKO, any opinions on how to best encode those formulations directly in SBML?

coltonlloyd commented 7 years ago

@Midnighter we're in the very early stages of working with Andreas Drager on representing the ME-model in SBML. Right now we were thinking the best approach would be to develop a ME-extension of the FBCv2 package. I can let you know more once the project is farther along.

draeger commented 7 years ago

Matrices are not so straightforward to represent in XML because this would require a table structure. My understanding is that the constraint matrix C might be rather sparse (similar to the stoichiometric matrix). If this is the case, I think a list of constraints is probably better and less error-prone than explicitly writing the full constraint matrix.
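As a rough illustration of the list-of-constraints idea, the Python sketch below stores only the non-zero coefficients of each constraint and expands them into the full matrix when a tool needs it. The reaction ids, the dictionary keys and the function name `to_dense` are hypothetical and do not correspond to any existing SBML or fbc construct.

```python
# Hypothetical reaction ids; only non-zero coefficients are stored.
reactions = ["R1", "R2", "R3", "R4"]

constraints = [
    {"coefficients": {"R1": 1.0, "R2": -2.0}, "relation": "<=", "rhs": 0.0},
    {"coefficients": {"R4": 1.0}, "relation": ">=", "rhs": 0.5},
]


def to_dense(constraints, reactions):
    """Expand the sparse list into a full matrix C plus relations and d."""
    index = {rid: j for j, rid in enumerate(reactions)}
    C = [[0.0] * len(reactions) for _ in constraints]
    for i, con in enumerate(constraints):
        for rid, coef in con["coefficients"].items():
            # A KeyError here flags a constraint that names an unknown reaction.
            C[i][index[rid]] = coef
    relations = [con["relation"] for con in constraints]
    d = [con["rhs"] for con in constraints]
    return C, relations, d


C, relations, d = to_dense(constraints, reactions)
```

The list form grows with the number of non-zero entries rather than with reactions times constraints, which is the same reason stoichiometry is stored per reaction rather than as a table.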

Personally, I would prefer reusing the existing Constraint class from SBML and extending it as needed rather than introducing another datatype, but that's just my opinion. There were already discussions in the past about using Constraint for actual constraint-based models. As this construct is defined now, it only displays a message to users when a condition (a math statement) becomes false.

BenjaSanchez commented 7 years ago

@Midnighter in GECKO this is solved by adding the new species (in my case enzymes) directly to the corresponding reactions. Kinetic data 1/kcat is included as stoichiometric coefficients, and concentrations P of proteins are included as UB of the corresponding exchange reactions (referred to as enzyme usage).

More generally, for the case @tpfau mentions: as long as the relations are linear, any new constraint could be included by just adding extra metabolites/reactions and then constraining the LB/UB of the new reactions. Not the most elegant solution, but you don't lose compatibility with SBML in any way.
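A minimal Python sketch of this workaround, assuming a constraint of the form sum_j c_j * v_j <= rhs: one pseudo-metabolite (a new row of S) collects the weighted fluxes, and one pseudo-reaction (a new column) drains it, so that the bound on the pseudo-reaction enforces the constraint at steady state. The function name and the toy model are made up for illustration and are not how GECKO or any other toolbox implements it.

```python
import math


def add_constraint_as_reaction(S, lb, ub, coefficients, rhs):
    """Encode sum_j coefficients[j] * v[j] <= rhs with one extra row and column.

    The new row reads c . v - s = 0, so the pseudo-reaction flux s equals
    c . v at steady state, and its upper bound rhs enforces the constraint.
    """
    # Existing metabolites do not take part in the pseudo-reaction.
    new_S = [row + [0.0] for row in S]
    # Pseudo-metabolite: produced with the constraint coefficients, drained by s.
    new_S.append(list(coefficients) + [-1.0])
    return new_S, lb + [-math.inf], ub + [rhs]


# Toy model (hypothetical): one metabolite, two reactions.
S = [[1.0, -1.0]]
lb = [0.0, 0.0]
ub = [10.0, 10.0]

# Constraint: v0 - 2*v1 <= 0
new_S, new_lb, new_ub = add_constraint_as_reaction(S, lb, ub, [1.0, -2.0], 0.0)
```

The result is still a plain S, lb, ub model that any SBML-aware tool can read, at the cost of a row and a column that carry no biological meaning.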

tpfau commented 7 years ago

@BenjaSanchez: While this works for GECKO, where these represent enzymes, which actually make sense to encode as SBML species, it gets problematic for general coupling constraints, where you want to indicate that a reaction A should not have more flux than 2 * reaction B. This "coupling" metabolite does not have a direct biological correspondence, and is thus not actually a species. It also presents the problem of how to distinguish these enzymes from metabolites, e.g. when looking into mass balancing or other stoichiometric methods, which would flag those metabolites as imbalanced/problematic and likely also indicate the involved reactions as imbalanced.

Also, as you mention, you currently need to add exchange reactions (which again are kind of disconnected from the remaining network) and add upper and lower bounds to these reactions. But aren't those reactions essentially just creating left-hand and right-hand sides for the constraint derived from the enzyme?

BenjaSanchez commented 6 years ago

@tpfau the purpose of those exchange reactions in GECKO is to limit the overall usage of a given enzyme that might be used by more than one reaction (promiscuity) and/or to limit separately each sub-unit of any given complex, so that the limitation arises from the least abundant subunit. If there were no complexes or promiscuity in the model then the exchange reactions would be redundant and one could directly constrain the reactions with kcat*conc, but this is not the case usually.

tpfau commented 6 years ago

@BenjaSanchez True. What I mean is the following: what currently happens is that you have a protein representation, say Enz1. For this you add an exchanger ExEnz1: " -> Enz1", and set the lower bound of ExEnz1 to 0 and the upper bound to kcat*conc. In addition, Enz1 is added as a substrate to all reactions using the enzyme and to all "complex forming reactions" that need Enz1 to form a complex.

From an LP point of view, you add a variable 'ExEnz1' which has the constraints
-ExEnz1 <= 0,
ExEnz1 <= kcat*conc, and
ExEnz1 - sum(Flux_through_Reactions_using_Enz1) = 0.
This is the same as adding the following constraint:
-kcat*conc <= -sum(Flux_through_Reactions_using_Enz1) <= 0
Internally, whichever solver you use will probably reformulate the problem either way (I don't know which one is easier).
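The equivalence of the two formulations can be spot-checked with a few lines of Python. This is only a sketch: `bound` stands in for kcat*conc, the fluxes are summed unweighted as written above, and the function names and sample values are invented for the example.

```python
def feasible_with_exchange(fluxes, bound, tol=1e-9):
    """Formulation with the extra variable ExEnz1.

    The equality ExEnz1 - sum(fluxes) = 0 fixes ExEnz1, so it is computed
    directly and then checked against its two bound constraints.
    """
    ex_enz1 = sum(fluxes)
    return -ex_enz1 <= tol and ex_enz1 <= bound + tol


def feasible_direct(fluxes, bound, tol=1e-9):
    """Single two-sided constraint: -bound <= -sum(fluxes) <= 0."""
    total = -sum(fluxes)
    return -bound - tol <= total <= tol


# Hypothetical flux values for the reactions using Enz1, with bound = 5.
samples = [[1.0, 2.0], [4.0, 3.0], [-1.0, 0.5], [0.0, 0.0], [2.5, 2.5]]
for fluxes in samples:
    # Both formulations accept or reject the same flux distributions.
    print(fluxes, feasible_with_exchange(fluxes, 5.0), feasible_direct(fluxes, 5.0))
```

A handful of sample points is of course no proof, but eliminating ExEnz1 through the equality shows the same thing algebraically.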

Admittedly, the thing that can't be encoded with Constraints is the formation of the complexes because yes, for this you really do need additional variables. This is indeed something we should think about.

mhucka commented 6 years ago

Hello everyone -- it's great that people are discussing these issues. However, I found recently that I'm not the only one having trouble trying to figure out the state of things now that discussions are split across multiple Github issues and at least one mailing list (sbml-flux). I'm not sure what to suggest as a solution, but my sense is that the older people involved tend to prefer mailing lists over writing comments on potentially-overlapping github issues, and also, that not all groups are currently involved in the conversation. Perhaps it would be worth considering finding a list to use (or creating a new, temporary Google group), and trying to encourage other interested parties to get involved?