opencobra / schema

xml/rdf schemas for annotating cobra models
Apache License 2.0
2 stars 1 forks source link

SBML ID to Model ID #6

Open tpfau opened 6 years ago

tpfau commented 6 years ago

Hi, I'm not entirely sure, if this is the right place to discuss this. if not it would be great if someone could point me to the appropriate place.

I'm currently thinking about SBML IO and SBML Identifiers. At the moment, in the COBRA Toolbox we have the following convention: We convert from internal ID to SBMLID by replacing all elements that are not a-zA-Z0-9_ by __Num__ where Num is the string representing the original char cast to an integer. We further check, whether there are any ids which start with a number. If those exist, we append a marker in the beginning depending on the type of id (M for metabolites, R for reactions, C for compartments and G for genes to ALL IDs of that type, i.e. we assume, that when reading a file that has all starting with this expression, those are files generated by us).

Metabolites in the matlab structure commonly contain the compartment as part of the name. (e.g. atp[c]), and this is actually the only place, where the compartment is stored. So when exporting, we extract the info and put it into compartments. The identifier is completely kept, i.e. converted to "atp91c91", as we have to ensure, that the same compound in different compartments does not get the same id.

One more peculirarity: We currently have the following concept for GeneProducts: id is defined as above (with a conversion of what is in model.genes) label is the gene ID itself as stored in our model.genes field (this is commonly some database identifier). name is storing protein names for a specific gene product.

What I'm wondering: how is cobraPy doing this? And are there suggestions/ideas how this "should" be done?

draeger commented 6 years ago

@tpfau, do I understand your message right that you would prefer if an FBC Version 3 package could be developed that incorporates

Please note that the SpeciesType is already awailable and should not be duplicated. You can find it in the multi package. Since here only very limited types are needed, in this szenario sboTerms could already be sufficient as indicators.

tpfau commented 6 years ago

@draeger Yes, I think it would make sense to extend the fbc package. As for the details:

draeger commented 6 years ago

@tpfau, thanks for the clarification. I'd be careful with default values. First because it is then often hard to explicitly say "we don't know the value (yet)" and second it introduces certain assumptions that implementers need to know and that can be easily overlooked in the specification. It is better to make values explict.

tpfau commented 6 years ago

@draeger True, even though I'd claim in this instance, that if its not set, this should be a normal "metabolite", and thus it has to be true.

tpfau commented 6 years ago

About names of the b and csense fields: I'm actually not sure, whether renaming them is a good idea and input from other modellers would be good here. The reason is quite simple: These are values which are directly linked to the mathematical representation of a model S * v correspondsto b (where correspondsto can be any of <=, >= or =). At least for b its the name that is used in most publications and almost all other literature. csense is the name 'we' gave it as it is the constraint sense (considering each species/metabolite) as a constraint, but is again very much linked to the mathematical representation. So I would really like to hear additional thoughts on this before any name is given... Maybe @Midnighter could contribute here.

draeger commented 6 years ago

Now that we have developed concrete ideas what could be helpful, we should find best ways to move towards their manifestation in some specification. It seems to me, the best would be to incorporate these relatively few changes into a new version of the FBC package for SBML L3V1R2. It was also requested earlier to allow floating-point values for charges. This development could even happen at the same time.

@tpfau: Do you think that a new package FBC Version 3 would be a better solution than creating a COBRA annotation schema? If this is the case, we should reach out to the package working group and figure out the next steps.

Otherwise, I would suggest, we move on and design a minimal COBRA annotation scheme that covers your information. It seems to me we are getting close to a solution. What do other people think, @rmtfleming, @matthiaskoenig, @mhucka, @Midnighter?

tpfau commented 6 years ago

@draeger: Personally I think that yes, an update to the FBC package would be the better solution, as this would bundle COBRA specific information into a SBML package instead of splitting it again into being partially distributed into fbc and a non SBML based annotation scheme. Mainly because the latter tends to again create different "versions" which are not generally supported, while the fbc package has become kind of a standard tool now.

I would be happy to help there (if thats possible/desired).

One issue though is how quick such an update would happen. Given that we are only talking about additional attributes I assume this would not be too difficult to implement and the most effort would have to be put into the corresponding documentation.

Midnighter commented 6 years ago

I'll get a sounding from the other cobrapy devs and then write back here. Overall, if we can get the changes integrated in FBC rather than having to maintain our own schema additions I'd be very happy.

matthiaskoenig commented 6 years ago

Personally I think the mayor limitation with the fbc package is currently that arbitrary linear constraints can not be repesented. There should be just an addition to the fbc package which makes it possible to write such additional constraints for the LP problem (which are not flux bounds or due to the stoichiometric matrix).

This should cover general linear constraints like c1 v1 + .... ck vk <= 0 (with the different directions) Many groups and FBA variants are needing such constraints and these could be nicely translated to the LP problem like for instance via optlang interface in cobrapy or respective code in COBRA.

The b and csense constraints are just a subset of this general problem of defining arbitrary LP constraints and should be handled by a general solution, not by b and csense fields or attributes.

On Wed, Nov 15, 2017 at 9:56 AM, Moritz E. Beber notifications@github.com wrote:

I'll get a sounding from the other cobrapy devs and then write back here. Overall, if we can get the changes integrated in FBC rather than having to maintain our own schema additions I'd be very happy.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/opencobra/schema/issues/6#issuecomment-344527081, or mute the thread https://github.com/notifications/unsubscribe-auth/AA29usYW_n8DG1lO3dK3xD3aGpzovwQeks5s2qdAgaJpZM4P7x1d .

-- Dr. Matthias König Junior Group Leader LiSyM - Systems Medicine of the Liver Humboldt-University Berlin, Institute of Biology, Institute for Theoretical Biology https://www.livermetabolism.com konigmatt@googlemail.com Tel: +49 30 20938450 Tel: +49 176 81168480

draeger commented 6 years ago

I agree to @matthiaskoenig. If creating FBC V3, this should go in as well.

tpfau commented 6 years ago

@matthiaskoenig True, and yes, this is essentially what csense and b would be supposed to do, while the isSteadyStateConstraint flag on a metabolite was supposed to be an indicator whether the species is an arbitrary constraint, or a stoichiometry derived one and thus a metabolite. Thinking about it, yes it is probably not the best idea, to code linear constraints as additional metabolites, as this could lead to wrong constraints being assumed by a tool that does not interpret fbc information. However: There needs to be some indicator also on normal metabolites about csense and b, as there are methods like MD-FBA which require an accumulation of metabolites, and thus a non zero value, i.e. it contradicts the S*v = 0 constraint. If fbc just provides methodology for additional Linear constraints, those constraints would always lead to an infeasible Problem, as the basic S*v = 0 constraint would still be in (derived from the Stoichiometry), while an additional constraint would say S * v > eps.

In the end, I think there are two designs, which have their pro's and cons:

  1. Adding Linear constraints as individual entities. This would require creation of links between reactions and these constraints, and still require additional fields in metabolites (as indicated above). To have a full separation, it would also require the addition of an artificial reaction entity, which is not part of the reactions, if e.g. additional variables should be encoded (e.g. a MILP, with indicators). Overall this should lead to a full representation of a Linear (or quadratic) problem structure, with different variable types (continous, binary, integer) being possible. The benefit would be, that this is completely decoupled, so if a tool does not interpret it, it entirely misses these constraints, and works with the basic stoichiometric model.
  2. Only adding addition of attributes to reactions/metabolites to indicate that they are "not normal" e.g. reaction attributes variableType ( Cont, Binary, etc), isStoichiometricReaction (distinguish between stoichiometry and artificial variables), and similar fields for metabolites. The benefit here is a simple implementation (on both parser and library side) as it only introduces attributes. Admittedly, this could lead to a semi-interpretation of constraints if the fbc-data is not interpreted.

Personally, I'm not sure, which to prefer.

skeating commented 6 years ago

Hi Guys

If you are thinking that this would be fbc v3 then you do need to move that discussion to the fbc list

sbml-flux@lists.sourceforge.net

:-)

Sarah

On 16/11/2017 05:45, Thomas Pfau wrote:

@matthiaskoenig https://github.com/matthiaskoenig True, and yes, this is essentially what csense and b would be supposed to do, while the |isSteadyStateConstraint| flag on a metabolite was supposed to be an indicator whether the species is an arbitrary constraint, or a stoichiometry derived one and thus a metabolite. Thinking about it, yes it is probably not the best idea, to code linear constraints as additional metabolites, as this could lead to wrong constraints being assumed by a tool that does not interpret |fbc| information. However: There needs to be some indicator also on normal metabolites about |csense| and |b|, as there are methods like MD-FBA which require an accumulation of metabolites, and thus a non zero value, i.e. it contradicts the |Sv = 0| constraint. If |fbc| just provides methodology for additional Linear constraints, those constraints would always lead to an infeasible Problem, as the basic |Sv = 0| constraint would still be in (derived from the Stoichiometry), while an additional constraint would say |S * v > eps|.

In the end, I think there are two designs, which have their pro's and cons:

  1. Adding Linear constraints as individual entities. This would require creation of links between reactions and these constraints, and still require additional fields in metabolites (as indicated above). To have a full separation, it would also require the addition of an artificial reaction entity, which is not part of the reactions, if e.g. additional variables should be encoded (e.g. a MILP, with indicators). Overall this should lead to a full representation of a Linear (or quadratic) problem structure, with different variable types (continous, binary, integer) being possible. The benefit would be, that this is completely decoupled, so if a tool does not interpret it, it entirely misses these constraints, and works with the basic stoichiometric model.
  2. Only adding addition of attributes to reactions/metabolites to indicate that they are "not normal" e.g. reaction attributes |variableType| ( Cont, Binary, etc), |isStoichiometricReaction| (distinguish between stoichiometry and artificial variables), and similar fields for metabolites. The benefit here is a simple implementation (on both parser and library side) as it only introduces attributes. Admittedly, this could lead to a semi-interpretation of constraints if the fbc-data is not interpreted.

Personally, I'm not sure, which to prefer.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/opencobra/schema/issues/6#issuecomment-344822367, or mute the thread https://github.com/notifications/unsubscribe-auth/ABp_bqajUBPVDgzI0Nuvh09eilpb2H5aks5s28v-gaJpZM4P7x1d.

draeger commented 6 years ago

Thanks, @skeating; this is indeed a good idea. We should soon move to this list to get more people involved. If you haven't subscribed to this list, please do so at https://sourceforge.net/projects/sbml/lists/sbml-flux/. I will post a summary of this thread on this list.

draeger commented 6 years ago

I posted a summary to sbml-flux@lists.sourceforge.net and now hope to see an active discussion of the community.