opencobra / schema

xml/rdf schemas for annotating cobra models
Apache License 2.0
2 stars 1 forks source link

Non-integer charge #3

Open Midnighter opened 7 years ago

Midnighter commented 7 years ago

From @zakandrewking on June 28, 2017 22:4

Hi All:

I hope this is not a rehash of a previous conversation, but I could not find it, so here we go :)

Can we support non-integer charges in SBML / FBC? Right now in sbml3.py non-integer charges will throw an error. There are use cases for floating point charges when model metabolites represent a group of metabolites.

Thanks!!

Zak

Copied from original issue: opencobra/cobrapy#534

Midnighter commented 7 years ago

From @zakandrewking on June 28, 2017 22:5

The same issue is relevant for JSON export. Charge is an integer in the json-schema.

Midnighter commented 7 years ago

From @cdiener on June 30, 2017 16:40

Unfortunately the SBML 3 FBC 2 spec only allows integer charges (http://sbml.org/Special/specifications/sbml-level-3/version-1/fbc/sbml-fbc-version-2-release-1.pdf). So we would be breaking the spec if we read non-integer ones.

Maybe you could work around it by writing it into the annotations which are parsed by cobrapy and follow that by resetting the charges to whatever was specified there...

For the JSON schema I don't see a problem...

Midnighter commented 7 years ago

From @cdiener on July 5, 2017 16:43

I did some reading on the SBMl spec and what I wrote earlier about using annotations is wrong :cry: They don't allow you to attach actual values to anything, it can only be a link to something from identifiers.org, so that won't work. Notes couñd be an alternative but the parser in cobrapy does not read them since SBML specifies those to be something that should be consumed by a human and not by a computer, so the previous interpretation by the cobrapy team was to simply not read those at all. Unfortunately, those are limitations with SBML itself...

So I think the only solution for SBML would be to have an external table of metabolite ids to charge than update the charge in the cobrapy model after reading it...

Midnighter commented 7 years ago

From @zakandrewking on July 5, 2017 21:9

As I look more into this, I am thinking we should require integer charges all the time and make COBRApy users go along. Especially if SBML is strict about it.

Would it be reasonable to add a type check in COBRApy so users cannot set float-valued charges?

Midnighter commented 7 years ago

From @cdiener on July 6, 2017 15:25

I mean your initial argument made sense to me :grin: Groups of metabolites might be a thing and partial charges are as well... We could allow it in cobrapy and only disallow it when reading or writing SBML. Might be worthwhile asking the SBML maintainers why they chose to only allow integers...

Midnighter commented 7 years ago

From @zakandrewking on July 6, 2017 17:35

yes, i am checking with @draeger on that point

Midnighter commented 7 years ago

From @zakandrewking on July 6, 2017 19:10

Whatever the final approach, it should probably apply to formula/elements as well.

The checks could also be in biosustain/memote (@ChristianLieven)

Midnighter commented 7 years ago

From @draeger on July 7, 2017 7:37

@cdiener: First, you're right that the FBC specification (unfortunately) restricts the charge attribute to integer values only. Breaking this and simply writing doubles or floats into the XML file could result in parse errors in some third-party tools. So these are the bad news. However, there seems to be a misunderstanding about annotations and notes that I like to clarify.

Notes are not the right place to store any information that is relevant for computational tools. Notes in SBML are like comments in programing languages. You don't want to parse information out from there. This was done in the past and caused a lot of problems. People are still trying to write converters to get rid of repurposed notes in SBML.

Annotations are intended to be interpretable machine-readable additional information. A recommended use case is described in the various publications about MIRIAM and identifiers.org, but you can write any information you want into these annotations given that you define your own custom namespace for it. There could be a COBRApy namespace, e.g., http://sbrg.ucsd.edu/tools/cobra (or whatever) and then you could even define custom XML tags within that namespace as long as you use the prefix defined for this namespace, e.g., cobra:. So within the annotation you could have a tag <cobra:charge>-4.5</cobra:charge>.

Other tools also use that annotation mechanism. The most prominent one is probably CellDesigner. They provide an elaborate documentation about how they store their additional information in SBML in a specification-compliant way, see http://www.celldesigner.org/documents/CellDesigner4ExtensionTagSpecificationE.pdf

Midnighter commented 7 years ago

That's very relevant for #541 as well, thanks for the comment.

Midnighter commented 7 years ago

From @ChristianLieven on July 7, 2017 12:9

This is related to my comments in #541. @draeger What do you suggest would be the best way of linking let's say a comment to a citation in the form of a DOI? If annotations are supposed to be machine-readable only while the notes are supposed to be human-readable only, a comment or data-point and the corresponding citation, could not live in the same XML compartment. In my opinion this disconnection leads to a loss of valuable annotation information and to a degree this applies to this case too <cobra:cs>2</cobra:cs>, where cs stands for confidence score. Frankly, I don't know the SMBL specifications/ XML format well, but is something like this possible: <cobra:cs reference="MIRIAM_formatted_DOI">3</cobra:cs>

Midnighter commented 7 years ago

From @matthiaskoenig on July 7, 2017 12:38

Yes you can define your own xml namespace and write this in the annotation. Like for instance

But if you do so, you should clearly define the xml schema you use, document it, version it and provide long-terms support for it. Otherwise it will be the same problems like with the current cobra annotations which change all the time, consequently making it very difficult to support them in the long run or convert them to other formats.

Is there any document describing the current version of the cobra annotation tags? Is there any versioning? What is the community decision on the allowed/supported cobra annotations?

On Fri, Jul 7, 2017 at 2:09 PM, Christian Lieven notifications@github.com wrote:

This is related to my question in #541 https://github.com/opencobra/cobrapy/issues/541. @draeger https://github.com/draeger What do you suggest would be the best way of linking let's say a comment to a citation in the form of a DOI? If annotations are supposed to be machine-readable only while the notes are supposed to be human-readable only, a comment or data-point and the corresponding citation, could not live in the same XML compartment. In my opinion this disconnection leads to a loss of valuable annotation information and to a degree this applies to this case too

2, where cs stands for confidence score. Frankly, I don't know the SMBL specifications/ XML format well, but is something like this possible: 3 — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub , or mute the thread .

-- Dr. Matthias König Junior Group Leader LiSyM - Systems Medicine of the Liver Humboldt-University Berlin, Institute of Biology, Institute for Theoretical Biology https://www.livermetabolism.com konigmatt@googlemail.com Tel: +49 30 20938450 Tel: +49 176 81168480

Midnighter commented 7 years ago

@hredestig and I just had a brief discussion on providing such an XML schema.

Is there any document describing the current version of the cobra annotation tags? Is there any versioning? What is the community decision on the allowed/supported cobra annotations?

We are not aware of any existing schema or documentation of the annotation tags used in cobra. Our suggestion is to create a new repository under the opencobra organization. That way, any member of the opencobra community (most importantly of the Matlab COBRA Toolbox) can feel free to contribute to the schema, there can be versioned releases of the schema, and for the time being it can be hosted on https://opencobra.github.io/annotations/schema or whatever is decided for the name and URL.

We would then implement in cobrapy whatever is dictated by the schema and there's a chance for other tools in the opencobra community to do the same.

Midnighter commented 7 years ago

From @matthiaskoenig on July 7, 2017 13:49

@Midnighter and @hredestig That sounds great and the right way forward. It would already be a big step to have such an annotation schema and have it adopted by cobra and cobrapy.

Information which is encodable in SBML like GPRs (gene protein relations) or flux bounds should be in the respective SBML elements and NOT in annotations. Only annotations information which is not covered by SBML should go in the cobra annotation tags, like for instance non-integer charges or confidence annotations. Things which are "classical" annotations to databases like for instance to UniProt, KEGG, ... belong in the RDF annotations! Lots of things can already be covered nicely in SBML and should be handled in SBML, for instance the 'SUBSYSTEM' annotation should not be used, but be treated via SBML groups (or if used at least already be encoded via a SBML group).

Midnighter commented 7 years ago

From @cdiener on July 7, 2017 14:33

@draeger thanks so much for your reply. I did not know about the custom annotation scheme and it looks like it is just what we need in order to attach data to SBML elements :+1:

Midnighter commented 7 years ago

From @cdiener on July 7, 2017 15:5

I suggest we create a new issue for the cobra xml namespace and link the relevant issues since there are quite a few.

Midnighter commented 7 years ago

From @draeger on July 7, 2017 15:24

@cdiener Yes, but as pointed out by @matthiaskoenig it is important to have the COBRA-specific extra information part as small as possible and to use as much as possible existing constructs, attributes etc. from SBML in order to facilitate reuse and maintenance of the files on the long term.

Midnighter commented 7 years ago

From @draeger on July 7, 2017 15:26

In fact, the package working group of the flux-balance constraints package (FBC) should be involved in this process (sbml-flux@lists.sourceforge.net), because in the best case there will be a new version of the fbc extension package for SBML that has the missing features or allows for double-valued charges etc.

Midnighter commented 7 years ago

From @cdiener on July 7, 2017 16:53

@draeger sure, do you have a link to any other software that reads a SBML group like @matthiaskoenig proposed. AFAIK none of the cobra packages support that...

Midnighter commented 7 years ago

From @draeger on July 10, 2017 7:29

I didn't test it in COBRApy, but at least the developers agreed on using groups of reactions to annotate the subsystem property, for which there is no corresponding field in fbc. Tools that can work with that are, for instance, https://github.com/SBRG/ModelPolisher, http://apps.cytoscape.org/apps/cy3sbml, and some others. BiGG database delivers its models with groups.

It is actually a very simple package and easy to use. It just defines a group object that has a list of members. Each member can point to arbitrary ids within the SBML file, such as reaction ids.

Midnighter commented 7 years ago

From @cdiener on July 10, 2017 17:34

Okay so we should support it as well. There are some potential quirks since SBML groups can refer to things that are not supported by cobrapy like rule ids. Also the potential for circular references since they can refer to group ids as well (the spec prohibits that but we would need to implement a check). Maybe for the beginning we would only parse groups that link to species or reaction ids...

Midnighter commented 7 years ago

From @matthiaskoenig on July 10, 2017 19:56

The problem with SUBSYSTEM is that there is exactly on subsystem per reaction, whereas with groups one can put reactions (and also metabolites) in multiple groups. Some reactions are just multi-pathway enzymes and belong in multiple pathways. With the current cobra SUBSYSTEM this is not working. People creating pathways based on subsystem annotation always have disconnected pathways with missing reactions.

Example is glycolysis, which also contains pentose phospate pathway in most cobra subsystem annotations with some reactions in the ER (mainly NADPH reactions). The transporters for glucose-6-phoshate in ER are not annotated with subsystems resulting in disconnected graphs. Also species are not annotated to pathways with cobra subsystem (because only for reaction), which is easily possible with groups.

IMHO subsystem annotation is the inferior solution not fullfilling its function of annotating pathways (subsystems), whereas groups is perfect for this.

On Mon, Jul 10, 2017 at 7:34 PM, Christian Diener notifications@github.com wrote:

Okay so we should support it as well. There are some potential quirks since SBML groups can refer to things that are not supported by cobrapy like rule ids. Also the potential for circular references since they can refer to group ids as well. Maybe for the beginning we would only parse groups that link to species or reaction ids...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/opencobra/cobrapy/issues/534#issuecomment-314178578, or mute the thread https://github.com/notifications/unsubscribe-auth/AA29ur29Xqef8igV4uzJ_jR20T6LRHMlks5sMmCxgaJpZM4OIl6Z .

-- Dr. Matthias König Junior Group Leader LiSyM - Systems Medicine of the Liver Humboldt-University Berlin, Institute of Biology, Institute for Theoretical Biology https://www.livermetabolism.com konigmatt@googlemail.com Tel: +49 30 20938450 Tel: +49 176 81168480

Midnighter commented 7 years ago

From @mhucka on July 12, 2017 23:40

Thanks to everyone involved in this discussion. It is really great to see!

The original question seemed to imply a need for non-integer charges in the SBML FBC v.2 spec. Did people decide that this is indeed a problem that should be addressed in an update to the definition of SBML FBC? Should this be brought up with the SBML FBC working group?

Midnighter commented 7 years ago

From @cdiener on July 13, 2017 16:30

@mhucka I think it would be discuss it with the FBC groups as well. We are also in the process of designing a cobra namespace for annotations and maybe the requirements popping up there would be worth discussing with the FBC group as well.

Midnighter commented 7 years ago

From @bgoli on July 14, 2017 8:24

Interesting discussion, there are various topics in this thread so perhaps I can jump in and comment on two of them.

For both of these CBMPy (http://cbmpy.sf.net) and https://github.com/SystemsBioinformatics/cbmpy has full support for both the FBC and Groups packages and supports a key/value annotation for storing arbitrary annotations.

As far as the specification of a generic key/value annotation goes, Frank Bergmann and myself proposed an annotation for the FBC package in v1. It is designed to be simple enough to parse without the use of an XML parser. For more details please see Section 6.1.2 on page 20 of the specification: http://co.mbine.org/specifications/sbml.level-3.version-1.fbc.version-1.release-1.pdf

Midnighter commented 7 years ago

From @cdiener on July 14, 2017 16:53

Thanks for your comments @draeger, @mhucka and @bgoli! I think we have pretty much agreed to fully implement the groups feature as it is used by SBML (see #543). As for the the key/value annotations the suggestion mentioned by @bgoli would work for us I think, only that @ChristianLieven asked whether it would be possible to have an additional attribute "reference" for each data tag.

Some of those seem to be redundant with already existing SBML parts. For instance it was mentioned here that "subsystem" should be implemented as group and not as key/value pair. Also

<data id="gene_association" type="string" value="(b2947)"/> 

seems to be redundant with <fbc:listOfGeneAssociations> which is already supported by cobrapy. I think from our side we would prefer if there were only one allowed way to specify that information.

Midnighter commented 7 years ago

From @ChristianLieven on July 15, 2017 14:51

The following suggestion is what @cdiener is referring to above.

From #541:

It seems like the community hasn't decided yet what exactly the notes field should contain and how it should be formatted. Personally, I'd find most useful if there was a clever way of allowing both, short human-readable comment entries, as well as optional, but specifically related machine-readable DOI-styled literature references. In the model object, I suppose this could be a nested dictionary looking something like this: some_model.reaction.SOME_RXN.notes = {"confidence_score":{"value":4, "reference":"some_doi"}}

Based on the referenced publications [...], another useful key of the notes-field/attribute would be a simple 'comment' option, which would be limited in length (50 chars? 70 chars? 80 chars?).

some_model.reaction.some_metabolite.notes = {"comment":{"value":"Short string outlining a hypothesis or specific decision for this metabolite", "optional_reference":"some_doi"}}

From the linked publication by Heavner et al., I'd like to highlight 2 passages in this context as arguments for my suggestions.

Information about how these choices are made during the course of reconstruction can be very useful for informing subsequent research efforts, but such information is currently difficult to find because it is seldom included in published reconstructions

And, like the ambiguities and choices that are made and should be highlighted in assembling a reconstruction, highlighting the choices made in deriving a model provides further opportunity for scalable hypothesis generation

Midnighter commented 7 years ago

From @draeger on July 19, 2017 14:3

@ChristianLieven: what you propose here as a "comment" is exactly what the notes are intended to be in SBML. The only difference is that they are allowed to contain any valid XHTML and not restricted in length at all. So instead of trying to encode something in notes that is part of the computer-readable data structure, these should just be what you name a "comment".

Midnighter commented 7 years ago

From @ChristianLieven on July 21, 2017 7:52

Thank you @draeger for taking the time to explain.

what you propose here as a "comment" is exactly what the notes are intended to be in SBML.

I now understand that the proposed 'comment' field is the same as the 'notes' field, but I failed to highlight the important difference to be the ability to actively link a comment/note directly to a publication in the form of a DOI. So, like @cdiener noted, something like a 'reference' attribute:

<notes reference="DOI:10.1016/j.copbio.2014.12.010">
    <p>
    Based on the measurements for Enzyme XYZ by Author and Author et al. (XXXX), we have decided to constrain the lower bounds of RXN_ABC to X mmol g DW-1 h-1. A more lenghty explanation follows here.
    </p>
</notes>

I consider this important a) to be able to directly connect hypothesis and decisions to 'tangible' evidence both in a human and machine-readable form and b) to obtain a transparent, self-contained document.

I may just not be aware that this is already a possibility, as far as I know there is not yet a 'cobrapy way' to do this, and I thought it might be because SMBL didn't allow it.

Midnighter commented 7 years ago

From @draeger on July 21, 2017 8:35

You can do something like this using MIRIAM annotations. This gives you a method to specify an online resource (such as a publication identifier) and state the relationship between the model component and that online resource. For instance, you can say IS_DESCRIBED_BY and then add the resource http://identifiers.org/pubmed/25562137 which is exactly the publication you cited above. For more information, please see http://identifiers.org or http://www.ebi.ac.uk/miriam/main/collections/MIR:00000113

Midnighter commented 7 years ago

From @draeger on July 21, 2017 8:41

Here is an example how this could look like:

<parameter id="RXN_ABC_lb" ... metaid="RXN_ABC_lb">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#RXN_ABC_lb">
        <bqbiol:isDescribedBy>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/pubmed/25562137" />
          </rdf:Bag>
        </bqbiol:isDescribedBy>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</parameter>

There are, of course, many other relationships besides "is described by" that you may find more appropriate, see http://co.mbine.org/standards/qualifiers for details.

Midnighter commented 7 years ago

From @bgoli on July 21, 2017 14:3

@cdiener sounds good. Just a note, the KeyValueData structure has no restriction on the keys themselves, the example were just using existing COBRA files. I completely agree with you that software should automatically move GPR information from notes and store them in the FBC descriptions.

A more general description can be found here: http://pysces.sourceforge.net/KeyValueData except I am considering dropping the "type" attribute.

tpfau commented 7 years ago

@dreager @bgoli I would like to add a comment with respect to the matlab toolbox: As it stands, the only package supported by libsbml in the matlab bindings is the fbc package. Groups, layout etc are not supported, and the data is not provided in the returned struct. I was thinking of using jsbml as a direct link from matlab, but this doesn't work neither as Matlab ships a completely outdated java VM, so jsbml doesn't work. So at least for us on the matlab side groups and other non fbc-packages are currently not an option for IO.

draeger commented 7 years ago

@tpfau: thanks for letting me know about your problems. May I ask you a few questions about the configuration you tested?

  1. MATLAB version?
  2. Version of JSBML?
  3. Version of SBML Toolbox?

Just so to best reproduce the problems. I am forwarding this to the SBML team as well in order to best support your needs.

tpfau commented 7 years ago

@draeger: I was testing jsbml 1.2 (download from sourceforge with dependencies) with Matlab versions supported by the Matlab toolbox (2014b and 2016b). We are currently not using the SBMLToolbox at all but the direct Matlab bindings for libsbml On both matlab versions after importing the lib trying to read an SBML yields:

Java exception occurred:
java.lang.NoSuchMethodError: com.ctc.wstx.api.ReaderConfig.setMaxElementDepth(I)V
    at org.sbml.jsbml.xml.stax.SBMLReader.readXMLFromStream(SBMLReader.java:567)
    at org.sbml.jsbml.xml.stax.SBMLReader.readSBMLFromString(SBMLReader.java:870)
    at org.sbml.jsbml.xml.stax.SBMLReader.readSBMLFromString(SBMLReader.java:884)
    at org.sbml.jsbml.SBMLReader.readSBMLFromString(SBMLReader.java:267)
    at org.sbml.jsbml.SBMLReader.read(SBMLReader.java:143)

(Which is the typical issue of conflicting loaded additional jars).

skeating commented 7 years ago

@tpfau My recent revamp of the libsbml bindings for matlab makes it very straightforward to add package support. It will eventually (next release) be possible to add support by merely adding scripts rather than needing to wait for new binaries.

How urgently do you need support for groups/layout ?

tpfau commented 7 years ago

@skeating Not before a schema is actually finished, as anything we would code now would likely conflict in some way with a final schema. We will anyways have some issues when e.g. SubSystems/Pathways are starting to be defined in groups, as this would indicate

  1. One to many relationships between reactions and pathways, were we currently only allow one in the matlab toolbox (this will have to be changed on our part in the future).
  2. Potential assignment of Metabolites to pathways (which we currently don't do at all).
  3. Other annotations like compound or reaction classifications that might be encoded in groups, and which are currently not existent.

The same applies for other packages (e.g. layout, where we already had some internal discussions about how to handle them). So n general I would say: We would like to have the Matlab binding support as soon as a final schemata is settled on to be able to then incorporate that schema into the COBRA Toolbox.

skeating commented 7 years ago

@tpfau So if our next release (Nov) added the support for groups and layout would this be soon enough ?

tpfau commented 7 years ago

I think so. @rmtfleming Do we currently need anything besides layout and groups?

Midnighter commented 7 years ago

I thought groups were already well defined within the standard now?

tpfau commented 7 years ago

@Midnighter Yes, groups are defined in the SBML definitions but (up until now) at least in the Matlab Toolbox they are not (yet) used due to them lacking from the Matlab bindings, so an output from cobrapy with groups would have data that is ignored.
I'm not really familiar with cobrapy, so I simply don't know how it handles groups (or other packages), and I'm hoping that we can in the future use SBML to exchange models without problem.

draeger commented 7 years ago

@luciansmith mentioned that the distrib package for SBML could be useful for storing confidence intervals (see https://github.com/opencobra/schema/issues/4#issuecomment-326449722).

Midnighter commented 7 years ago

We're derailing the original issue quite a bit here. Should we open a gitter chatroom for this repository for general discussion?

@luciansmith mentioned that the distrib package for SBML could be useful for storing confidence intervals (see #4 (comment)).

Please note that the original query was about a format for storing reaction confidence scores not confidence intervals. Confidence scores being numeric indicators of the likelihood that a reaction is actually present in an organism (and associated with literature references).

draeger commented 7 years ago

I know, these are typically 4 discrete numbers. It was also mentioned in comment #4 that parameter objects can in principle be used for this as well. Finally, an annotation could be the last solution if nothing else is appropriate.

luciansmith commented 7 years ago

Thanks for the clarification about confidence scores!

The distrib package as it currently stands has a bunch of pre-defined distribution and certainty measures that can be applied to any SBML element, and a second mechanism whereby any other measure with a definition URI can also be added. Currently, confidence intervals are in the 'pre-defined' list, but confidence scores could be added via the 'definition URI' method.

However, since distrib isn't yet quite finalized, adding new measures to the pre-defined list is simple, so if there was interest (as there seems to be), I could just add it.

This is indeed more afield from the original discussion; I'd be happy to move to a glitter chatroom or a newly-opened issue; just point me where to go.

Midnighter commented 7 years ago

https://gitter.im/opencobra/schema

bgoli commented 7 years ago

Just throwing this out there. Has anyone considered extending/creating a resource for this? One could use the evidence type rather than the score itself, coupled with the existing MIRIAM annotation mechanism with no change in the XML, Purely as an example:

http://identifiers.org/bigg.confidence/biochemical http://identifiers.org/bigg.confidence/genetic http://identifiers.org/bigg.confidence/none etc.