Open ChristianLieven opened 5 years ago
Should those be three separate discussions?
The taxon can be readily encoded via a model annotation using the bqbiol:hasTaxon biology qualifier. Tissues and cell types can likewise be encoded using bqbiol:is in combination with ontologies like BTO or OMIT. Below is an example of how I handle such information (in combination with provenance) via an SBML annotation on the model element.
<annotation>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
<rdf:Description rdf:about="#meta_caffeine_pkpd_v13">
<dcterms:creator>
<rdf:Bag>
<rdf:li rdf:parseType="Resource">
<vCard:N rdf:parseType="Resource">
<vCard:Family>Koenig</vCard:Family>
<vCard:Given>Matthias</vCard:Given>
</vCard:N>
<vCard:EMAIL>koenigmx@hu-berlin.de</vCard:EMAIL>
<vCard:ORG rdf:parseType="Resource">
<vCard:Orgname>Humboldt-University Berlin, Institute for Theoretical Biology</vCard:Orgname>
</vCard:ORG>
</rdf:li>
</rdf:Bag>
</dcterms:creator>
<dcterms:created rdf:parseType="Resource">
<dcterms:W3CDTF>2018-04-19T16:05:32Z</dcterms:W3CDTF>
</dcterms:created>
<dcterms:modified rdf:parseType="Resource">
<dcterms:W3CDTF>2018-04-19T16:05:32Z</dcterms:W3CDTF>
</dcterms:modified>
<bqbiol:hasTaxon>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/taxonomy/9606"/>
</rdf:Bag>
</bqbiol:hasTaxon>
<bqbiol:is>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/bto/BTO:0001489"/>
<rdf:li rdf:resource="http://identifiers.org/omit/0003300"/>
</rdf:Bag>
</bqbiol:is>
</rdf:Description>
</rdf:RDF>
</annotation>
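As a rough sketch of how a tool could read such an annotation back, the following uses only Python's standard library. The function name resources_by_qualifier is made up for illustration, and the embedded annotation is a trimmed copy of the example above (provenance omitted); a real tool would more likely go through an SBML library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# Trimmed copy of the model annotation shown above (provenance left out).
ANNOTATION = """
<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
    <rdf:Description rdf:about="#meta_caffeine_pkpd_v13">
      <bqbiol:hasTaxon>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/taxonomy/9606"/>
        </rdf:Bag>
      </bqbiol:hasTaxon>
      <bqbiol:is>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/bto/BTO:0001489"/>
          <rdf:li rdf:resource="http://identifiers.org/omit/0003300"/>
        </rdf:Bag>
      </bqbiol:is>
    </rdf:Description>
  </rdf:RDF>
</annotation>
"""


def resources_by_qualifier(annotation_xml):
    """Map each bqbiol qualifier name to the list of rdf:resource URIs it holds."""
    root = ET.fromstring(annotation_xml.strip())
    result = {}
    for description in root.iter(f"{{{RDF}}}Description"):
        for qualifier in description:
            # Skip anything that is not a biology qualifier (e.g. dcterms).
            if not qualifier.tag.startswith(f"{{{BQBIOL}}}"):
                continue
            name = qualifier.tag.split("}", 1)[1]
            for item in qualifier.iter(f"{{{RDF}}}li"):
                uri = item.get(f"{{{RDF}}}resource")
                if uri:
                    result.setdefault(name, []).append(uri)
    return result


terms = resources_by_qualifier(ANNOTATION)
print(terms["hasTaxon"])
print(terms["is"])
```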
Sequence information can also be referenced via an annotation (as long as it is located in an external database). There is a BioModels.net qualifier for that (http://co.mbine.org/standards/qualifiers): isEncodedBy, encoder.
The biological entity represented by the model element is encoded, directly or transitively, by the subject of the referenced resource (biological entity B). This relation may be used to express, for example, that a protein is encoded by a specific DNA sequence.
One could combine this with information located in external files and use the rdf:resource to reference it (the rdf:resource is not limited to identifiers.org links but can also reference information in files located alongside the SBML file). All files should be bundled in a COMBINE archive for simple exchange:
<bqbiol:isEncodedBy>
<rdf:Bag>
<rdf:li rdf:resource="./gene_sequence.xml/sequence1234"/>
</rdf:Bag>
</bqbiol:isEncodedBy>
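A consumer of such annotations then has to tell identifiers.org-style links apart from references to files shipped with the model. A minimal sketch, assuming a resource counts as local whenever it has neither a URL scheme nor a host; the helper names is_local_resource and normalized_location are hypothetical, and how the trailing part of the path (sequence1234 above) selects an entry inside the file is left open here.

```python
import posixpath
from urllib.parse import urlparse


def is_local_resource(resource):
    """True if the rdf:resource is a relative path rather than a full URL."""
    parsed = urlparse(resource)
    return not parsed.scheme and not parsed.netloc


def normalized_location(resource):
    """Drop a leading './' and collapse redundant separators."""
    return posixpath.normpath(resource)


for resource in (
    "http://identifiers.org/taxonomy/9606",
    "./gene_sequence.xml/sequence1234",
):
    kind = "local" if is_local_resource(resource) else "remote"
    print(kind, normalized_location(resource) if kind == "local" else resource)
```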
If you want to encode the sequence directly in the SBML file, a good solution would be an annotation. In the SBML-fbc-v3 draft we made a proposal for a general-purpose KeyValuePair annotation (working like an advanced Python dictionary), which could work for recurring information such as gene sequences or protein sequences. This would allow for easily parsable key:value data in annotations. The same mechanism could apply to phenotypic data.
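To illustrate the dictionary analogy, here is a sketch of reading key:value data from an annotation into a Python dict. Everything about the XML layout is an assumption for illustration only: the namespace http://example.org/keyvaluepair, the element names listOfKeyValuePairs and keyValuePair, the key and value attributes, and the placeholder sequence are not taken from the fbc-v3 draft, so check the draft for the actual syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names, NOT the fbc-v3 draft syntax.
KVP_NS = "http://example.org/keyvaluepair"

ANNOTATION = f"""
<annotation>
  <listOfKeyValuePairs xmlns="{KVP_NS}">
    <keyValuePair key="sequence_type" value="dna"/>
    <keyValuePair key="sequence" value="ATGAAACGC"/>
  </listOfKeyValuePairs>
</annotation>
"""  # the sequence is a made-up placeholder


def read_key_value_pairs(annotation_xml):
    """Collect every keyValuePair element into a plain dict."""
    root = ET.fromstring(annotation_xml.strip())
    pairs = {}
    for element in root.iter(f"{{{KVP_NS}}}keyValuePair"):
        pairs[element.get("key")] = element.get("value")
    return pairs


pairs = read_key_value_pairs(ANNOTATION)
print(pairs["sequence"])
```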
As @matthiaskoenig points out, taxonomy is no problem. All models in the BiGG database should make use of that feature.
A more natural way of encoding sequence information is using the specialized format SBoL (Synthetic Biology Open Language). Like @matthiaskoenig mentioned, it is possible to address external files. When we worked with @zakandrewking and the team from UC San Diego to propose how ME models can be encoded in SBML (which require the explicit inclusion of sequence information), we used a similar approach. The main idea was to not only ship an SBML file but a COMBINE archive comprising SBML, SBoL, and further files as needed (for instance, the model in JSON or MAT format, etc., SBGN-ML files, or SED-ML files for execution).
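For readers who have not built one before: a COMBINE archive is a ZIP file with a manifest.xml that lists each entry and its format. The sketch below writes one with Python's standard library only. The helper names build_manifest and write_archive, the file names, and the stub file contents are placeholders for illustration, and the format URIs follow the OMEX manifest convention as I understand it, so verify them against the specification before relying on them.

```python
import io
import zipfile
from xml.sax.saxutils import quoteattr

SPEC = "http://identifiers.org/combine.specifications/"
FORMATS = {"sbml": SPEC + "sbml", "sbol": SPEC + "sbol"}


def build_manifest(entries):
    """Return manifest.xml text for (location, format URI) pairs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<omexManifest xmlns="{SPEC}omex-manifest">',
        f'  <content location="." format="{SPEC}omex"/>',
        f'  <content location="./manifest.xml" format="{SPEC}omex-manifest"/>',
    ]
    for location, format_uri in entries:
        lines.append(
            f"  <content location={quoteattr('./' + location)} "
            f"format={quoteattr(format_uri)}/>"
        )
    lines.append("</omexManifest>")
    return "\n".join(lines) + "\n"


def write_archive(target, files):
    """Write files ({location: (format URI, bytes)}) plus a manifest to target."""
    entries = [(location, fmt) for location, (fmt, _) in files.items()]
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", build_manifest(entries))
        for location, (_, data) in files.items():
            archive.writestr(location, data)


# In-memory target so the sketch leaves no file behind; a path works as well.
buffer = io.BytesIO()
write_archive(
    buffer,
    {
        "model.xml": (FORMATS["sbml"], b"<sbml/>"),  # stub content
        "gene_sequence.xml": (FORMATS["sbol"], b"<sbol/>"),  # stub content
    },
)
with zipfile.ZipFile(buffer) as archive:
    print(archive.namelist())
```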
Please have a look at the SBMLme project. There you can also find an Example COMBINE Archive. For the first version, we suggested a customized annotation, for example
<annotation>
<sbmlme:meSpeciesPlugin sequence="http://cobramens.url/sbol/RNA_b0001" genomePosition="2042572" />
</annotation>
so that we could also store the position within the sequence where the relevant information starts. This was a requirement of the ME model (encoded as a JSON file).
Problem description
I've copied this conversation from our response Google Doc during the review phase of memote. I'm curious to hear what the community here could add to this. @draeger and @matthiaskoenig, your seasoned input is much appreciated.