sys-bio / vscode-antimony

Extensions for Antimony development in Visual Studio Code.
https://marketplace.visualstudio.com/items?itemName=stevem.vscode-antimony
MIT License

Database recommendations based on OMEX metadata specification #27

Closed mastevb closed 2 years ago

mastevb commented 2 years ago

Currently, creating annotations relies on a third-party package to retrieve annotation information from several sources. There are two problems with this approach:

  1. The annotation sources are limited to those the third-party package supports, and
  2. Query performance is poor.

One idea is to move all of the data into our own cloud-based database and query it there. Since this would require some sort of funding, we need data to back the proposal:

  1. How much faster will the process be if we host our own DB?
  2. How large is the information? If the size is small, maybe we can include it in the extension package and read it from there?
  3. What if we create indexes?
  4. Copyright issues?
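
Questions 2 and 3 can be explored locally before any hosting decision. The following is a minimal sketch, assuming the name/ID pairs were shipped as a SQLite database inside the extension package. The table name `chebis`, the index name `idx_chebis_name`, the helper functions, and the two sample rows are illustrative assumptions, not the project's actual schema. It builds a small table, optionally indexes `name`, and asks SQLite how it would execute an exact-name lookup.

```python
import sqlite3


def build_db(rows, with_index=True):
    """Create an in-memory table of (name, id) pairs, optionally indexed on name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chebis (name TEXT, id TEXT)")
    conn.executemany("INSERT INTO chebis VALUES (?, ?)", rows)
    if with_index:
        # Hypothetical index name, chosen for this sketch only.
        conn.execute("CREATE INDEX idx_chebis_name ON chebis (name)")
    conn.commit()
    return conn


def lookup(conn, name):
    """Return all IDs whose name matches exactly."""
    cur = conn.execute("SELECT id FROM chebis WHERE name = ?", (name,))
    return [row[0] for row in cur]


def query_plan(conn, name):
    """Return SQLite's description of how it would run the lookup."""
    cur = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM chebis WHERE name = ?", (name,)
    )
    # The fourth column of each plan row holds the human-readable detail.
    return " ".join(row[3] for row in cur.fetchall())


if __name__ == "__main__":
    sample = [("water", "CHEBI:15377"), ("ATP", "CHEBI:15422")]
    for indexed in (False, True):
        conn = build_db(sample, with_index=indexed)
        print(indexed, lookup(conn, "water"), query_plan(conn, "water"))
```

Comparing the two plans shows whether a lookup scans the whole table or searches the index. Timing the same lookups against the full data set would give the numbers needed for question 1.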

See also https://github.com/sys-bio/vscode-antimony/issues/48

mastevb commented 2 years ago

Added a delay before processing so the user can finish typing without firing multiple search queries.

implemented in: https://github.com/sys-bio/vscode-antimony/commit/dee492ac73e22b247ef5e7739cfbc2589d20bb69
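
The linked commit holds the actual change. For readers unfamiliar with the technique, the general idea (often called debouncing) can be sketched as follows. This is an illustration only, not the extension's code: the `Debouncer` class, its `trigger` method, and the delay value are hypothetical names chosen for the example.

```python
import threading
import time


class Debouncer:
    """Run `callback` only after `delay` seconds pass with no new trigger."""

    def __init__(self, delay, callback):
        self.delay = delay
        self.callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, *args):
        """Schedule the callback, replacing any call that is still pending."""
        with self._lock:
            if self._timer is not None:
                # The user is still typing: drop the pending search.
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.callback, args=args)
            self._timer.daemon = True
            self._timer.start()


if __name__ == "__main__":
    searches = []
    debounced = Debouncer(0.05, searches.append)
    for partial in ["g", "gl", "glu"]:
        debounced.trigger(partial)  # simulated keystrokes in quick succession
    time.sleep(0.5)  # wait for the single surviving timer to fire
    print(searches)
```

Only the last trigger in a burst reaches the callback, so a search runs once the input has been idle for the chosen delay. A production version would also need to handle a timer that has already started firing when `cancel()` is called.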

mastevb commented 2 years ago

Added a progress bar and a notification for when no results are found.

implemented in https://github.com/sys-bio/vscode-antimony/commit/c378a0a8828b09ec050303e72b66839657824bba

mastevb commented 2 years ago
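
```sql
-- Load the CSV produced by the script below into the chebis table.
BULK INSERT chebis FROM 'chebi.csv'
WITH (ROWTERMINATOR = '0x0a', FIELDTERMINATOR = '~QwQ~', FIELDQUOTE = '!',
      DATA_SOURCE = 'blob2', FORMAT = 'CSV', CODEPAGE = 65001, -- UTF-8 encoding
      FIRSTROW = 1, TABLOCK);

-- Index the name column.
CREATE INDEX index1 ON [dbo].[chebis] (name);
```

```python
import xmltodict

# Parse the ChEBI export; each owl:Class element describes one entity.
with open("chebi.xml", encoding="utf-8") as src:
    ch = xmltodict.parse(src.read())
chebis = ch["rdf:RDF"]["owl:Class"]

# Write one row per entity as !name!~QwQ~!id!, matching the FIELDQUOTE and
# FIELDTERMINATOR settings of the BULK INSERT above. newline="\n" keeps the
# row terminator at 0x0a on every platform.
with open("chebi.csv", "w", encoding="utf-8", newline="\n") as out:
    for chebi in chebis:
        # Skip classes that lack a label or an ID.
        if "rdfs:label" in chebi and "oboI:id" in chebi:
            text = chebi["rdfs:label"]["#text"]
            chebi_id = chebi["oboI:id"]["#text"]
            out.write("!{}!~QwQ~!{}!\n".format(text, chebi_id))
```
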
mastevb commented 2 years ago

We also need to improve the mapping from each variable's type to the databases offered for searching:

For SBML compartments, support searching: Gene Ontology: cellular component, Cell Type Ontology, Foundational Model of Anatomy, Mouse Adult Gross Anatomy, Ontology for Biomedical Investigations

For SBML species, support searching: ChEBI, Protein Ontology, UniProt

For SBML reactions, support searching: GO: biological process, RHEA
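
One way to represent this mapping is a lookup table keyed by SBML component type. The sketch below is a hedged illustration: the constant `RECOMMENDED_RESOURCES`, the function `resources_for`, and the lowercase keys are assumptions made for the example, and the resource names are copied from the lists above rather than from any registry of identifiers.

```python
# Resources to search for each SBML component type, as listed above.
RECOMMENDED_RESOURCES = {
    "compartment": [
        "GO:cellular component",
        "Cell Type Ontology",
        "Foundational Model of Anatomy",
        "Mouse Adult Gross Anatomy",
        "Ontology for Biomedical Investigations",
    ],
    "species": ["ChEBI", "Protein Ontology", "UniProt"],
    "reaction": ["GO:biological process", "RHEA"],
}


def resources_for(component_type):
    """Return the resources to search for a component type (empty if unknown)."""
    # Return a copy so callers cannot mutate the shared table.
    return list(RECOMMENDED_RESOURCES.get(component_type.lower(), []))


if __name__ == "__main__":
    for kind in ("compartment", "species", "reaction", "parameter"):
        print(kind, resources_for(kind))
```

Keeping the table as plain data makes it straightforward to add further resources later without touching the lookup logic.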

mastevb commented 2 years ago

Hi Steve,

Here's a link to the OMEX metadata specification that I mentioned today: https://doi.org/10.1515/jib-2021-0020

If you jump to section 2.4.2 (Resources to use for composite semantic annotations) there is a list of recommended ontologies and databases there.

Working from that list, I would recommend you support searches for the following SBML components like so:

For SBML compartments, support searching: Gene Ontology: cellular component, Cell Type Ontology, Foundational Model of Anatomy, Mouse Adult Gross Anatomy, Ontology for Biomedical Investigations

For SBML species, support searching: ChEBI, Protein Ontology, UniProt

For SBML reactions, support searching: GO: biological process, RHEA

This recommendation is based on what's in the OMEX metadata specification, which was created with a particular user community in mind. Your target users might have other ontologies or databases that they want to be able to search. For example, some people like to use KEGG, but it's proprietary and so isn't included in the recommended resources in the OMEX metadata spec. Just something to keep in mind.

Hope this is helpful.

M