microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

NMDC guidelines for usage of available ontologies #1848

Open anastasiyaprymolenna opened 6 months ago

anastasiyaprymolenna commented 6 months ago

What are the requirements for an ontology to be an approved source of reference for the NMDC? Does there have to be a certain amount of community acceptance? Or certain communities that will use this ontology? Is there a style guide to follow?

Currently there are terms for enzymes (https://github.com/microbiomedata/berkeley-schema-fy24/pull/97), like 'alphap', and substance roles (https://github.com/microbiomedata/berkeley-schema-fy24/pull/100), like derivatization agent, stabilizer, and precipitating agent, that cannot be found using only the ontologies listed under prefixes: in nmdc.yaml.
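As a rough illustration of the gap described above, the sketch below checks which candidate term identifiers fall outside a prefix map. The `APPROVED_PREFIXES` dictionary is a hypothetical stand-in, not the actual contents of `prefixes:` in nmdc.yaml, and the function names and the `EXAMPLE` prefix are made up for this example.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical stand-in for a schema's prefix map. The real list lives
# under `prefixes:` in nmdc.yaml and may differ from this one.
APPROVED_PREFIXES: Dict[str, str] = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "ENVO": "http://purl.obolibrary.org/obo/ENVO_",
}


def curie_prefix(curie: str) -> Optional[str]:
    """Return the prefix of a CURIE such as 'CHEBI:15377', or None if malformed."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        return None
    return prefix


def uncovered_terms(curies: Iterable[str], prefixes: Dict[str, str]) -> List[str]:
    """List the inputs whose prefix is missing from the prefix map.

    Plain labels with no prefix at all (for example 'stabilizer') are
    also reported, since they cannot be traced to any listed ontology.
    """
    return [c for c in curies if curie_prefix(c) not in prefixes]


if __name__ == "__main__":
    candidates = ["CHEBI:15377", "EXAMPLE:0000001", "stabilizer"]
    print(uncovered_terms(candidates, APPROVED_PREFIXES))
```

A check like this only shows that a term's prefix is not listed. It says nothing about whether the ontology behind an unlisted prefix is suitable.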

I am curious how we can expand the list of acceptable ontologies in a methodical manner. Would other ontologies listed on the OLS be acceptable candidates? What about ontologies like ChemFOnt that are published but are still being adopted by databases (currently only implemented in HMDB)? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825615/

@brynnz22 @turbomam @cmungall

turbomam commented 5 months ago

This is an excellent question and I'm sorry that I haven't provided better support for it yet. I appreciate this effort to move forward.

I can't think of an elevator-pitch answer to the question, so I will be adding more comments and links over the next couple of hours.

For now, I'd like to start with a question:

Q: If we find a good way to indicate whether a particular ontology is suitable or not, what will that enable?

I'm going to hypothesize the answer is "once we know which ontologies to use, picking the terms will be easy." I think that may be true for some homogeneous groups of terms (like enzyme or chemicals). I also think there will be cases in which knowing a preferred or suitable ontology won't be enough and that term searching and evaluation skills will be required.

turbomam commented 5 months ago

One solution would be to use the nmdc-ontology as the one approved source of terms. If we did that, and a data submitter didn't find the term they want, we would go through a structured importing or authoring process. That wouldn't eliminate the need for identifying appropriate source ontologies, but it would make the process less ad hoc.

If we take this approach, we will need to provide better tools and training for accessing the nmdc-ontology. Right now it can be accessed

The file checked into GitHub is always the most up to date. The other two sources could be a few days or weeks old. We can add some automation to the processes that update them.

We could also create an NMDC Ontology Lookup Service.

We would need to provide support for searching through any of those resources.

turbomam commented 5 months ago

The OBO Dashboard has three pages that are helpful in thinking about ontology quality and suitability. The Dashboard and Dashboard analysis only consider ontologies that are already part of the OBO Foundry, but the Foundry principles are applicable to any ontology.

The OBO Foundry also has a Tools and Resources page. Towards the top is a link to the OBOOK, Open Biological and Biomedical Ontologies Organized Knowledge. I haven't identified which sections might be most useful for picking source ontologies yet.

turbomam commented 5 months ago

I have downloaded ChemFOnt via https://www.chemfont.ca/system/downloads/1.0/chemfont.owl.zip, but Protege fails to open it.

Protege error (when attempting to open ChemFOnt as RDF/XML, which it clearly is):

org.semanticweb.owlapi.rdf.rdfxml.parser.RDFParserException: [line=353688:column=110] IRI 'http://purl.obolibrary.org/obo/3-Hydroxy-2-methyl-[S-(R,R)]-butanoic acid' cannot be resolved against current base IRI http://pur.obolibrary.org/obo/Merged.owl reason is: Illegal character in path at index 50: http://purl.obolibrary.org/obo/3-Hydroxy-2-methyl-[S-(R,R)]-butanoic%20acid
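To locate every IRI of this kind without loading the file into Protege, a line-by-line scan is one option. The following is a minimal sketch under some assumptions: the function name `find_bad_iris` is hypothetical, the set of characters treated as illegal is a conservative approximation rather than a full RFC 3987 validator, and only double-quoted `rdf:about` and `rdf:resource` attributes are inspected.

```python
import re
from typing import Iterable, Iterator, Tuple

# Characters that may not appear unescaped in an IRI path: whitespace,
# angle brackets, double quote, braces, pipe, backslash, caret, backtick
# and square brackets. This is an approximation, not a full validator.
ILLEGAL = re.compile(r'[\s<>"{}|\\^`\[\]]')

# Assumes attribute values are double-quoted, as in the OWL API's output.
IRI_ATTR = re.compile(r'rdf:(?:about|resource)\s*=\s*"([^"]*)"')


def find_bad_iris(lines: Iterable[str]) -> Iterator[Tuple[int, str, int]]:
    """Yield (line_number, iri, index_of_first_illegal_character)."""
    for line_number, line in enumerate(lines, start=1):
        for match in IRI_ATTR.finditer(line):
            iri = match.group(1)
            bad = ILLEGAL.search(iri)
            if bad:
                yield line_number, iri, bad.start()


if __name__ == "__main__":
    # The IRI from the error message above, wrapped in a made-up element.
    sample = [
        '<owl:Class rdf:about="http://purl.obolibrary.org/obo/'
        '3-Hydroxy-2-methyl-[S-(R,R)]-butanoic acid"/>'
    ]
    for hit in find_bad_iris(sample):
        print(hit)  # the reported index is 50, the position of "["
```

To scan the downloaded file, pass an open file handle instead of the sample list, for example `find_bad_iris(open("chemfont.owl", encoding="utf-8"))`, where the file name is whatever the zip archive unpacks to.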

@cmungall how do you feel about potentially using ChemFOnt?

turbomam commented 5 months ago

Chemical Functional Ontology (ChemFOnt): functions and actions of >341 000 biologically important chemicals

turbomam commented 5 months ago

I sent a web-form email and a tweet (as they suggest) to the Wishart lab, mentioning the Protege/OWLAPI error above and asking if they have a GitHub repo.