Implement ontology usability scores

caufieldjh commented 5 months ago

As per SAB comments and discussion on Dec 11 2023, we would like to:

...keep the strengths of both (i) the widely non-restrictive and flexible approach of BioPortal and (ii) the more guided and governed approach of the OBO Foundry

by

...Keep[ing] BioPortal open, but maybe try to score or sort ontologies somewhat...to increase value to users.

What metrics does BP already provide?

Popularity
Size in classes and instances
Dates of first and last upload
Counts of mappings to other ontologies (e.g. https://bioportal.bioontology.org/ontologies/GO/?p=mappings)

Others?

What metrics does OBO Foundry provide?

See the OBO Foundry dashboard.

In brief, and also summarized here, these are based on 20 principles:

Open - The ontology MUST be openly available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed in altered form under the original name or with the same identifiers.
Common Format - The ontology is made available in a common formal language in an accepted concrete syntax.
URI/Identifier Space - Each ontology MUST have a unique IRI in the form of an OBO Foundry permanent URL (PURL).
Versioning - The ontology provider has documented procedures for versioning the ontology, and different versions of ontology are marked, stored, and officially released.
Scope - The scope of an ontology is the extent of the domain or subject matter it intends to cover. The ontology must have a clearly specified scope and content that adheres to that scope.
Textual Definitions - The ontology has textual definitions for the majority of its classes and for top level terms in particular.
Relations - Relations should be reused from the Relations Ontology (RO).
Documentation - The owners of the ontology should strive to provide as much documentation as possible.
Documented Plurality of Users - The ontology developers should document that the ontology is used by multiple independent people or organizations.
Commitment To Collaboration - OBO Foundry ontology development, in common with many other standards-oriented scientific activities, should be carried out in a collaborative fashion.
Locus of Authority - There should be a person who is responsible for communications between the community and the ontology developers, for communicating with the Foundry on all Foundry-related matters, for mediating discussions involving maintenance in the light of scientific advance, and for ensuring that all user feedback is addressed.
Naming Conventions - The names (primary labels) for elements (classes, properties, etc.) in an ontology must be intelligible to scientists and amenable to natural language processing. Primary labels should be unique among OBO Library ontologies.
Notification of Changes
(or on that document, principle 16) Maintenance - The ontology needs to reflect changes in scientific consensus to remain accurate over time.
(or on that document, principle 20) Responsiveness - Ontology developers MUST offer channels for community participation and SHOULD be responsive to requests.

What additional metrics would improve BP's usability on a per-ontology basis?

We clearly don't need the full battery of metrics described above, and in some cases (like relation types in item 7) they may not even be good fits for the project. A better determination of users (as in item 9) may be helpful, if only in a simplified "ontology A imports ontology B" view.
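
As a rough illustration of the simplified "ontology A imports ontology B" view, here is a minimal Python sketch. It assumes we already have, for each ontology, the list of ontologies it imports (how that list is extracted is not covered here). The function name `imported_by` and the sample acronyms are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def imported_by(imports: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Invert an 'A imports B' mapping into 'B is imported by A'.

    `imports` maps an ontology acronym to the acronyms it imports.
    The result maps each imported ontology to a sorted list of its importers.
    """
    users = defaultdict(set)
    for importer, imported in imports.items():
        for target in imported:
            if target != importer:  # ignore self-imports
                users[target].add(importer)
    return {target: sorted(sources) for target, sources in users.items()}


# Hypothetical sample data, for illustration only.
sample = {"ONT_A": ["ONT_B", "ONT_C"], "ONT_D": ["ONT_B"], "ONT_B": []}
print(imported_by(sample))
```

The number of importers per ontology could then serve as a crude "plurality of users" signal, with the caveat that imports are only one kind of reuse.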

caufieldjh commented 5 months ago

See also: workings of the NCBO Ontology Recommender 2.0 - https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0128-y

jonquet commented 5 months ago

A possible approach would be to look at O'FAIRe: https://github.com/agroportal/fairness. This is already available in 6 OntoPortal instances, but it requires significant additions to the metamodel, as often discussed with the BioPortal team.

Whatever tool or approach is used to score or sort ontologies, it will need more metadata about the ontologies. And this metadata would need to be filled in and curated... as is done in the OBO Foundry for the dashboard to work.

caufieldjh commented 5 months ago

Immediately implementable usability metrics, beyond those already in use:

Binned update times (in last month, last three months, last year, last five years)
Definition/label quality (e.g., classes with no definition as a fraction of all classes)
Is the version known/available? This can be a mess even for OBO ontologies - maybe KG-OBO can help
Language - this should probably just be standard metadata, but should be quick to add
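
The first two items above could be computed along these lines. This is only a sketch under stated assumptions: the bin boundaries in days, the function names, and the input shape (a mapping from class ID to its definition text, or `None`) are all hypothetical and not taken from the BioPortal codebase.

```python
from datetime import date
from typing import Dict, Optional

# Upper bounds in days for each bin; the exact cut-offs are an assumption.
UPDATE_BINS = [
    (30, "last month"),
    (91, "last three months"),
    (365, "last year"),
    (5 * 365, "last five years"),
]


def bin_last_update(last_upload: date, today: date) -> str:
    """Return the label of the first bin that contains the last upload date."""
    age_days = (today - last_upload).days
    for max_days, label in UPDATE_BINS:
        if age_days <= max_days:
            return label
    return "older than five years"


def fraction_undefined(definitions: Dict[str, Optional[str]]) -> float:
    """Fraction of classes whose definition is missing or blank."""
    if not definitions:
        return 0.0
    missing = sum(1 for text in definitions.values() if not (text or "").strip())
    return missing / len(definitions)
```

For example, `bin_last_update(date(2023, 6, 1), date(2024, 1, 15))` falls in the "last year" bin, and a class set where half the definitions are empty yields 0.5.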

Requires some manual curation:

Is there an active contact?
O'FAIRe, as above
Is a license provided? This is sometimes in the description, e.g., https://bioportal.bioontology.org/ontologies/AFO

jvendetti commented 4 months ago

@jonquet - I was looking at some of the documentation for O'FAIRe:

We implemented O’FAIRe as a Web service working with any OntoPortal installations respecting the Metadata for Ontology Description and Publication Ontology (MOD1.4) metadata profile to harmonize metadata.

I would like to better understand what this statement means in terms of BioPortal's ability to use this software. Does this mean there is a strict requirement to add all metadata properties from the MOD1.4 standard? Or just a subset?

This is already available in 6 OntoPortal instances, but it requires significant additions to the metamodel, as often discussed with the BioPortal team.

I assume the 6 instances you refer to here have been able to use O'FAIRe due to a wholesale adoption of the AgroPortal codebase (at the REST API level)? Internally we've discussed an incremental approach to adopting more metadata, and I'm not certain if that precludes usage of O'FAIRe.

syphax-bouazzouni commented 4 months ago

Hello @jvendetti,

I can provide some insights on O'FAIRe while awaiting @jonquet's response. (Apologies if you are already familiar with the context; you can skip to "How to Implement it," which directly addresses your question.)

Context

O'FAIRe is a FAIRness assessment tool designed to assign a FAIR score (Findable, Accessible, Interoperable, and Reusable) to resources (ontologies). The higher the score, the better. This score is calculated based on the number of FAIR principles that an ontology satisfies. See the full FAIR principles here.

How it Works

To establish a measurable metric, we have devised a methodology that defines a set of questions, each corresponding to a principle. These questions evaluate various metrics and return a score. You can see the full list of questions here.

Unlike some other tools in this field, such as foops, which calculate metrics live upon submission of a resource, O'FAIRe operates differently. Instead of extracting metrics directly from submitted ontologies, we utilize metadata already parsed by OntoPortal. This approach allows us to recalculate the FAIR score for each submission or update, storing the result for quicker access. O'FAIRe consumes 123 metadata properties from AgroPortal, 62 originally from BioPortal, and additional properties introduced since 2016-2018. You can find the complete list of properties used by the tool here.

How to Implement it

O'FAIRe is implemented as a microservice (JSON API) developed in Java and running as a servlet on Tomcat. You can access the source code here.

As mentioned earlier, O'FAIRe relies on metadata. Providing more metadata leads to a better score, while a lack of metadata results in a score of 0 for the corresponding test, ultimately yielding a lower overall score.
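
To make the effect of missing metadata concrete, here is a toy Python sketch. It is not O'FAIRe's actual question set, weighting, or code: the property names, credit values, and the function `score_metadata` are hypothetical and only illustrate the "no metadata, no credit" behaviour described above.

```python
from typing import Any, Dict, Tuple


def score_metadata(metadata: Dict[str, Any], tests: Dict[str, int]) -> Tuple[int, int]:
    """Return (earned, total) credits.

    Each test checks one metadata property; an absent or empty value
    earns 0 for that test, which lowers the overall score.
    """
    earned = 0
    for prop, credits in tests.items():
        value = metadata.get(prop)
        if value not in (None, "", []):
            earned += credits
    return earned, sum(tests.values())


# Hypothetical tests and metadata, for illustration only.
tests = {"license": 10, "contact": 5, "version": 5}
metadata = {"license": "CC-BY-4.0", "contact": ""}
earned, total = score_metadata(metadata, tests)
print(f"{earned}/{total} ({100 * earned / total:.0f}%)")
```

Here only the license test earns credit, because the contact is blank and the version is absent.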

This means that O'FAIRe already works for any OntoPortal instance, including BioPortal, by default. For example, to obtain the FAIR score of the ontology AFO from BioPortal, you can make the following API call: https://services.agroportal.lirmm.fr/ofaire?url=https://data.bioontology.org&ontologies=AFO&apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb (this key is the public BioPortal apikey). This call returns a result of 185 (38%) along with detailed information on each principle and test conducted.
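
The same call can be made from a script. The Python sketch below only reproduces the URL shown above and parses whatever JSON comes back; it makes no assumption about the structure of the response. The helper names are hypothetical, and joining several acronyms with commas in the `ontologies` parameter is an assumption, not something confirmed in this thread.

```python
import json
from typing import Any, List
from urllib.parse import urlencode
from urllib.request import urlopen

OFAIRE_ENDPOINT = "https://services.agroportal.lirmm.fr/ofaire"


def build_ofaire_url(portal_url: str, ontologies: List[str], apikey: str) -> str:
    """Build the O'FAIRe request URL in the same form as the example above."""
    query = urlencode(
        {
            "url": portal_url,
            # Assumption: multiple acronyms are passed comma-separated.
            "ontologies": ",".join(ontologies),
            "apikey": apikey,
        },
        safe=":/,",  # keep the portal URL readable, as in the example call
    )
    return f"{OFAIRE_ENDPOINT}?{query}"


def fetch_ofaire(portal_url: str, ontologies: List[str], apikey: str, timeout: float = 30) -> Any:
    """Call the service and return the parsed JSON response as-is."""
    with urlopen(build_ofaire_url(portal_url, ontologies, apikey), timeout=timeout) as response:
        return json.load(response)
```

For instance, `fetch_ofaire("https://data.bioontology.org", ["AFO"], "<your apikey>")` would issue the request above, with your own BioPortal key in place of the placeholder.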

If you wish to integrate O'FAIRe into BioPortal, you'll need to configure and build the .war file from the sources, and then deploy it on your Tomcat server. Once it is deployed and running, the UI side just needs to make HTTP calls and consume the JSON response. Refer to the readme for detailed instructions.

syphax-bouazzouni commented 4 months ago

Internally we've discussed an incremental approach to adopting more metadata, and I'm not certain if that precludes usage of O'FAIRe.

Regarding the subject of metadata: if you want, @jvendetti, you can open another issue in the ontologies_linked_data project, where I can give you more technical details, such as the exact list of properties added to the model, how we did it, the challenges, how to extract them automatically from the submissions, data validations, ...

jonquet commented 4 months ago

Just a quick note while I am away: O'FAIRe already technically works with BioPortal (see example in https://hal.science/lirmm-03630233/), but without the metadata returned by the portal, many questions stay without scores.

ncbo / bioportal-project