obiba / opal

OBiBa’s core database application for biobanks or epidemiological studies.
http://www.obiba.org/pages/products/opal/
GNU General Public License v3.0

Feature request: Better support for taxonomies and added support for ontologies through an OWL files import integration and a taxonomy/ontology-specific quick search feature #3689

Closed feel-ix closed 1 week ago

feel-ix commented 2 years ago

Is your feature request related to a problem? Please describe.

Opal already supports the import of taxonomies as YAML (.yml) files. It is thus possible to import taxonomies from GitHub in a very user-friendly way. However, very few taxonomies are available in YAML format, either on GitHub or elsewhere.

Also, Opal does not currently support ontologies. The standard format for ontologies is OWL (Web Ontology Language, a W3C standard), and most of the well-known ontology repositories (BioPortal, Ontobee, OBO Foundry, etc.) distribute ontologies as OWL files. Opal does not currently support the import of OWL files.

As some of the most well-known ontologies contain upwards of 50,000-100,000 concepts, it is not realistic to manually convert the OWL files of these ontologies into YAML files that would comply with the structure required by Opal.

Describe the solution you'd like

We think it would be interesting if Opal could accommodate formats other than YAML and allow the import of OWL files. We think that making it possible to import OWL files would “kill two birds with one stone”: enable Opal to support ontologies, and enable Opal to support a wider variety of taxonomies (since some taxonomies are available in OWL format, but not in YAML).

However, OWL files (and ontologies) have very varied and complex multi-level architectures, and relationships between concepts that are not limited to parent-child relationships. At the moment, the Opal taxonomy module supports taxonomies based on a two-tier hierarchy: Vocabularies | Terms (and according to parent-child relationships).

In order to take into account this variable complexity of OWLs during the import, and to keep the Opal taxonomy module usable and user-friendly, it would be interesting to take advantage of the quick-search feature in “Apply annotation” (and to adapt it to ontologies). The quick-search feature could be modified to allow searching in added ontologies when annotating variables, but without having to take into account the complex architecture of the relationships between concepts.

In summary, the proposed feature is as follows:

  1. Enable the import of OWL files through the Administration/Taxonomies/Add taxonomy menu in Opal. This feature would only import the names of the concepts contained in the OWLs and the definitions/descriptions of those concepts (if applicable). This feature would not import the relationships between concepts, or any other data. This import could use the same structure as what is already in place in Opal for taxonomies (i.e.: ontology concept names could be considered as Opal vocabulary titles, and ontology definitions/descriptions of these concepts could be considered as Opal vocabulary descriptions. Thus, the second level (terms titles / terms descriptions) could be left blank when importing an ontology).

  2. Create a second search feature in "Apply annotation" that would be specific to ontologies and would allow annotating variables using the concepts found in the imported ontologies (imported OWL files). This feature would let the user type a query (e.g.: “obesity”) and would display as suggestions a list of all the concepts containing the word “obesity” (a bit like the current quick-search feature of Opal).

  3. Add an option to tell Opal in which imported ontology to perform the search. As mentioned, some ontologies contain in excess of 100,000 concepts. It would therefore be preferable to be able to select the ontology in which to make the query BEFORE performing the quick search through all the concepts.

  4. A “nice to have” would be to make it so that this ontology-specific search feature can also be added to the current taxonomy quick-search feature (in “Apply annotation”). Taxonomies too can have multiple concepts, so it would be nice to be able to specify to Opal in which taxonomy to look before doing the search.
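As a rough illustration of item 1, here is a minimal Python sketch of what "import only concept names and definitions" could look like. It is not Opal code. It assumes the OWL file is serialized as RDF/XML, that concept names are in `rdfs:label`, and that definitions are in either the OBO `IAO_0000115` annotation or `rdfs:comment`. The function name `extract_concepts`, the output keys, and the `example.org` sample ontology are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by RDF/XML-serialized OWL files.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "obo": "http://purl.obolibrary.org/obo/",
}
RDF_ABOUT = f"{{{NS['rdf']}}}about"
# Assumption: definitions live in one of these annotations, tried in order.
DEFINITION_PATHS = ("obo:IAO_0000115", "rdfs:comment")


def local_name(iri):
    """Fallback title: the last segment of the concept IRI."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def extract_concepts(owl_xml):
    """Return a flat list of concepts, ignoring all relationships."""
    root = ET.fromstring(owl_xml)
    concepts = []
    # Only top-level named classes; anonymous class expressions are skipped.
    for cls in root.findall("owl:Class", NS):
        iri = cls.get(RDF_ABOUT)
        if iri is None:
            continue
        label = cls.findtext("rdfs:label", default=local_name(iri), namespaces=NS)
        description = ""
        for path in DEFINITION_PATHS:
            text = cls.findtext(path, namespaces=NS)
            if text:
                description = text.strip()
                break
        concepts.append(
            {"iri": iri, "title": label.strip(), "description": description}
        )
    return concepts


# Tiny made-up ontology, for demonstration only.
SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/demo#Obesity">
    <rdfs:label>Obesity</rdfs:label>
    <rdfs:comment>Excess body fat.</rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://example.org/demo#Disease"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/demo#Disease"/>
</rdf:RDF>
"""

if __name__ == "__main__":
    for concept in extract_concepts(SAMPLE):
        print(concept["title"], "|", concept["description"])
```

The `rdfs:subClassOf` link in the sample is deliberately dropped, which matches the idea of leaving relationships aside. Ontologies published in other serializations (Turtle, OWL/XML, OBO) would need a different parser.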

Describe alternatives you've considered

We have not considered other alternatives. We understand that OWL files are very complex and varied. Therefore, we believe that using a search feature is the best approach to allow Opal to support them more easily and to support a larger number of them. Since researchers and research projects use a lot of different ontologies (depending on their research domain and interest), we think it is important that Opal can easily accommodate a wide variety of ontologies. We believe that an interesting way to make this possible is to make sure that only concepts and concept descriptions/definitions are imported when adding OWL files (and to leave the relationships between concepts aside).

Additional context

I represent a research and innovation platform in sustainable health called PULSAR, which is based at Université Laval (Quebec, Canada).

We have needs regarding the integration/addition of ontologies to Opal and Mica, and we may be able to contribute to some extent to what we propose above.

ymarcon commented 2 years ago

Thanks for the proposal. We considered embracing ontologies many years ago, but as you pointed out, they bring a lot of concepts and complexity, much more than what our end users needed.

  1. In your proposal, OWL is an alternative format for expressing an Opal taxonomy. You mention that in the conversion process, vocabulary terms "could be left blank": what would be the use of a vocabulary without terms? Currently, applying an "annotation" consists of applying a vocabulary term to a variable.

  2. Note that in theory a term can have sub-terms: there is no limit on the number of levels, but in practice we decided to have only one level of terms, because it greatly simplifies the search UI in Mica.

  3. I agree with any improvements that could be made in Opal for managing large taxonomies, but Mica is the application that makes use of the Opal taxonomies: how would you handle these taxonomies in Mica's search page, for instance?

We do not have the resources for working on that topic, unless you provide funding. We are open to collaboration of course and would review any PR your team would submit.

feel-ix commented 2 years ago

Hello,

Thank you for your reply. Here are our answers to your questions:

  1. Answer: As you know, not all ontologies/taxonomies have parent-child relationships. Some have simpler relationships between concepts, and some have much more complex ones, with massive hierarchies and numerous subdivisions. Our proposal is to take advantage of a quick-search feature to simplify the integration and usability of large taxonomies/ontologies in Opal. Since it would be very complex to integrate different types of ontologies and taxonomies into Opal, our hunch is that it would be simpler to create an import feature that only imports the full list of concepts/terms of any given taxonomy/ontology (ignoring their hierarchy, structure, and the relationships between concepts and terms). In our opinion, what really matters is being able to annotate variables with the right concepts/terms. Therefore, applying an “annotation” to a variable could simply consist of adding a vocabulary without a vocabulary term, or a “vocabulary term” without a vocabulary. The goal is simply to have a one-tier structure instead of a two-tier structure, in order to simplify the integration/import/conversion of the various taxonomies/ontologies available in OWL format. Obviously, our proposal is not to eliminate your two-tier hierarchy. Perhaps the UI could reflect this difference and keep your current two-tier hierarchy (vocabulary | term) for YML files, and use the proposed one-tier hierarchy for taxonomies/ontologies imported in OWL format. Alternatively, perhaps the new feature could live in another section altogether (“Apply annotation” = YMLs or taxonomies added manually through Opal | “Apply ontology” = imported and converted OWLs). This certainly needs some thought in order to keep the interface and UX intuitive. Please see the attached image for a mock-up of what we have in mind. As mentioned in our initial proposal, the quick-search feature could filter by taxonomy/ontology (so that only results related to the selected taxonomy/ontology are displayed).

mockup_opal_2022_03_11

  2. Answer: OK. Thanks for the information, we didn't know that!

  3. Answer: This is an excellent question that in itself would require some careful consideration at the UI/UX level, given that some ontologies contain hundreds of thousands of concepts/terms. I can't tell you how it would be handled on the coding side, but I can show you what I have in mind on the UI/UX side. Below is a second mock-up of what large ontologies/taxonomies could look like in the Mica search page. It shows a two-step filtering feature: we would first select the imported ontology/taxonomy of our choice (from Opal), and then search for terms within this taxonomy/ontology and select the ones we want to look up (using check boxes or another type of selector). Contrary to the mock-up, it would certainly be ideal to be able to add more than 4 search terms per query. Of course, this feature has limitations (i.e.: only being able to perform a search on one taxonomy/ontology at a time), but in our opinion it would still be interesting and usable.

mockup_mica_2022_03_11
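To make the two-step filtering idea a bit more concrete, here is a minimal Python sketch: pick one imported ontology first, then run a case-insensitive substring search within it. This is only an illustration under stated assumptions, not how Opal or Mica implement search. The names `build_index` and `quick_search`, the `title` key, and the demo ontologies are hypothetical.

```python
def build_index(ontologies):
    """Precompute lowercase titles once per ontology.

    `ontologies` maps an ontology name to a flat list of concept dicts,
    each with at least a "title" key (a hypothetical structure).
    """
    return {
        name: [(concept["title"].lower(), concept) for concept in concepts]
        for name, concepts in ontologies.items()
    }


def quick_search(index, ontology, query, limit=10):
    """Step 1: restrict to one ontology. Step 2: substring match on titles."""
    needle = query.strip().lower()
    if not needle:
        return []
    hits = [c for lowered, c in index.get(ontology, []) if needle in lowered]
    # Titles starting with the query come first, then alphabetical order.
    hits.sort(
        key=lambda c: (
            not c["title"].lower().startswith(needle),
            c["title"].lower(),
        )
    )
    return hits[:limit]


# Made-up data, for demonstration only.
DEMO = {
    "demo_disease": [
        {"title": "Childhood obesity"},
        {"title": "Diabetes"},
        {"title": "Obesity"},
    ],
    "demo_anatomy": [{"title": "Adipose tissue"}],
}

if __name__ == "__main__":
    index = build_index(DEMO)
    for hit in quick_search(index, "demo_disease", "obesity"):
        print(hit["title"])
```

Scoping the search to one ontology keeps each query proportional to that ontology's size only. A linear scan like this may still be too slow for ontologies with hundreds of thousands of concepts, where a real search index would likely be needed.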

Thank you for your feedback and consideration. We don't have a specific budget for this feature either at the moment. However, this may change in the future. We will contact you if this is the case.

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for a year with no activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 week ago

This issue was closed because it has been inactive for 14 days since being marked as stale. Feel free to re-open it if it is still relevant.