pombase / canto

The PomBase community curation tool
https://curation.pombase.org

Allow multiple ontologies to be used for one annotation type #2730

Open jseager7 opened 1 year ago

jseager7 commented 1 year ago

PHI-Canto has a 'Disease name' annotation type that currently uses terms from a controlled vocabulary (PHIDO) that PHI-base developed themselves, by copying disease names out of the PHI-base database.

However, many of the terms in PHIDO duplicate the semantics of terms from existing disease ontologies (particularly MONDO), so we should really be reusing those existing terms.

The problem is, we can't just use MONDO because it's missing terms for many diseases in PHI-base.

For human and animal diseases, we can probably just request new terms from MONDO (or from its contributing source ontologies), but plant disease seems to have no representation at all and may be out of scope for MONDO anyway.

The only other ontology related to plant disease that we can find is the Plant Stress Ontology (PSO), but that only has a very general hierarchy of plant diseases (e.g. 'disease of spinach') which aren't even included in the OBO version of the ontology. PSO has seen no further updates since May 2020.

Basically, we'll probably need to use a combination of terms from MONDO and PHIDO for the foreseeable future (possibly adding other disease ontologies later to cover the remaining terms).

The problem is, Canto only seems to filter its ontology terms based on the OBO namespace, so presumably the only way to add MONDO terms to the Disease name annotation type is to inject the 'phido' namespace property into every MONDO term that we want to include.

  - name: disease_name
    category: ontology
    namespace: phido

I don't think there's anything in the MONDO license (CC BY 4.0) that precludes modifying the terms, and the change will never be exposed to end users since Canto only exports the term ID. Still, this isn't really an elegant solution.
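
For illustration, the namespace injection could be scripted rather than done by hand. Below is a minimal Python sketch, assuming the MONDO terms are available as OBO text; the function name `add_namespace` is hypothetical and not part of Canto. It adds `namespace: phido` to every `[Term]` stanza that has no namespace tag, and leaves the header and `[Typedef]` stanzas untouched.

```python
def add_namespace(obo_text: str, namespace: str) -> str:
    """Add 'namespace: <namespace>' to each [Term] stanza that has no namespace tag."""
    out: list[str] = []

    def flush(stanza: list[str]) -> None:
        is_term = bool(stanza) and stanza[0].strip() == "[Term]"
        if is_term and not any(line.startswith("namespace:") for line in stanza):
            # Conventional OBO tag order puts namespace: right after name:,
            # so anchor on name: and fall back to id: if a term has no name.
            anchor = next((i for i, l in enumerate(stanza) if l.startswith("name:")), None)
            if anchor is None:
                anchor = next((i for i, l in enumerate(stanza) if l.startswith("id:")), 0)
            stanza.insert(anchor + 1, f"namespace: {namespace}")
        out.extend(stanza)

    stanza: list[str] = []  # header lines first, then one stanza at a time
    for line in obo_text.splitlines():
        if line.startswith("["):  # a new [Term] / [Typedef] stanza begins
            flush(stanza)
            stanza = []
        stanza.append(line)
    flush(stanza)
    return "\n".join(out) + "\n"


SAMPLE = """format-version: 1.2
ontology: mondo

[Term]
id: MONDO:0001394
name: chronic erythremia

[Typedef]
id: part_of
name: part of
"""

print(add_namespace(SAMPLE, "phido"))
```

A line-based pass like this is enough for OBO's flat stanza layout, but it does not validate the file; running the result through the same loader Canto uses would catch any mistakes.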

@kimrutherford Can you think of any better solutions to this that wouldn't be prohibitively difficult? Also, is my proposed solution actually feasible?

kimrutherford commented 1 year ago

Hi James.

I think your suggestion of adding MONDO terms to your OBO file, then adding namespace: phido to them, would work. It's a very reasonable plan. I've seen that sort of thing in other OBO files. I don't know how to do that in OWL.

Another option is to have a PHIDO term for each MONDO term that you need to use, with an xref: containing the original MONDO ID. I think that's also a reasonable option: it gives you more control than using MONDO terms directly, and it means all the disease annotations use the same ontology. But that's more work than just adding namespace: phido to a copy-and-pasted MONDO term.

A third option (which I can't guarantee will work) is to change your annotation_type configuration to something like:

  - name: disease_name
    category: ontology
    namespace: "[PHIDO:0000001|MONDO:0000001]"
    ...

The advantage of this plan is that it will (hopefully) allow you to select any term from PHIDO or MONDO, rather than having to add the term to the PHIDO OWL or OBO file before it can be used. The downside is that you'll need to load all of MONDO in Canto.
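
To make the bracketed syntax concrete: the value lists root term IDs separated by `|`, and the intent is that terms under any of those roots become selectable. Below is a minimal Python sketch of reading such a value. `parse_namespace_setting` is a hypothetical helper, not Canto's own parser, and it only covers splitting the setting, not looking up terms.

```python
def parse_namespace_setting(value: str) -> list[str]:
    """Read "[ID1|ID2|...]" as a list of root term IDs; anything else is one plain namespace."""
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        return [part.strip() for part in value[1:-1].split("|") if part.strip()]
    return [value]


print(parse_namespace_setting("[PHIDO:0000001|MONDO:0000001]"))  # ['PHIDO:0000001', 'MONDO:0000001']
print(parse_namespace_setting("phido"))  # ['phido']
```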

jseager7 commented 1 year ago

Thanks for the advice. I didn't know about the third option with the alternative namespace syntax, so I'll try that to see if it works.

I think loading MONDO would be the best solution since we don't want to constrain which terms curators can use, beyond omitting non-infectious diseases like genetic diseases. Maintaining a list of imported terms will be too much effort. The MONDO OWL file is big (171 MB), but its OBO version is similar in size to go-basic.obo, so it shouldn't pose any problems with loading.

> I've seen that sort of [namespace] thing in other OBO files. I don't know how to do that in OWL.

The namespace properties are embedded into OWL using oboInOwl annotation properties, so it's pretty straightforward. Using PHIPO as an example:

<oboInOwl:hasOBONamespace>single_species_phenotype</oboInOwl:hasOBONamespace>

Fortunately, the MONDO OBO file doesn't have any namespace properties on terms already, so it should be easy enough to add them if that's required.
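
If an OWL copy were used instead, the same annotation could be added programmatically. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, assuming an RDF/XML serialization where named terms are `owl:Class` elements carrying `rdf:about`; the helper name `add_obo_namespace` is hypothetical. ElementTree loads the whole document into memory, which matters for a file the size of MONDO's OWL release.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
OBO_IN_OWL = "http://www.geneontology.org/formats/oboInOwl#"

# Keep the familiar prefixes when the tree is written back out.
for prefix, uri in (("rdf", RDF), ("owl", OWL), ("oboInOwl", OBO_IN_OWL)):
    ET.register_namespace(prefix, uri)


def add_obo_namespace(root: ET.Element, namespace: str) -> int:
    """Give every named owl:Class an oboInOwl:hasOBONamespace; return how many were added."""
    added = 0
    for cls in root.iter(f"{{{OWL}}}Class"):
        if cls.get(f"{{{RDF}}}about") is None:
            continue  # anonymous class expression, not a term
        if cls.find(f"{{{OBO_IN_OWL}}}hasOBONamespace") is not None:
            continue  # already has a namespace
        ET.SubElement(cls, f"{{{OBO_IN_OWL}}}hasOBONamespace").text = namespace
        added += 1
    return added


SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}" xmlns:oboInOwl="{OBO_IN_OWL}">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0001394"/>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)
print(add_obo_namespace(root, "phido"))  # 1
print(ET.tostring(root, encoding="unicode"))
```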

> Another option is to have a PHIDO term for each MONDO term that you need to use, with an xref: containing the original MONDO ID.

We did think about this, but I suspect most people in the ontology world would prefer us to reuse existing terms if the semantics are identical. Given that it's more work to add cross references than to add terms from MONDO, I'd prefer term reuse as the first option.

jseager7 commented 1 year ago

@kimrutherford I've tested the alternative namespace syntax, and unfortunately the terms from MONDO are not suggested, despite being loaded in the database. Only terms from PHIDO appear.

Using 'chronic erythremia' (MONDO:0001394) as an example, here's what appears in the autocomplete:

[Screenshot of the autocomplete results]

The term does appear if the term ID is entered instead:

[Screenshot of the autocomplete results when searching by term ID]

Looking at the source code, I can't see any special parsing being applied to the namespace property to make the pipe-separated syntax work as intended. But if the namespace isn't being parsed correctly, why am I still seeing terms for PHIDO?

Here's the configuration for the disease_name annotation:

  - name: disease_name
    category: ontology
    namespace: "[PHIDO:0000001|MONDO:0000001]"
    very_short_display_name: 'disease'
    short_display_name: 'disease'
    display_name: 'disease name'
    synonyms_to_display:
      - exact
    feature_type: 'metagenotype'
    can_have_conditions: 0
    broad_term_suggestions: ""
    specific_term_examples: ""
    help_text:
      Annotate the expected disease for this pathogen-host interaction, where 'expected' refers to the case of a pathogen strain interacting with a host strain with no experimentally-induced mutations, and where the interaction between the pathogen and the host is compatible (i.e. the interaction results in disease).
    more_help_text: ~
    extra_help_text: ~
    detailed_help_path: /docs/disease_annotation

kimrutherford commented 1 year ago

Hi James.

Sorry, you're right. I checked the code and although that funky syntax for namespace is parsed correctly, the implementation isn't there. Fixing it will require a bit of refactoring, but might be possible. I'll have a think.

I'm away for 2 weeks from tomorrow but I'll see what's possible when I get back.

> But if the namespace isn't being parsed correctly, why am I still seeing terms for PHIDO?

I think that was probably just luck. Due to the weird way things are implemented, if PHIDO:0000001 is mentioned in the annotation extension configuration, the autocomplete will work for terms that are children of it. If MONDO:0000001 had been mentioned in the config, the funky namespace config would probably have worked fine.

jseager7 commented 1 year ago

> I'm away for 2 weeks from tomorrow but I'll see what's possible when I get back.

Alright, thanks. In the meantime, I'll try to test the alternative solution of adding namespace properties on terms in MONDO.

kimrutherford commented 1 year ago

> I'm away for 2 weeks from tomorrow but I'll see what's possible when I get back.

Hi James.

Sorry it took a while to get back to you. I had another look and due to my poorly structured code, a proper fix is going to take too long at the moment.

There might be a (very) hacky solution though. If you add this to your configuration and reload the ontologies, the namespace trick ("[PHIDO:0000001|MONDO:0000001]") should work:

ontology_namespace_config:
   interesting_parent_ids:
     - "PHIDO:0000001"
     - "MONDO:0000001"
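
Kim's earlier explanation suggests the autocomplete only covers terms that sit under a configured parent ID, which is why listing both roots here helps. As a rough illustration of that idea (not Canto's actual implementation), the Python sketch below collects every term at or below the `interesting_parent_ids` by walking `is_a` edges from parent to child. The edge list is toy data: the MONDO edge skips the intermediate terms of the real hierarchy, and `PHIDO:0000002` is a hypothetical child ID.

```python
from collections import defaultdict, deque


def terms_under(parent_ids: list[str], is_a_edges: list[tuple[str, str]]) -> set[str]:
    """Return the parent IDs plus all of their descendants, given (child, parent) is_a pairs."""
    children: dict[str, list[str]] = defaultdict(list)
    for child, parent in is_a_edges:
        children[parent].append(child)

    found = set(parent_ids)
    queue = deque(parent_ids)  # breadth-first walk down the hierarchy
    while queue:
        for child in children[queue.popleft()]:
            if child not in found:  # skip terms reached via another parent
                found.add(child)
                queue.append(child)
    return found


EDGES = [
    ("MONDO:0001394", "MONDO:0000001"),  # simplified: real MONDO has intermediate terms
    ("PHIDO:0000002", "PHIDO:0000001"),  # hypothetical PHIDO child
]

searchable = terms_under(["PHIDO:0000001", "MONDO:0000001"], EDGES)
print(sorted(searchable))  # ['MONDO:0000001', 'MONDO:0001394', 'PHIDO:0000001', 'PHIDO:0000002']
```

With only `PHIDO:0000001` listed, `MONDO:0001394` falls outside the result, which matches the behaviour James saw before the configuration change.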

jseager7 commented 1 year ago

> a proper fix is going to take too long at the moment.

That's no problem. I haven't had time to try any other solutions to this problem yet anyway.

I'll try the hacky solution now.

jseager7 commented 1 year ago

@kimrutherford The interesting_parent_ids solution works as expected. I can see terms from PHIDO and MONDO in the same autocomplete now. Thanks again for looking into this.

[Screenshot of the autocomplete showing PHIDO and MONDO terms together]

kimrutherford commented 1 year ago

> The interesting_parent_ids solution works as expected.

Great news! I wasn't 100% sure it would work.

Let's leave this issue open because I'd like to fix this properly one day.