monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/
6 stars 3 forks source link

Create ORDO subset ROBOT template: Update goal to depend on component #511

Closed twhetzel closed 6 months ago

twhetzel commented 7 months ago

Based on the work to update the subsets in Mondo based on information from ORDO, this question about whether to update the URL for the component-download came up. See https://github.com/monarch-initiative/mondo-ingest/pull/510/files#r1588599701

Response from Nico:

Look at

$(COMPONENTSDIR)/ordo.owl: $(TMPDIR)/ordo_relevant_signature.txt config/properties.txt | component-download-ordo.owl
    if [ $(COMP) = true ]; then $(ROBOT) remove -i $(TMPDIR)/component-download-ordo.owl.owl --select imports \
        merge ...

It is better to:

depend in component-download-ordo.owl And use $(TMPDIR)/component-download-ordo.owl.owl to ensure we are using the same version as the rest of the pipeline

joeflack4 commented 7 months ago

Yeah, I just saw that comment. I'm a bit confused by Nico's response.

My thought is that, as I suggested, all that needs to be updated is mondo-ingest-odk.yaml, and just update the URL in this section...

components:
  products:
    ...
    - filename: ordo.owl
      source: URL

...replacing URL with the one we used for #510: https://www.orphadata.com/data/ontologies/ordo/last_version/ORDO_en_4.4.owl

If we do this, then we would run sh run.sh make update_repo, and this will update the Makefile such that the component-download-ordo.owl will now use this updated URL.

matentzn commented 6 months ago

My thought is that, as I suggested, all that needs to be updated is mondo-ingest-odk.yaml, and just update the URL in this section...

The config already has the correct URL:

https://github.com/monarch-initiative/mondo-ingest/blob/e093456bdb193f609b71e3ba5ce7baab3834fcd5/src/ontology/mondo-ingest-odk.yaml#L43

It points to the latest version of raw ORDO, not to the versioned PURL you are currently using.

Is there any other objection to my suggested plan (just depending in the existing mirror instead of creating a new one)?

joeflack4 commented 6 months ago

@matentzn Ah ok, I think the source of my confusion here is this:

Trish wrote:

(1) is this the location you want for the ORDO file (this is the value from OLS for the Ontology IRI): https://www.orphadata.com/data/ontologies/ordo/last_version/ORDO_en_4.4.owl

Nico wrote:

(1) yes, exactly!

So I figured there was something wrong with http://www.orphadata.org/data/ORDO/ordo_orphanet.owl.

But I think that's not the case. I think @matentzn you just misread the URL that Trish was linking there? If that's the case, then yes, I agree, for #510 I should change it to depend on component-download-ordo.owl instead.

Do I understand correctly now?

matentzn commented 6 months ago

Yes, sounds right! Sorry for causing the confusion!

joeflack4 commented 6 months ago

Ok, @twhetzel this is super easy-peasy then. I need to:

joeflack4 commented 6 months ago

@matentzn Sorry, I need additional clarity.

Question 1: Use component > component download?

Answer: No. Because the hierarchy changes.

Original question

Can you think of any reason not to depend on `$(COMPONENTSDIR)/ordo.owl` instead of `component-download-ordo.owl`? I'd prefer to depend on `$(COMPONENTSDIR)/ordo.owl`, for various reasons. One of them is that I plan to delete `@en` labels in this component (https://github.com/monarch-initiative/mondo-ingest/issues/512#issuecomment-2094310273), and if I depend on the component, then I can remove the Python code I have in #510 which does the same thing. I looked at `$(COMPONENTSDIR)/ordo.owl`, and it looks like it does contain the correct subset classes and subclass relationships. So I think I would get the same robot template results if I use that as the source (I can double check).

Question 2: Delete ordo-construct-subsets.ru?

Another thing about $(COMPONENTSDIR)/ordo.owl. I noticed it has --update ../sparql/ordo-construct-subsets.ru \.

ordo-construct-subsets.ru

```sparql INSERT { ?cls oio:inSubset ?subset } WHERE { ?category rdfs:subClassOf phenome: . ?cls rdfs:subClassOf+ ?category . ?category rdfs:label ?categoryLabel . BIND( URI(CONCAT("http://purl.obolibrary.org/obo/mondo#ordo_", REPLACE(?categoryLabel, " ", "_"))) AS ?subset) } ```

But this now seems out of date. With #510, it seems like we don't care about what this query is doing; we don't care what the new/changed ORDO subset labels are. We're mapping these subset classes to the MONDO subset properties we've been using in the past:

SUBSET_MAP = {
    'Orphanet:557493': 'http://purl.obolibrary.org/obo/mondo#ordo_disease',  # disorder
    'Orphanet:557492': 'http://purl.obolibrary.org/obo/mondo#ordo_group_of_disorders',  # group of disorders
    'Orphanet:557494': 'http://purl.obolibrary.org/obo/mondo#ordo_clinical_subtype' }  # subtype of a disorder

I can think of additional ways we might want to tweak here. But it seems like we can get rid of ordo-construct-subsets.ru?

matentzn commented 6 months ago

Both very important questions, lets pick these up when we resume the work on this

joeflack4 commented 6 months ago

Update:

So this issue can be closed when #510 is merged.

joeflack4 commented 6 months ago

Addressed by: