monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

`make` Refactor: `mondo.sssom.tsv` #270

Open joeflack4 opened 1 year ago

joeflack4 commented 1 year ago

Overview

@matentzn Not sure if this counts as a refactor, but while working on the GARD ingest I was looking for the PURL for mondo.sssom.tsv (Slack thread). I eventually found http://purl.obolibrary.org/obo/mondo/mappings/mondo.sssom.tsv. However, it doesn't seem to be used anywhere in mondo-ingest.

Instead, tmp/mondo.sssom.tsv is created as a byproduct of the tmp/mondo.owl goal:

tmp/mondo.owl:
    if [ $(USE_MONDO_RELEASE) = true ]; then wget http://purl.obolibrary.org/obo/mondo.owl -O $@; else cd $(TMPDIR) &&\
                ...
        cp $(TMPDIR)/mondo/src/ontology/mappings/mondo.sssom.tsv $(TMPDIR)/mondo.sssom.tsv &&\

I checked with Nicole and also verified that the PURL appears to be kept up to date. Should we create a new goal, tmp/mondo.sssom.tsv, that simply downloads the file from the PURL?
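A minimal sketch of what the proposed goal could look like, assuming GNU Make and the PURL above. The variable name MONDO_SSSOM_PURL is hypothetical, and this rule is not part of the current mondo-ingest Makefile.

```makefile
# Hypothetical goal: fetch mondo.sssom.tsv directly from its PURL instead of
# copying it out of a local Mondo checkout (as the tmp/mondo.owl goal does now).
MONDO_SSSOM_PURL = http://purl.obolibrary.org/obo/mondo/mappings/mondo.sssom.tsv

# Delete a target whose recipe fails, so a failed download does not leave
# behind an empty file that Make would later treat as up to date.
.DELETE_ON_ERROR:

tmp/mondo.sssom.tsv:
	mkdir -p $(dir $@)
	wget $(MONDO_SSSOM_PURL) -O $@
```

Because the rule has no prerequisites, Make will not re-download the file once tmp/mondo.sssom.tsv exists; getting a fresh copy means deleting the file first (or declaring the goal .PHONY).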

matentzn commented 1 year ago

There are three different mondo.sssom.tsv files:

  1. The one offered by the PURL. It is only updated during a release and contains the officially released Mondo mappings.
  2. The one created in mondo.Makefile by the Mondo build pipeline. mondo-ingest needs this one so that the unmapped mappable concepts stay in sync with Mondo's main branch, not just with the latest release.
  3. The one extracted from Mondo using sssom parse. This one also contains xrefs (non-official Mondo mappings) and can be used for analyses such as the one you did with GARD.
joeflack4 commented 1 year ago

RE (1): Ah, understood. By the way, I could not find the PURL published anywhere (Slack thread); I simply guessed the URL and got it right. The mapping_id in mondo.sssom.tsv contains a UUID. It makes sense to have a URL for the specific version. Are there plans in SSSOM (if this doesn't already exist) to add something like a mapping_stable_url/id or mapping_release_url/id, i.e., a URL that never changes?

RE (2): Ah, OK. Let's leave things as they are, unless you think we should rename the goal tmp/mondo.owl to tmp/mondo.owl tmp/mondo.sssom.tsv?
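If both files are listed as targets, the separator matters. In a plain `tmp/mondo.owl tmp/mondo.sssom.tsv:` rule, GNU Make treats the two files as independent targets that merely share a recipe, so the recipe can run more than once, especially under `make -j`. Grouped targets (`&:`, available since GNU Make 4.3) declare instead that a single run of the recipe produces both files. A sketch, assuming the build runs under GNU Make 4.3 or newer; the existing recipe is left elided, as in the snippet above:

```makefile
# Grouped targets (GNU Make >= 4.3): one run of this recipe creates both files,
# so Make does not run it separately for each target, even with parallel jobs.
tmp/mondo.owl tmp/mondo.sssom.tsv &:
	# ... existing tmp/mondo.owl recipe, unchanged ...
```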

RE (3): Ah, I am unfamiliar. Is this something that is routinely run and committed / released somewhere? Or is this just meant to be run ad hoc?

matentzn commented 1 year ago

> It makes sense to have a URL for the specific version. Are there plans in SSSOM (if this doesn't already exist) to add something like a mapping_stable_url/id or mapping_release_url/id, i.e., a URL that never changes?

The mapping_set_id should be that. I made an issue for Harshad to make sure the mapping_set_id corresponds to the correct PURL: https://github.com/monarch-initiative/mondo/issues/5846. Thanks for noticing!

(2) Very good point. For transparency, I think it should be changed as you suggest.

> Ah, I am unfamiliar. Is this something that is routinely run and committed / released somewhere? Or is this just meant to be run ad hoc?

https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/mondo-ingest.Makefile#L166

joeflack4 commented 1 year ago

Ah, OK. mapping_set_id is supposed to be a stable URL.

Should I open up an issue about (a) making the documentation clearer, (b) adding a field for the IRI of a specific version of the mapping set, or (c) both?

https://mapping-commons.github.io/sssom/

> mapping_set_id: A globally unique identifier for the mapping set (not each individual mapping). Should be IRI, ideally resolvable.


Regarding (2), I'll update my open "makefile standardization" PR to add that.


Roger that as well on the sssom parse goal; got it!

matentzn commented 1 year ago

> Should I open up an issue about (a) making the documentation clearer, (b) adding a field for the IRI of a specific version of the mapping set, or (c) both?

(b) exists, but (a): YES.

joeflack4 commented 1 year ago

Done! I could not find (b), so I added that to the issue as well: