monarch-initiative / omim

Data ingest pipeline for OMIM.
6 stars 2 forks source link

Remove `www` from `https://www.omim.org/phenotypicSeries/PS` #103

Closed joeflack4 closed 3 months ago

joeflack4 commented 11 months ago

Overview

This would likely involve some changes in mondo-ingest and mondo as well I would imagine.

Sub-tasks

matentzn commented 11 months ago

How did this arise? There should be one way and one way only to describe phenotypic series IRIs, and that's defined in Mondo ingest prefix map!

matentzn commented 11 months ago

This need to correspond to whatever you have in omim.owl

joeflack4 commented 11 months ago

Nico wrote:

How did this arise? There should be one way and one way only to describe phenotypic series IRIs, and that's defined in Mondo ingest prefix map!

Not sure. The OMIM ingest predates when I started working for BIDS at JHU. It's probably been this way long before I found out about the mondo-ingest prefix_map.

matentzn commented 11 months ago

I mean, just look in omim.owl how OMIMPS Uris are written!

joeflack4 commented 11 months ago

Sorry, when I said "not sure", I was not responding to "This need to correspond to whatever you have in omim.owl". I edited my comment to clarify.

It sounds like what you are saying is that, going forward for the omim repository, the OMIPS URI stem should match what we have currently in mondo-ingest components/omim.owl. If I'm wrong please correct.

I did look at components/omim.owl, and I see it's using:

        <rdfs:subClassOf rdf:resource="https://www.omim.org/phenotypicSeries/PS100070"/>

In the omim repo's omim.ttl, its:

@prefix OMIMPS: <https://www.omim.org/phenotypicSeries/PS> .

So they currently match.

But the www is still in there, which sounds like it's a problem.


I'm not sure in in mondo-ingest what the most authoritative are for prefix maps is I'm assuming it's in each metadata/ONTOLOGY.yml. And I'm assuming we eventually want config/prefixes.csv to be dynamically generated from those if it's not already. For omim.yml we have: OMIMPS: https://omim.org/phenotypicSeries/PS.


In any case, it's unclear to me currently whether i should proceed forward with fixing the OMIMPS URI stem to remove the www in the omim repository, or any other action items needed. And should I consider this a low/medium priority? @matentzn Please advise.

matentzn commented 11 months ago

A URI mismatch is generally a very high-priority issue. But, right now, everything works fine (the pipeline works):

https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/slurp/omim.tsv#L64

Right now, I care only about the fact that the whole system runs; so lets keep this issue open, but as low-priority. If someone external puts pressure on us to change the OMIMPS URIs we will do immediately.

joeflack4 commented 3 months ago

Subtask (1), the OMIM side, was handled by https://github.com/monarch-initiative/omim/pull/107#pullrequestreview-1953659980. I remember this coming up. I thought I added a review comment about it connecting to this. Either my memory serves incorrectly, or we failed to follow up. In any case, subtask (2), the mondo-ingest side, was handled today by https://github.com/monarch-initiative/mondo-ingest/pull/478. Gonna close this after that PR is merged.

matentzn commented 3 months ago

As I am really deeply afraid of this change, I would like to suggest:

  1. To make a OMIM release locally and stick it on dropbox
  2. Create a mondo ingest branch and update mondo-ingest-odk.yaml with the link to that omim release
  3. Run make update_import
  4. Make draft PR and assign to Trish for review
  5. Run whole release pipeline on that branch
  6. Make another draft PR with the data updates and assign to Trish for review, with a special emphasis on changes to OMIM related data files (I should also review that).
joeflack4 commented 3 months ago

You already implemented this change on the omim side in #107, and it did have downstream effects on the mondo-ingest side that were fixed in https://github.com/monarch-initiative/mondo-ingest/pull/478.

Of course, there might be more effects that we haven't noticed yet. However as everything is merged already, we might as well review the state of things as they are now in production.


Edit: I now see that Nico sees that this is already handled. Disregard my above comment and Nico's last comment.