monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

OMIMPS to match usage in Mondo #928

Closed kshefchek closed 4 years ago

kshefchek commented 4 years ago

Currently Mondo stores dbxref curies as strings, so IRI normalization doesn't work on them. As a result, our different usage of OMIMPS (we have a double PS, as in OMIMPS:PS617186) breaks our UI; see https://github.com/monarch-initiative/monarch-ui/issues/323

This might have been an arbitrary decision I made a couple of years back without putting a lot of thought into it; @TomConlin, perhaps you recall? I think it's easier for us to update our curie map to match Mondo's usage.

TomConlin commented 4 years ago

If there is a concrete proposal to make something in dipper more correct, please clarify. PSnnn is the "Phenotypic Series number" as it is expected to be found, as written by its authors. See: https://omim.org/help/linking#1_4

To see these in use: https://omim.org/phenotypicSeriesTitles/all (noting they are writing identifiers with prefixes, not curies)

...although we are using the out-of-date base URI
http://www.omim.org/phenotypicSeries/

We should drop the www and switch to https to match current conventions.
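
As a rough sketch of that base-URI update, the rewrite amounts to swapping the scheme and trimming the host. The helper name `modernize_base` is hypothetical and is not something that exists in dipper; only the input URI comes from this thread.

```python
from urllib.parse import urlsplit


def modernize_base(base_iri):
    """Switch the scheme to https and drop a leading 'www.' from the host.

    Hypothetical helper for illustration only; not part of dipper.
    """
    parts = urlsplit(base_iri)
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    # SplitResult is a namedtuple, so _replace returns an updated copy.
    return parts._replace(scheme="https", netloc=host).geturl()


new_base = modernize_base("http://www.omim.org/phenotypicSeries/")
```

The path is left untouched, so local identifiers appended to the base are unaffected.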

I am not aware of OMIM Phenotypic Series having any preferred/predefined or commonly used curie prefix, so if there is a better one than OMIMPS, it would be great to know about it.

Note: splitting a URI in the middle of the characters expected to form the local identifier, instead of at one of the common delimiter characters used to parse a URL (e.g. [/:#_?.~]), is suboptimal.
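
To make the split-point concern concrete, here is a minimal sketch comparing two prefix maps. Both maps and the `expand`/`contract` helpers are assumptions for illustration: the first splits at the trailing `/` of the base URI quoted above, the second is an assumed convention that folds `PS` into the base so that a bare `OMIMPS:617186` would resolve. Neither is taken from dipper's or Mondo's actual curie map.

```python
# Splits at the "/" delimiter; the local identifier keeps OMIM's "PS" (assumed map).
SLASH_SPLIT = {"OMIMPS": "http://www.omim.org/phenotypicSeries/"}
# Splits inside the identifier; "PS" becomes part of the base (assumed map).
FOLDED_SPLIT = {"OMIMPS": "http://www.omim.org/phenotypicSeries/PS"}


def expand(curie, prefix_map):
    """Turn prefix:local_id into a full IRI using the given prefix map."""
    prefix, local_id = curie.split(":", 1)
    return prefix_map[prefix] + local_id


def contract(iri, prefix_map):
    """Turn a full IRI back into a curie using the given prefix map."""
    for prefix, base in prefix_map.items():
        if iri.startswith(base):
            return prefix + ":" + iri[len(base):]
    raise ValueError("no prefix matches " + iri)


iri = expand("OMIMPS:PS617186", SLASH_SPLIT)
same_iri = expand("OMIMPS:617186", FOLDED_SPLIT)
```

Under these assumptions both curies expand to the same IRI, so comparing IRIs would match them, while comparing the curie strings directly would not.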

If there is a need to post-process dipper's public RDF output into something that better accommodates Mondo's (or others') internal processes, we should discuss that.

(I believe there is, and that regardless of how this particular issue goes, there have been and will be more.)


Taking a look at a fragment from mondo.json:

      "id" : "http://purl.obolibrary.org/obo/MONDO_0014960",
      "meta" : {
        "subsets" : [ "http://purl.obolibrary.org/obo/mondo#ordo_disease" ],
        "xrefs" : [ {
          "val" : "UMLS:C4310675"
        }, {
          "val" : "OMIMPS:617186"
        } ],

To be correct, that last val would need to be changed to OMIMPS:PS617186. Is there anyone available to discuss having this addressed at the source?

Hacking it at the command line takes all of 0.485 seconds to correct this mondo.json, but Mondo not producing something that needs correcting in the first place would be the better solution.
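
The exact command used for that hack is not shown here. As one possible sketch of the same correction, the following walks a parsed JSON document and rewrites bare `OMIMPS:nnn` xref values to `OMIMPS:PSnnn`. The function name `fix_omimps_xrefs` is hypothetical, and the only structure assumed is what the fragment above shows: an `xrefs` list of objects carrying a `val` string.

```python
import json
import re

# Matches only values with no "PS" yet, so already-correct xrefs are left alone.
_BARE_PS = re.compile(r"^OMIMPS:(\d+)$")


def fix_omimps_xrefs(node):
    """Rewrite OMIMPS:nnn to OMIMPS:PSnnn in place; return the number of changes."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "xrefs" and isinstance(value, list):
                for xref in value:
                    if isinstance(xref, dict) and isinstance(xref.get("val"), str):
                        xref["val"], n = _BARE_PS.subn(r"OMIMPS:PS\1", xref["val"])
                        changed += n
            else:
                changed += fix_omimps_xrefs(value)
    elif isinstance(node, list):
        for item in node:
            changed += fix_omimps_xrefs(item)
    return changed


# The fragment from this thread, wrapped in braces so it parses on its own.
fragment = json.loads("""
{
  "id": "http://purl.obolibrary.org/obo/MONDO_0014960",
  "meta": {
    "subsets": ["http://purl.obolibrary.org/obo/mondo#ordo_disease"],
    "xrefs": [{"val": "UMLS:C4310675"}, {"val": "OMIMPS:617186"}]
  }
}
""")
count = fix_omimps_xrefs(fragment)
```

Running it a second time on the same data changes nothing, since the pattern no longer matches once the `PS` is present.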

kshefchek commented 4 years ago

I think this is a fair position: OMIM's local identifier is PS1234, so we should keep it as such in the local identifier portion of the curie. Closing this, as I think we can also fix it in the UI code.