monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Wrong IRI for Fanconi Renotubular Syndrome? #355

Closed jmcmurry closed 4 years ago

jmcmurry commented 8 years ago

I thought this was just an instance of an unresolvable IRI, but Ontobee doesn't even seem to have an ontology with the prefix DC, and the prefix isn't familiar to me. A search for "Renotubular" in Ontobee yields no results either.

https://beta.monarchinitiative.org/disease/DC:0000148 points to http://purl.obolibrary.org/obo/DC_0000148
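
For anyone unsure how a CURIE such as DC:0000148 turns into the IRI above, here is a minimal Python sketch of prefix expansion. `CURIE_MAP` and `expand_curie` are hypothetical names for illustration, not dipper's actual code; the DC base IRI is simply inferred from the URL in this report.

```python
# Hypothetical excerpt of a prefix map (assumption: not the real curie_map.yaml).
# The DC entry mirrors the IRI reported in this issue.
CURIE_MAP = {
    "DC": "http://purl.obolibrary.org/obo/DC_",
    "HP": "http://purl.obolibrary.org/obo/HP_",
}


def expand_curie(curie, curie_map=CURIE_MAP):
    """Expand 'PREFIX:LOCAL_ID' into a full IRI using a prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in curie_map:
        raise KeyError("no base IRI known for CURIE: " + curie)
    return curie_map[prefix] + local_id


print(expand_curie("DC:0000148"))
```

The expansion is purely mechanical, so a prefix that is mapped to an OBO PURL base will yield an OBO-style IRI whether or not any ontology is registered under that prefix.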

TomConlin commented 8 years ago

In a sane world DC would be "Dublin Core" followed by a metadata term.
Will try to make it different.

It is tagged with a "# TODO" in curie_map.yaml.

Although we may be allowed to have prefixes that differ only by upper/lower case, I do not recommend it.

It is only mentioned as a prefix in OMIM in association with dc:evidence

Here is a very good reason prefixes should not differ only by case: in the base class for all ingests, the lowercase "dc" prefix is bound to the uppercase "DC", which is also already defined as a namespace in the Python RDFLib library we use everywhere.
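
A check for this kind of clash could be sketched as below. This is only an illustration under assumptions: `case_collisions` is a hypothetical helper, and the prefix list is made up rather than read from curie_map.yaml.

```python
from collections import defaultdict


def case_collisions(prefixes):
    """Group prefixes that are identical once case is ignored."""
    groups = defaultdict(set)
    for prefix in prefixes:
        groups[prefix.lower()].add(prefix)
    # Keep only the groups that contain more than one spelling.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Made-up prefix list for illustration only.
print(case_collisions(["DC", "dc", "HP", "OMIM"]))
```

Running something like this over the keys of the prefix map would flag "DC" and "dc" as a pair worth renaming.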

Perhaps the TODO in curie_map.yaml referred to adding the Darwin Core URI to purl? That does not seem as though it could hurt (though I would want others more familiar with it to weigh in), but it would not address this issue, because the identifier (0000148) makes no sense in the Darwin Core namespace.

TomConlin commented 8 years ago

In our ingested triples, the identifier fragment "0000148" exists in:

    grep -l "_0000148>" out/*.nt
    out/animalqtldb.nt
    out/coriell.nt
    out/hpoa.nt
    out/omia.nt
    out/wormbase.nt
    out/zfin.nt
    out/zfin_slim.nt

It appears with the ZP, HP, GENO, CL and LPT prefixes, but never (so far) with the DC prefix.
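
The grep above only lists files; to see which prefixes carry the fragment, something along these lines might work. It is a rough sketch: `prefixes_for_fragment` is a hypothetical helper, it assumes OBO-style IRIs ending in `PREFIX_ID`, and the sample lines are invented rather than copied from our output.

```python
import re


def prefixes_for_fragment(lines, fragment):
    """Collect the OBO-style prefixes of IRIs that end in '_<fragment>'."""
    pattern = re.compile(
        r"<[^>]*[/#]([A-Za-z][A-Za-z0-9]*)_" + re.escape(fragment) + r">"
    )
    found = set()
    for line in lines:
        found.update(pattern.findall(line))
    return sorted(found)


# Invented N-Triples lines for illustration only.
sample = [
    '<http://purl.obolibrary.org/obo/HP_0000148> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "example" .',
    '<http://purl.obolibrary.org/obo/ZP_0000148> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "example" .',
]
print(prefixes_for_fragment(sample, "0000148"))
```

To run it over real output, the lines of each `out/*.nt` file could be passed in place of `sample`.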

cmungall commented 8 years ago

DC is the prefix that MGI gives to the OMIM disease clusters. We can rename this in the Monarch repo.