monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

`components/omim.owl` doesn't retain prefixes from OMIM ingest #55

Open joeflack4 opened 2 years ago

joeflack4 commented 2 years ago

Overview

I was working in Python and looking at the entities that OAK queried for me. OAK returns each term as a CURIE if the matching prefix is declared in the file, and otherwise as the full URI. When I split each term on `:` and took the unique set of prefixes, this is all that came back (the `https` entries being full URIs): {'RO', 'CHR', 'https', 'SO'}
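A minimal Python sketch of the observation above, using a hypothetical list of returned terms (illustrative strings, not actual OAK output). Naively taking everything before the first `:` turns every full URI into the pseudo-prefix `https`:

```python
# Hypothetical sample of returned terms: CURIEs where the file declares a
# matching prefix, full URIs where it does not.
entities = [
    "RO:0003304",
    "SO:0000704",
    "CHR:0000001",
    "https://omim.org/entry/100050",
    "https://www.omim.org/phenotypicSeries/PS100070",
]


def prefixes_by_split(ids):
    """Naively treat everything before the first ':' as the 'prefix'."""
    return {i.split(":", 1)[0] for i in ids}


print(sorted(prefixes_by_split(entities)))
# → ['CHR', 'RO', 'SO', 'https']
```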

Details

`components/omim.owl` prefixes: I noticed that OMIM and OMIMPS don't appear here:

        <rdf:RDF xmlns="http://purl.obolibrary.org/obo/mondo/sources/omim.owl#"
         xml:base="http://purl.obolibrary.org/obo/mondo/sources/omim.owl"
         xmlns:obo="http://purl.obolibrary.org/obo/"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xml="http://www.w3.org/XML/1998/namespace"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:vocab="https://w3id.org/biolink/vocab/"
         xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">

`omim.ttl` from the OMIM ingest: they do appear here:

        @prefix CHR: <http://purl.obolibrary.org/obo/CHR_> .
        @prefix CL: <http://purl.obolibrary.org/obo/CL_> .
        @prefix HGNC: <https://identifiers.org/hgnc:> .
        @prefix HGNC_symbol: <https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/> .
        @prefix IAO: <http://purl.obolibrary.org/obo/IAO_> .
        @prefix NCBIGene: <https://www.ncbi.nlm.nih.gov/gene/> .
        @prefix NCBITaxon: <http://purl.obolibrary.org/obo/NCBITaxon_> .
        @prefix OMIM: <https://omim.org/entry/> .
        @prefix OMIMPS: <https://www.omim.org/phenotypicSeries/PS> .
        @prefix ORPHA: <http://www.orpha.net/ORDO/Orphanet_> .
        @prefix PMID: <http://www.ncbi.nlm.nih.gov/pubmed/> .
        @prefix RO: <http://purl.obolibrary.org/obo/RO_> .
        @prefix SO: <http://purl.obolibrary.org/obo/SO_> .
        @prefix UMLS: <http://linkedlifedata.com/resource/umls/id/> .
        @prefix biolink: <https://w3id.org/biolink/vocab/> .
        @prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#> .
        @prefix owl: <http://www.w3.org/2002/07/owl#> .
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
        @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
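One way to confirm the difference without eyeballing the headers is a small stdlib-only sketch like the following. It assumes trimmed excerpts of the two headers above (the real files are much larger) and uses a simplified regex rather than a full Turtle parser; `xml.etree.ElementTree.iterparse` with `start-ns` events reports the namespace declarations of the RDF/XML.

```python
import io
import re
import xml.etree.ElementTree as ET

# Trimmed excerpt of the components/omim.owl header (closed so it parses).
OWL_XML = """<rdf:RDF xmlns="http://purl.obolibrary.org/obo/mondo/sources/omim.owl#"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:skos="http://www.w3.org/2004/02/skos/core#"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#"></rdf:RDF>"""

# Trimmed excerpt of the omim.ttl prefix block.
TTL = """@prefix OMIM: <https://omim.org/entry/> .
@prefix OMIMPS: <https://www.omim.org/phenotypicSeries/PS> .
@prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
"""

# Simplified pattern for "@prefix NAME: <IRI> ." lines (not full Turtle grammar).
PREFIX_RE = re.compile(r"@prefix\s+([A-Za-z_][\w.-]*)?:\s*<([^>]*)>\s*\.")


def turtle_prefixes(text):
    """Map prefix name -> namespace IRI for each @prefix line."""
    return {m.group(1) or "": m.group(2) for m in PREFIX_RE.finditer(text)}


def rdfxml_prefixes(text):
    """Map prefix name -> namespace IRI for each xmlns declaration."""
    events = ET.iterparse(io.BytesIO(text.encode("utf-8")), events=("start-ns",))
    return {prefix: uri for _event, (prefix, uri) in events}


missing = sorted(set(turtle_prefixes(TTL)) - set(rdfxml_prefixes(OWL_XML)))
print(missing)
# → ['OMIM', 'OMIMPS']
```

Pointing the same two functions at the full files would list every prefix the `.owl` generation dropped, not just the two noticed here.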

Possible causes

I think that we haven't set up the OMIM ingest to update the PURL from the latest release.

How do we do that? I now know that our Mondo PURL gets updated when a new release is published there. Is this a manual or an automated process?

Related

Also outdated: #58

matentzn commented 2 years ago

In this case, shouldn't your code be robust enough to recognize whether a returned value is a CURIE (the various CURIE packages have methods for that) and, if it isn't, use the Mondo prefix map to handle it? We can never be sure that every file we use has the correct RDF prefixes in it.
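A minimal sketch of the suggested fallback, assuming a hand-written prefix map excerpt (entries copied from the `omim.ttl` header above, plus a generic `obo` entry to show why the longest matching namespace should win) in place of the actual Mondo prefix map. The CURIE-vs-URI test is a deliberately simple heuristic; the dedicated CURIE libraries offer more thorough checks.

```python
# Hypothetical excerpt of a prefix map; in practice you would load Mondo's
# own prefix map rather than hard-code these entries.
PREFIX_MAP = {
    "obo": "http://purl.obolibrary.org/obo/",
    "CHR": "http://purl.obolibrary.org/obo/CHR_",
    "RO": "http://purl.obolibrary.org/obo/RO_",
    "SO": "http://purl.obolibrary.org/obo/SO_",
    "OMIM": "https://omim.org/entry/",
    "OMIMPS": "https://www.omim.org/phenotypicSeries/PS",
}


def looks_like_uri(value):
    """Heuristic: treat http(s) IRIs as URIs and everything else as a CURIE."""
    return value.startswith(("http://", "https://"))


def compress(uri, prefix_map):
    """Return a CURIE using the longest matching namespace, or None if none match."""
    matches = [(p, ns) for p, ns in prefix_map.items() if uri.startswith(ns)]
    if not matches:
        return None
    prefix, namespace = max(matches, key=lambda pair: len(pair[1]))
    return f"{prefix}:{uri[len(namespace):]}"


def normalize(value, prefix_map):
    """Leave CURIEs alone; compress URIs, falling back to the URI itself."""
    if not looks_like_uri(value):
        return value
    return compress(value, prefix_map) or value


print(normalize("https://omim.org/entry/100050", PREFIX_MAP))  # OMIM:100050
print(normalize("http://purl.obolibrary.org/obo/RO_0003304", PREFIX_MAP))  # RO:0003304
print(normalize("https://example.org/x", PREFIX_MAP))  # unchanged
```

Preferring the longest namespace matters because generic entries such as `obo` also match more specific IRIs like `RO_…`.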

joeflack4 commented 2 years ago

Yes, it is easy to handle, and will be handled in this case.

I haven't checked all of the other .owl files in this way; maybe it's not just OMIM.

I suppose I would have been using prefix_map for this filtering anyway, but I haven't gotten that far in my code yet; I mostly spent yesterday on OAK issues.

So I wouldn't say this issue is holding up any of my work. I just suspect that something is wrong here: generating the .owl files seems to be lossy, in that prefixes are not retained. It would probably be useful to fix that.

matentzn commented 2 years ago

It's a big disconnect between the RDF world and the rest of the world: in RDF, prefixes mean nothing; they are pure syntactic sugar:

        ...
        @prefix CHR: <http://purl.obolibrary.org/obo/CHR_> .

        CHR:123 rdfs:label "label" .

is identical (equal) to:

        @prefix ZORRO: <http://purl.obolibrary.org/obo/CHR_> .

        ZORRO:123 rdfs:label "label" .

and the same with:

        <http://purl.obolibrary.org/obo/CHR_123> rdfs:label "label" .

(I repeat: there is no difference between the three; they are all the same.)

Only in database/bioinformatics land is there any difference between these three, because there a CURIE is perceived as an identifier.
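A small Python illustration of both points: expanding each CURIE with the prefix declared in its own snippet yields the same IRI, whereas comparing the CURIE strings directly, as identifier-centric tooling tends to, treats them as different. The `expand` helper is a hypothetical minimal function, not any particular library's API.

```python
def expand(curie, prefix_map):
    """Expand a CURIE to a full IRI using the given prefix map."""
    prefix, local_id = curie.split(":", 1)
    return prefix_map[prefix] + local_id


# Each snippet declares its own prefix name for the same namespace.
chr_map = {"CHR": "http://purl.obolibrary.org/obo/CHR_"}
zorro_map = {"ZORRO": "http://purl.obolibrary.org/obo/CHR_"}

iri_from_chr = expand("CHR:123", chr_map)
iri_from_zorro = expand("ZORRO:123", zorro_map)
full_iri = "http://purl.obolibrary.org/obo/CHR_123"

# As RDF terms (IRIs), all three are the same.
print(iri_from_chr == iri_from_zorro == full_iri)  # True
# As CURIE strings, the way identifier-centric tools compare them, they differ.
print("CHR:123" == "ZORRO:123")  # False
```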

joeflack4 commented 2 years ago

Ah, yes. This jibes with my understanding.

I would imagine we still want to retain this information, though.