wikipathways / GPML2RDF

GPML2RDF converter
Apache License 2.0

add Bioregistry-style compact identifiers #117

Open egonw opened 1 year ago

egonw commented 1 year ago

@cthoyt, suppose I were to add Bioregistry-style compact identifiers as literals to the WikiPathways RDF: is there already an ontology with a predicate for such identifiers?

Let's say we have this:

<https://identifiers.org/ensembl/ENSG00000139163>
        rdf:type            wp:GeneProduct , wp:DataNode ;
        rdfs:label          "ETNK1" ;
        wp:bdbEnsembl       <https://identifiers.org/ensembl/ENSG00000139163> ;
        wp:bdbEntrezGene    <https://identifiers.org/ncbigene/55500> ;
        wp:bdbHgncSymbol    <https://identifiers.org/hgnc.symbol/ETNK1> ;
        wp:bdbUniprot       <https://identifiers.org/uniprot/A0A5K1VW28> , <https://identifiers.org/uniprot/Q86U68> , <https://identifiers.org/uniprot/Q9HBU6> , <https://identifiers.org/uniprot/H0YH69> , <https://identifiers.org/uniprot/H0YFP7> , <https://identifiers.org/uniprot/A0A5F9ZI33> ;
        wp:bdbWikidata      <http://www.wikidata.org/entity/Q18041828> .

And I would add something like x:y as below, what should x:y be (ps, ignore the prefixes, I didn't check them; you get the point)?

<https://identifiers.org/ensembl/ENSG00000139163>
        x:y       "ensembl:ENSG00000139163", "ncbigene:55500", "hgnc.symbol:ETNK1", "uniprot:A0A5K1VW28", "uniprot:Q9HBU6", "uniprot:H0YH69", "uniprot:H0YFP7", "uniprot:A0A5F9ZI33", "wikidata:Q18041828" .

I could use wp:bdbBioregistry but maybe you have something better in mind?
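
To make the idea concrete, here is a minimal Python sketch of how such literals could be derived from the URIs already in the RDF. It assumes only the two URI patterns shown above occur (identifiers.org and Wikidata entity URIs); the function name `uri_to_curie` and the hard-coded `wikidata` prefix are illustrative choices, not part of GPML2RDF or Bioregistry.

```python
# Assumption: only these two URI patterns need handling.
IDENTIFIERS_ORG = "https://identifiers.org/"
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def uri_to_curie(uri: str) -> str:
    """Turn a URI like https://identifiers.org/<prefix>/<id> into '<prefix>:<id>'."""
    if uri.startswith(WIKIDATA_ENTITY):
        # Wikidata URIs carry no prefix segment, so it is supplied here.
        return "wikidata:" + uri[len(WIKIDATA_ENTITY):]
    if uri.startswith(IDENTIFIERS_ORG):
        # Split only on the first slash so the local identifier stays intact.
        prefix, _, local_id = uri[len(IDENTIFIERS_ORG):].partition("/")
        if prefix and local_id:
            return f"{prefix}:{local_id}"
    raise ValueError(f"no known compact form for {uri!r}")


if __name__ == "__main__":
    for uri in (
        "https://identifiers.org/ensembl/ENSG00000139163",
        "https://identifiers.org/hgnc.symbol/ETNK1",
        "http://www.wikidata.org/entity/Q18041828",
    ):
        print(uri_to_curie(uri))
```

A real converter would more likely rely on a maintained prefix map than on string slicing; this only shows the shape of the transformation.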

(yes, once we're done with full RDF support in Bioregistry we can add that too)

cc @DeniseSl22 @marvinm2 @ammar257ammar

DeniseSl22 commented 1 year ago

I believe BioRegistry has something like that in their own database (which I'm currently using for the Kinetic RDF model):

@prefix uniprot:   <https://identifiers.org/uniprot/> . 
@prefix bioregistry: <https://bioregistry.io/oboinowl:> . 

uniprot:P21549 bioregistry:hasDbXref uniprotkb:P21549.

cthoyt commented 1 year ago

There's a lot going on here, let me try to unpack it.

wrt Egon's original comment, I think what you're trying to do is say that for the URI entity https://identifiers.org/ensembl/ENSG00000139163, there are some equivalent things that have CURIE representations such as "ensembl:ENSG00000139163", "ncbigene:55500", etc.

This is a bit confusing since it combines two logical operations: asserting equivalence between entities, and attaching a CURIE string to an entity. Maybe you could instead do something like

https://identifiers.org/ensembl/ENSG00000139163 skos:exactMatch https://bioregistry.io/ncbigene:55500
https://bioregistry.io/ncbigene:55500 <predicate> "ncbigene:55500"

The second thing that's confusing is that the semantics implied by such a predicate would be redundant when connecting https://identifiers.org/ensembl/ENSG00000139163 to its own CURIE, "ensembl:ENSG00000139163".

The Bioregistry schema has a lot of predicates for talking about meta stuff. It also links to several other partially overlapping vocabularies like vann, void, sh, and idot (see the turtle). However, it doesn't have predicates for connecting an IRI to a CURIE representation of the IRI like <predicate> did in the example above.
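
As a rough illustration of the two-step pattern above, the following Python sketch emits both statements as N-Triples lines. The `HAS_CURIE` IRI is a hypothetical placeholder standing in for `<predicate>`, since no such predicate has been identified in this thread; the function name `two_step_triples` is likewise made up for the example.

```python
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
BIOREGISTRY = "https://bioregistry.io/"
# Hypothetical placeholder for <predicate>; not a real vocabulary term.
HAS_CURIE = "https://example.org/hasCurie"


def two_step_triples(subject_uri: str, curie: str) -> list[str]:
    """Return two N-Triples lines: the equivalence, then the CURIE literal."""
    target = BIOREGISTRY + curie
    return [
        # Step 1: the subject is equivalent to the Bioregistry IRI.
        f"<{subject_uri}> <{SKOS_EXACT_MATCH}> <{target}> .",
        # Step 2: the Bioregistry IRI carries its own CURIE as a literal.
        f'<{target}> <{HAS_CURIE}> "{curie}" .',
    ]


if __name__ == "__main__":
    lines = two_step_triples(
        "https://identifiers.org/ensembl/ENSG00000139163", "ncbigene:55500"
    )
    print("\n".join(lines))
```

Keeping the steps separate means the equivalence can be reused on its own, and the literal is only ever attached to the IRI it actually abbreviates.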


@DeniseSl22 I don't think you're using the prefixes correctly in your example:

@prefix uniprot:   <https://identifiers.org/uniprot/> . 
@prefix bioregistry: <https://bioregistry.io/oboinowl:> . 

uniprot:P21549 bioregistry:hasDbXref uniprotkb:P21549.

The vocabulary on the second line is oboinowl, so you should call it that

@prefix uniprot:   <https://identifiers.org/uniprot/> . 
@prefix oboinowl: <https://bioregistry.io/oboinowl:> . 

uniprot:P21549 oboinowl:hasDbXref uniprotkb:P21549.

Second, I'm not sure what the context for uniprotkb is in this example, since that prefix is not defined.
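
One quick way to catch this kind of problem is to check which prefixes a snippet uses without declaring. The Python sketch below does this with regular expressions; it is a rough heuristic rather than a Turtle parser (it ignores `PREFIX`-style declarations, multi-line literals, and other syntax), and the function name `undeclared_prefixes` is made up for illustration.

```python
import re


def undeclared_prefixes(turtle: str) -> set[str]:
    """Return prefixes used in prefixed names but never declared via @prefix."""
    declared = set(re.findall(r"@prefix\s+([A-Za-z][\w.-]*):", turtle))
    body = re.sub(r"@prefix[^\n]*\n?", "", turtle)  # drop declaration lines
    body = re.sub(r"<[^>]*>", "", body)             # drop full IRIs
    body = re.sub(r'"[^"]*"', "", body)             # drop string literals
    # A prefix is a name that starts a token and is followed by a colon.
    used = set(re.findall(r"(?<![\w.-])([A-Za-z][\w.-]*):", body))
    return used - declared


if __name__ == "__main__":
    snippet = """@prefix uniprot:   <https://identifiers.org/uniprot/> .
@prefix oboinowl: <https://bioregistry.io/oboinowl:> .

uniprot:P21549 oboinowl:hasDbXref uniprotkb:P21549.
"""
    print(undeclared_prefixes(snippet))
```

For anything beyond a quick sanity check, loading the file with a real RDF parser gives a more reliable answer.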

Maybe of interest to you: Bioregistry has a SPARQL service that implements identifier mapping (e.g., so you can avoid materializing redundant definitions of the same entity using multiple URI prefixes). I think it will be easier to finish this discussion over a call sometime this week or next, if you have time.