monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Benefits to 'self-typing' punned class IRIs in the a-box? #228

Open mbrush opened 8 years ago

mbrush commented 8 years ago

In the RDF datasets we derive from sources, we use class IRIs as individuals (i.e. punning) to simplify re-use of established community vocabularies when describing our entities of interest. The most common example is punning gene class IRIs when linking to them from variants/alleles of the gene:

  shha<tbx392>   is_allele_of   ZFIN:ZDB-GENE-980526-166  (d. rerio shha gene)

where ZFIN:ZDB-GENE-980526-166 is a t-box class IRI that is punned into an individual in the a-box. These punned IRIs, as individuals in the a-box, are not typed as instances of any class (other than the default owl:Thing), and thus a DL reasoner has no knowledge that the punned gene IRI has any relationship to the class gene. This information is useful to us, and could be captured simply by automatically typing punned IRIs as "instances of themselves".

 ZFIN:ZDB-GENE-980526-166 (the a-box individual)   rdf:type   ZFIN:ZDB-GENE-980526-166  (the t-box class)
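A minimal sketch of how such self-typing triples could be generated, assuming triples are held as plain `(subject, predicate, object)` tuples of CURIE strings and that the set of known t-box class IRIs is available. The function name `self_typing_triples` and this data layout are hypothetical, chosen for illustration only; they are not taken from dipper.

```python
RDF_TYPE = "rdf:type"


def self_typing_triples(abox, class_iris):
    """Return one (iri, rdf:type, iri) triple for each class IRI used as an individual.

    abox:       iterable of (subject, predicate, object) string tuples (assumed layout)
    class_iris: set of IRIs known to be t-box classes (assumed to be supplied by the caller)
    """
    punned = set()
    for s, p, o in abox:
        # In an rdf:type triple the object is in class position, so only the
        # subject can be a punned individual there.
        candidates = (s,) if p == RDF_TYPE else (s, o)
        punned.update(term for term in candidates if term in class_iris)
    return {(iri, RDF_TYPE, iri) for iri in punned}


# The allele-to-gene link from the example above.
abox = {("shha<tbx392>", "is_allele_of", "ZFIN:ZDB-GENE-980526-166")}
class_iris = {"ZFIN:ZDB-GENE-980526-166"}

extra = self_typing_triples(abox, class_iris)
```

Here `extra` holds the single self-typing triple for the punned shha IRI; the allele is left alone because it is not a known class IRI.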

In doing this, we might conceptualize the gene individuals created by punning class IRIs as a shorthand way to reference, in the a-box, a canonical instance of the gene class. Any triples about such an individual (i.e. with it as the subject) are then interpreted to be about the canonical gene.

Having this type information for punned class IRIs could support OWL/DL reasoning use cases for our data. For example, if a reasoner needs to answer a DL query that includes a reference to some gene, we want it to know that the punned shha IRI is an instance of 'gene'.

Consider, for the graph below, a DL query to find all genes on Danio rerio chromosome 7 (gene and subsequence_of chromosome 7). If we don't type the punned gene class IRIs as being of rdf:type gene, then our DL query doesn't return what we want from our data.

[Figure: punning example]
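The effect on the query can be sketched with a toy in-memory graph. This is not a DL reasoner: it only follows asserted rdf:type triples up a subclass map. The subclass axiom (shha gene class is a subclass of 'gene'), the node name "chromosome 7", and the `subsequence_of` triple are assumptions made for illustration, since the original graph image is not reproduced here.

```python
RDF_TYPE = "rdf:type"
SHHA = "ZFIN:ZDB-GENE-980526-166"

# Assumed t-box: the shha gene class is a subclass of 'gene'.
subclass_of = {SHHA: {"gene"}}

# Assumed a-box: the punned shha IRI is used as an individual.
abox = {
    ("shha<tbx392>", "is_allele_of", SHHA),
    (SHHA, "subsequence_of", "chromosome 7"),
}


def types_of(individual, triples):
    """Asserted types of an individual plus all their superclasses."""
    found = set()
    stack = [o for s, p, o in triples if s == individual and p == RDF_TYPE]
    while stack:
        cls = stack.pop()
        if cls not in found:
            found.add(cls)
            stack.extend(subclass_of.get(cls, ()))
    return found


def genes_on(chromosome, triples):
    """Toy version of the query: gene and subsequence_of <chromosome>."""
    return {
        s
        for s, p, o in triples
        if p == "subsequence_of" and o == chromosome and "gene" in types_of(s, triples)
    }


without_self_typing = genes_on("chromosome 7", abox)
with_self_typing = genes_on("chromosome 7", abox | {(SHHA, RDF_TYPE, SHHA)})
```

Without the self-typing triple the punned IRI has no asserted type, so `without_self_typing` is empty; once the triple is added, the subclass map links the individual to 'gene' and `with_self_typing` contains the shha IRI.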

I am sure there are many other use cases where it may be useful to have type information about a punned class IRI in the a-box. Not sure what the cons might be of automatically generating a 'self-typing' triple for all punned class IRIs, other than the memory/processing requirements for extra triples/axioms.

Thoughts from @cmungall, @ShahimEssaid, others?

ShahimEssaid commented 8 years ago

People can blame me for this idea :-) It is the right thing to do (but maybe in a dedicated OWL file that can be imported when needed), and it is not a new idea. Any modeling language has to close its modeling hierarchy somehow (i.e. define the top concept), and it is usually done this way rather than with a larger circular definition. It is also a common way of carrying semantics across layers when meta-modeling is used: it takes the semantics from one model layer to the next. In this case the T-box and A-box are the layers.

You can see an example of this on line 14 here: http://www.w3.org/2000/01/rdf-schema# and the same pattern is seen in other modeling languages.