Open matentzn opened 1 year ago
So, what is the correct URI for is_a? I couldn't find anything in OLS.
rdfs:subClassOf
So, would rdfs:subClassOf be considered the CURIE for https://www.w3.org/TR/rdf-schema/#ch_subclassof once expanded, with is_a being a synonym? Just wondering how this fits with the Node model in obographs.
I think rdfs:subClassOf is considered a "built-in" and would probably not be represented in obographs as a node at all. It's a good question though; the distinction is somewhat arbitrary. Why do you need to know a CURIE/IRI for is_a in obographs? OAK has an obographs-to-OWL mapping which handles all this expansion.
This is indeed how it is (once http://purl.obolibrary.org/obo/emapa#is_a is removed): there is zero mention of the source of 'is_a', yet it is the most commonly referenced predicate. Practically all other predicates used in an Edge are URIs declared, with their label, as a Node (either as a CLASS or PROPERTY type), so if you're doing the naive thing of using URIs to look up a Node, it fails here because is_a is never declared anywhere!
e.g.
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

GraphDocument graphDocument = openGraphDocument("phenio.json");
Graph phenio = graphDocument.getGraphs().get(0);

// create a map of Id: Node, where Id is a URI String
Map<String, Node> nodes = phenio.getNodes().stream()
    .map(node -> Node.of(node.getId(), node.getLabel()))
    .collect(Collectors.toMap(Node::getId, Function.identity()));

// special case it seems
Node isA = nodes.values().stream()
    .filter(node -> node.getLabel().equals("is_a"))
    .findFirst()
    // so this would be where to put rdfs:subClassOf - perhaps that ought to be the URI and keep `is_a` as the label?
    .orElse(Node.of("http://purl.obolibrary.org/obo/emapa#is_a", "is_a"));

phenio.getEdges().stream()
    .forEach(edge -> {
        Node subject = nodes.get(edge.getSub());
        Node object = nodes.get(edge.getObj());
        // annoyingly we can't treat the predicate in a consistent fashion due to
        // 'is_a' being an undeclared, implicit 'primitive' type
        Node predicate;
        if (edge.getPred().equals("is_a")) {
            predicate = isA;
        } else {
            predicate = nodes.get(edge.getPred());
        }
    });
cc @cmungall
The focus of this issue is predicates like http://purl.obolibrary.org/obo/emapa#is_a, which seem like a data bug. I don't think this has anything to do with obojson.
I checked the latest emapa.owl and it is present, but only as a declaration:
✗ grep is_a db/phenio.owl | grep emapa
<!-- http://purl.obolibrary.org/obo/emapa#is_a -->
<owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/emapa#is_a">
Unfortunately there is no way to tell from the OWL where this comes from, but a good guess is emapa itself:
✗ curl -L -s $OBO/emapa.owl | grep emapa#is_a
<!-- http://purl.obolibrary.org/obo/emapa#is_a -->
<owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/emapa#is_a">
I know the genesis of these things: twenty years ago someone declared is-a in oboedit even though they didn't need to, and it has stuck around ever since.
This one is harmless as it's just a declaration that is not used. Of course we should still report upstream and possibly fix, and we should do more QA/QC on ontologies we bring in.
But there are worse issues. emapa isn't using the standard part-of predicate (BFO:0000050). It is using http://purl.obolibrary.org/obo/emapa#part_of.
This means that partonomy queries on EMAPA will yield massively incomplete results. And EMAPA is essentially a partonomy, there is minimal info in subclassing.
I suggest a strategy:
We should have a monarch-wide simple profile that should be satisfied
Pretty much everything else can be ignored
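A rough sketch of what such a profile check could look like, in Java to match the snippet earlier in the thread. The class name, the method name, and the contents of the allow-list are all assumptions for illustration; the real profile would need to be agreed on. The input is just the list of predicate strings taken from the edges (e.g. from `edge.getPred()`).

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PredicateProfile {
    // Hypothetical allow-list: the built-in is_a shorthand plus the standard
    // part-of predicate (BFO:0000050, written out as a full OBO PURL).
    public static final Set<String> ALLOWED = Set.of(
        "is_a",
        "http://purl.obolibrary.org/obo/BFO_0000050");

    /** Returns the distinct predicates that fall outside the allow-list, sorted. */
    public static Set<String> violations(List<String> edgePredicates) {
        Set<String> offending = new TreeSet<>();
        for (String pred : edgePredicates) {
            if (!ALLOWED.contains(pred)) {
                offending.add(pred);
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        List<String> preds = List.of(
            "is_a",
            "http://purl.obolibrary.org/obo/BFO_0000050",
            "http://purl.obolibrary.org/obo/emapa#part_of");
        // Reports the ontology-local part_of predicate as a violation.
        System.out.println(violations(preds));
    }
}
```

Running a check like this at ingest time would surface predicates such as emapa#part_of before they silently drop out of partonomy queries.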
I don't think obojson is relevant to this issue at all, but regarding the question, yes, is_a is the hardcoded value for rdfs:subClassOf between two named classes.
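For client code that wants to treat every predicate uniformly, one option is a small table of built-ins consulted alongside the declared nodes. This is only a sketch: the class and method names are made up, and only is_a is listed because it is the only built-in confirmed in this thread. The full IRI for rdfs:subClassOf is http://www.w3.org/2000/01/rdf-schema#subClassOf. Declared labels are passed in as a plain id-to-label map.

```java
import java.util.Map;
import java.util.Optional;

public class PredicateLabels {
    // Full IRI of rdfs:subClassOf in the RDF Schema namespace.
    public static final String RDFS_SUBCLASS_OF =
        "http://www.w3.org/2000/01/rdf-schema#subClassOf";

    // Hypothetical table of built-in shorthands that are never declared as nodes.
    public static final Map<String, String> BUILT_INS = Map.of("is_a", RDFS_SUBCLASS_OF);

    /** Expands a built-in shorthand to its IRI; any other predicate is returned unchanged. */
    public static String expand(String pred) {
        return BUILT_INS.getOrDefault(pred, pred);
    }

    /** Returns the declared label if there is one, else the shorthand itself for built-ins. */
    public static Optional<String> label(String pred, Map<String, String> declaredLabels) {
        String declared = declaredLabels.get(pred);
        if (declared != null) {
            return Optional.of(declared);
        }
        return BUILT_INS.containsKey(pred) ? Optional.of(pred) : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> declared =
            Map.of("http://purl.obolibrary.org/obo/BFO_0000050", "part of");
        System.out.println(expand("is_a"));
        System.out.println(label("is_a", declared).orElse("(undeclared)"));
    }
}
```

With this in place, the edge loop no longer needs an `if` branch for is_a; an undeclared, non-built-in predicate comes back as an empty `Optional` and can be reported as a data problem.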
Phenio contains some relationships like http://purl.obolibrary.org/obo/emapa#is_a, which are really confusing. These should be removed prior to release.