Closed jakhag closed 7 years ago
The first query (/tree) fails in all combos of v2.1 and v2.2 IMS and Virtuoso.
The second query (/tree?root=enzyme) works with v2.1 Virtuoso but fails with v2.2 Virtuoso.
The SPARQL query for the second LDA query returns no results. The error message is:
This document is empty and basically useless. It is generated by a web service that can make some statements in HTML Microdata format. This time the service made zero such statements, sorry.
Here's the SPARQL:
PREFIX ops: <http://www.openphacts.org/api#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
CONSTRUCT { ops:conceptHierarchy dcterms:hasPart ?g_short .
?g_short ops:rootNode ?root_node .
?root_node skos:prefLabel ?name .
<http://purl.uniprot.org/enzyme> skos:prefLabel 'Enzyme Classification' .
<http://www.ebi.ac.uk/chembl/target> skos:prefLabel 'ChEMBL Target Hierarchy' .
<http://www.ebi.ac.uk/chebi> skos:prefLabel 'ChEBI Ontology' .
<http://www.geneontology.org> skos:prefLabel 'GeneOntology' .
<http://www.bioassayontology.org> skos:prefLabel 'BioAssayOntology' .
<http://purl.obolibrary.org/obo/doid> skos:prefLabel 'Human Disease Ontology' .
} WHERE { VALUES ?g_short { <http://purl.uniprot.org/enzyme> } {
SELECT DISTINCT ?root_node ?g_short WHERE { VALUES ?g_short { <http://purl.uniprot.org/enzyme> }
VALUES ?g {
<http://purl.uniprot.org/enzyme/direct>
<http://www.ebi.ac.uk/chembl/target/direct>
<http://www.ebi.ac.uk/chebi/direct>
<http://www.geneontology.org>
<http://www.bioassayontology.org>
<http://purl.obolibrary.org/obo/doid>
}
GRAPH ?g {
[] rdfs:subClassOf ?root_node .
MINUS {?root_node rdfs:subClassOf []}
FILTER ( isURI(?root_node) )
BIND (IF(?g = <http://purl.uniprot.org/enzyme/direct>, IRI(<http://purl.uniprot.org/enzyme>) ,
IF(?g = <http://www.ebi.ac.uk/chembl/target/direct>, IRI(<http://www.ebi.ac.uk/chembl/target>) ,
IF(?g = <http://www.ebi.ac.uk/chebi/direct>, IRI(<http://www.ebi.ac.uk/chebi>) ,
IF(?g = <http://www.geneontology.org>, IRI(<http://www.geneontology.org>) ,
IF(?g = <http://www.bioassayontology.org>, IRI(<http://www.bioassayontology.org>) ,
IF(?g = <http://purl.obolibrary.org/obo/doid>, IRI(<http://purl.obolibrary.org/obo/doid>), 'Error')))))) AS ?g_short )
}
}
}
{
?root_node rdfs:label ?name
}
UNION {
?root_node skos:prefLabel ?name
}
MINUS { ?root_node uniprot:obsolete true }
}
On alpha /tree seems to have all the hierarchies except enzyme (uniprot):
I guess that is why the root=enzyme query fails
The query seems to expect these triple patterns, where ?root_node is <http://purl.uniprot.org/core/Enzyme>:
<http://purl.uniprot.org/core/Enzyme> rdfs:label ?name .
<http://purl.uniprot.org/core/Enzyme> skos:prefLabel ?name .
but they don't exist. Maybe they need to be manually created and added to the triple store.
I understood that it was decided to load only the part of UniProt that is really being used (even though we always said having all of it is an advantage, since you can then easily add new calls, e.g. for PPIs). Maybe this part was simply not loaded?
Not sure, @Chris-Evelo. If anyone has access to the beta SPARQL endpoint, they could try running the following to see what the labels should be:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
select distinct * where {
<http://purl.uniprot.org/core/Enzyme> rdfs:label ?label .
<http://purl.uniprot.org/core/Enzyme> skos:prefLabel ?pref_label .
}
I'm not seeing any triples on beta (v2.1) or alpha (v2.2) matching the pattern:
<http://purl.uniprot.org/core/Enzyme> ?p ?o .
beta = http://beta.openphacts.org:3003/sparql
alpha = http://alpha.openphacts.org:8890/sparql
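For anyone who wants to repeat that check from a script, here is a small sketch that only builds the request URLs. It assumes both endpoints accept a plain SPARQL Protocol GET with a `query` parameter; the `ENDPOINTS` dict and the `query_url` helper are names made up for this example.

```python
from urllib.parse import urlencode

# Endpoint URLs as given in this thread.
ENDPOINTS = {
    "beta": "http://beta.openphacts.org:3003/sparql",
    "alpha": "http://alpha.openphacts.org:8890/sparql",
}

# Any statement at all about the core Enzyme class.
CHECK_QUERY = (
    "SELECT DISTINCT * WHERE { "
    "<http://purl.uniprot.org/core/Enzyme> ?p ?o . }"
)


def query_url(endpoint: str, query: str) -> str:
    """Build a GET URL carrying the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    for name, endpoint in ENDPOINTS.items():
        print(name, query_url(endpoint, CHECK_QUERY))
```

The printed URLs can be pasted into a browser or passed to any HTTP client; asking for a particular result format (for instance via an Accept header) is left out here.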
The strange thing is, if you run the child query (http://alpha.openphacts.org:3002/tree/children?uri=http%3A%2F%2Fpurl.uniprot.org%2Fenzyme%2F1.-.-.-&app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1&_format=json), the labels are there. Also with the parents query, all top level class labels are shown (e.g. http://alpha.openphacts.org:3002/tree/parents?uri=http%3A%2F%2Fpurl.uniprot.org%2Fenzyme%2F6.2.-.-&app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1&_format=json).
Was there any change in the SPARQL query from 2.1 to 2.2?
@danidi -- Far as I can tell no change to the "/tree" SPARQL since before 2015.
So there is something wrong with the enzyme hierarchy on alpha. Have a look at this query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct * WHERE {
<http://purl.uniprot.org/enzyme/1.-.-.-> ?p ?o .
}
LIMIT 100
If you look closely you will notice that on alpha these statements are present:
<http://purl.uniprot.org/enzyme/1.-.-.-> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.uniprot.org/enzyme/1.-.-.->
and
<http://purl.uniprot.org/enzyme/1.-.-.-> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.uniprot.org/core/Enzyme>
but on beta they are not. I'm sure that it shouldn't be a subclass of itself; whether it should be a subclass of Enzyme I'm not sure, but beta implies not. As you know you can go RDF blind looking at this stuff, so please check for yourself.
If you look at the original query a few comments above, it doesn't allow the root nodes to be a subClassOf anything: MINUS {?root_node rdfs:subClassOf []}. So it looks like the enzyme root nodes have some rogue statements.
Technically, in the land of RDF and OWL, (if I'm remembering right) all Classes are rdfs:subClassOf themselves. Though the name is "subClassOf", it really means "subClass of or equivalent to". But I imagine no one wanted such an awkward name.
URIs like <http://purl.uniprot.org/enzyme/1.14.-.-> are, I think, supposed to represent "classes" of Enzymes rather than instances of Enzymes, right?
So maybe saying it's rdf:type=Enzyme is incorrect to a semantic purist. Though I don't think any semantic purists would survive very long breathing the air of the OpenPhacts triple-store.
To a practical RDF hacker in OpenPhacts data it likely just comes down to whatever RDF statements can produce the right answers, that's the "correct" semantics.
I just removed the ?s rdfs:subClassOf <http://purl.uniprot.org/core/Enzyme> statements to see if it makes a difference.
Does this look like the right answer:
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ns1: <http://www.ebi.ac.uk/> .
@prefix ns2: <http://purl.obolibrary.org/obo/> .
@prefix ns3: <http://purl.uniprot.org/> .
@prefix ns4: <http://www.openphacts.org/api#> .
@prefix ns5: <http://www.ebi.ac.uk/chembl/> .
@prefix ns6: <http://purl.org/dc/terms/> .
ns1:chebi
skos:prefLabel "ChEBI Ontology" .
ns2:doid
skos:prefLabel "Human Disease Ontology" .
<http://www.geneontology.org>
skos:prefLabel "GeneOntology" .
ns3:enzyme
skos:prefLabel "Enzyme Classification" ;
ns4:rootNode <http://purl.uniprot.org/enzyme/1.-.-.-> , <http://purl.uniprot.org/enzyme/2.-.-.-> , <http://purl.uniprot.org/enzyme/3.-.-.-> , <http://purl.uniprot.org/enzyme/4.-.-.-> , <http://purl.uniprot.org/enzyme/5.-.-.-> , <http://purl.uniprot.org/enzyme/6.-.-.-> .
<http://purl.uniprot.org/enzyme/1.-.-.->
skos:prefLabel "Oxidoreductases" .
<http://purl.uniprot.org/enzyme/2.-.-.->
skos:prefLabel "Transferases" .
<http://purl.uniprot.org/enzyme/3.-.-.->
skos:prefLabel "Hydrolases" .
<http://purl.uniprot.org/enzyme/4.-.-.->
skos:prefLabel "Lyases" .
<http://purl.uniprot.org/enzyme/5.-.-.->
skos:prefLabel "Isomerases" .
<http://purl.uniprot.org/enzyme/6.-.-.->
skos:prefLabel "Ligases" .
<http://www.bioassayontology.org>
skos:prefLabel "BioAssayOntology" .
ns5:target
skos:prefLabel "ChEMBL Target Hierarchy" .
ns4:conceptHierarchy
ns6:hasPart ns3:enzyme .
In the SPARQL query, I replaced this:
MINUS {?root_node rdfs:subClassOf []}
With this:
MINUS {
?root_node rdfs:subClassOf ?super .
FILTER( ?super != ?root_node && ?super != <http://purl.uniprot.org/core/Enzyme> )
}
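To see why the original MINUS loses the enzyme roots and the patched one keeps them, here is a minimal Python sketch over toy subClassOf pairs. It only models the set logic and is not the LDA or Virtuoso code: the sample triples, the 1.1.-.- child, and the function names are made up for illustration, and `with_label` stands in for the rdfs:label / skos:prefLabel join in the real query.

```python
ENZYME_CLASS = "http://purl.uniprot.org/core/Enzyme"
EC1 = "http://purl.uniprot.org/enzyme/1.-.-.-"
EC1_1 = "http://purl.uniprot.org/enzyme/1.1.-.-"

# (subclass, superclass) pairs. "clean" has no extra statements;
# "rogue" adds the self and core/Enzyme statements seen on alpha.
clean = {(EC1_1, EC1)}
rogue = clean | {(EC1, EC1), (EC1, ENZYME_CLASS)}

# Only the real EC class has a label; core/Enzyme has none.
LABELS = {EC1: "Oxidoreductases"}


def roots_original(pairs):
    """[] subClassOf ?root MINUS { ?root subClassOf [] }"""
    supers = {sup for _, sup in pairs}
    has_any_super = {sub for sub, _ in pairs}
    return supers - has_any_super


def roots_patched(pairs):
    """Same, but self and core/Enzyme superclasses are ignored."""
    supers = {sup for _, sup in pairs}
    has_real_super = {
        sub for sub, sup in pairs
        if sup != sub and sup != ENZYME_CLASS
    }
    return supers - has_real_super


def with_label(nodes):
    """Keep only nodes that have a label, like the label join."""
    return {n for n in nodes if n in LABELS}


if __name__ == "__main__":
    print(with_label(roots_original(clean)))
    print(with_label(roots_original(rogue)))
    print(with_label(roots_patched(rogue)))
```

In this toy model the original logic picks core/Enzyme as the only root once the extra statements are present, and since it has no label the result is empty. The patched logic still lets core/Enzyme through as a candidate, but the label step drops it and 1.-.-.- comes back.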
Yeah, that's probably one way to fix it but it never needed that before so why now? Anyway, I ran this:
SPARQL DELETE WHERE { GRAPH <http://purl.uniprot.org/enzyme/direct> {?s <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.uniprot.org/core/Enzyme> . }};
Which fixed it. We can always add the subClassOf
PS. Nice sparql @randykerber :)
Contents of graph <http://purl.uniprot.org/enzyme/inference> are added by this SPARQL query:
INSERT {
GRAPH <http://purl.uniprot.org/enzyme/inference> {
?subclass <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?superclass .
?subclass <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?subclass .
}
}
WHERE {
GRAPH <http://purl.uniprot.org/enzyme/direct> {
?subclass <http://www.w3.org/2000/01/rdf-schema#subClassOf>+ ?superclass ;
[] []
}
}
It comes from a file called "insert_queries.sparql" inside the tar file "enzyme.tar" in the file repository for openphacts v2.1 (here): https://data.openphacts.org/free/2.1/rdf/
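A rough Python model of what that INSERT computes, assuming the direct graph can be treated as a set of (subclass, superclass) pairs. The function name and the shortened EC identifiers are illustrative only, and the trailing `[] []` part of the pattern is ignored.

```python
def infer_subclass_pairs(direct):
    """Return the pairs the INSERT would add.

    For every ?subclass with at least one superclass: one pair per
    ancestor reached via subClassOf+, plus one self pair.
    """
    parents = {}
    for sub, sup in direct:
        parents.setdefault(sub, set()).add(sup)

    inferred = set()
    for sub in parents:
        seen = set()
        stack = list(parents[sub])
        while stack:
            sup = stack.pop()
            if sup in seen:
                continue  # already visited; also guards against cycles
            seen.add(sup)
            stack.extend(parents.get(sup, ()))
        inferred.update((sub, sup) for sup in seen)
        inferred.add((sub, sub))  # the "?subclass subClassOf ?subclass" line
    return inferred


if __name__ == "__main__":
    direct = {("1.1.1.-", "1.1.-.-"), ("1.1.-.-", "1.-.-.-")}
    for pair in sorted(infer_subclass_pairs(direct)):
        print(pair)
```

Under this model a top-level class such as 1.-.-.- only gets a self statement if it appears as a subclass of something in the direct graph, for example of core/Enzyme. That would be consistent with the self-subclass statement seen on alpha, though it has not been confirmed here.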
This issue appears to be fixed. Test queries do now return answers.
However, we should look at the actual answers returned and see if they really make sense.
For example, for root=chembl the root is returned as:
<http://rdf.ebi.ac.uk/resource/chembl/protclass/CHEMBL_PC_0> skos:prefLabel "Protein class" .
Root of chembl hierarchy is "Protein class"? That doesn't sound right.
This query appears to show all 6 of the roots: http://alpha.openphacts.org:3002/tree
I think "Protein class" is ok, as all the children are actually proteins (it's the ChEMBL Protein Target Tree). At least it's the same behaviour as previously.
@danidi -- ok, if it's doing what it was designed to do, and did before, I'll call that "working" and close this. Though it is a misleading label. Might some day consider renaming to something like "/tree?root=chembl_protein", or 2 params, e.g., "/tree?dataset=chembl&category=protein"
Enzyme queries seem to fail. Interestingly, the child/parent hierarchy calls work, but the root call is not showing enzyme and chebi.
http://alpha.openphacts.org:3002/tree?app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1
In 2.2 (alpha): 404 Not Found http://alpha.openphacts.org:3002/tree?app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1&root=enzyme
In 2.1 (beta): https://beta.openphacts.org/2.1/tree?app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1&root=enzyme