Open jakhag opened 7 years ago
Looking at the SPARQL for the pages query, it seems that the members have no dcterms:title. However, they do have an rdfs:label. Maybe we should use that instead, but is it correct?
When I changed it to rdfs:label I got the following (abridged to save space). You will notice that not all of the items have info attached. Is this to be expected, or is there something else going on? It is possible that some expected data is also missing.
<items>
<item href="http://purl.uniprot.org/uniprot/E0TXE1"/>
<item href="http://purl.uniprot.org/uniprot/E1UV19"/>
<item href="http://purl.uniprot.org/uniprot/E3E2E2"/>
<item href="http://purl.uniprot.org/uniprot/E3UUE6"/>
<item href="http://purl.uniprot.org/uniprot/E7FHP1"/>
<item href="http://purl.uniprot.org/uniprot/O14975">
<prefLabel>Very long-chain acyl-CoA synthetase</prefLabel>
<exactMatch href="http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL4326">
<prefLabel>Fatty acid transport protein 2</prefLabel>
<type href="http://rdf.ebi.ac.uk/terms/chembl#SingleProtein"/>
<inDataset href="http://www.ebi.ac.uk/chembl"/>
<target_organism>Homo sapiens</target_organism>
</exactMatch>
<inDataset href="http://purl.uniprot.org"/>
<target_organism_uri href="http://purl.uniprot.org/taxonomy/9606"/>
</item>
<item href="http://purl.uniprot.org/uniprot/O22898"/>
</items>
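To see quickly which members came back without a label of their own, a response like the one above can be scanned with a few lines of Python. This is only a rough sketch: it assumes the response is well-formed XML with no namespace on the elements, and `SAMPLE` and `items_without_label` are made-up names for illustration, not part of the API.

```python
import xml.etree.ElementTree as ET

# Abridged sample in the same shape as the response above (assumed well-formed,
# no XML namespace).
SAMPLE = """<items>
  <item href="http://purl.uniprot.org/uniprot/E0TXE1"/>
  <item href="http://purl.uniprot.org/uniprot/O14975">
    <prefLabel>Very long-chain acyl-CoA synthetase</prefLabel>
    <exactMatch href="http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL4326">
      <prefLabel>Fatty acid transport protein 2</prefLabel>
    </exactMatch>
  </item>
  <item href="http://purl.uniprot.org/uniprot/O22898"/>
</items>"""


def items_without_label(xml_text):
    """Return hrefs of <item> elements that have no direct <prefLabel> child."""
    root = ET.fromstring(xml_text)
    missing = []
    for item in root.findall("item"):
        # find() only looks at direct children, so a prefLabel nested
        # inside <exactMatch> does not count as the item's own label.
        if item.find("prefLabel") is None:
            missing.append(item.get("href"))
    return missing


if __name__ == "__main__":
    for href in items_without_label(SAMPLE):
        print(href)
```

Running it over a full page of results would give the list of URIs to check by hand against the UniProt graph.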
Here is the SPARQL query. I changed it to look for ?item dcterms:title|rdfs:label ?chembl_name, i.e. either dcterms:title or rdfs:label.
BTW this API call is one of those two-part ones where it first finds all the items and then gets the properties in a different call. Not really sure if it needs those OPTIONAL blocks or not.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://www.semantic-systems-biology.org/ontology/rdf/OBO#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX chembl: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX goa: <http://www.semantic-systems-biology.org/ontology/rdf/GOA#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?item WHERE {
  VALUES ?g { <http://purl.uniprot.org/enzyme/inference> <http://www.ebi.ac.uk/chembl/target/inference> <http://www.geneontology.org/inference> }
  VALUES ?node_uri { <http://purl.uniprot.org/enzyme/6.2.-.-> }
  GRAPH ?g {
    ?child_node rdfs:subClassOf ?node_uri.
    FILTER ( isURI(?child_node) )
  }
  { ?item obo:C ?child_node .
    ?item uniprot:reviewed true }
  UNION { ?item obo:F ?child_node .
    ?item uniprot:reviewed true }
  UNION { ?item obo:P ?child_node .
    ?item uniprot:reviewed true }
  UNION { ?item uniprot:enzyme|uniprot:domain/uniprot:enzyme|chembl:hasProteinClassification ?child_node }
  VALUES ?g2 { <http://purl.uniprot.org> <http://www.ebi.ac.uk/chembl> <http://www.openphacts.org/goa> }
  GRAPH ?g2 {
    ?item [] []
  }
  {
    ?item dcterms:title|rdfs:label ?chembl_name
    FILTER (?chembl_name != '')
  }
  UNION { ?item goa:description ?uniprot_name
    FILTER (?uniprot_name != '') }
  OPTIONAL {
    { ?mapping skos:relatedMatch/skos:exactMatch ?item }
    UNION { ?item skos:relatedMatch/skos:exactMatch ?mapping }
    MINUS { ?mapping a chembl:ProteinComplexGroup }
    { ?mapping goa:description ?mapping_name }
    UNION { ?mapping dcterms:title ?mapping_name }
    FILTER ( ?mapping_name != '' )
    { ?mapping uniprot:organism ?mapping_org_uri }
    UNION { ?mapping chembl:organismName ?mapping_org
      GRAPH <http://www.ebi.ac.uk/chembl> {
        ?mapping a ?mapping_type
        FILTER ( ?mapping_type != chembl:UniprotRef )
      }
    }
    BIND(IF(BOUND(?mapping_org), <http://www.ebi.ac.uk/chembl>, <http://purl.uniprot.org>) AS ?mapping_dataset)
  }
  OPTIONAL { ?item uniprot:organism ?uniprot_organism
    BIND (?item AS ?uniprot_target) }
  OPTIONAL {
    GRAPH <http://www.ebi.ac.uk/chembl> {
      ?item a ?target_type
    }
  }
  OPTIONAL { ?item chembl:organismName ?chembl_organism
    BIND (?item AS ?chembl_target) }
} ORDER BY ?item LIMIT 500 OFFSET 500
I would expect a prefLabel from Uniprot for each of the items, but not necessarily from ChEMBL.
Takes about 0.6 seconds compared to 1.8 seconds if the OPTIONAL blocks are removed from the query.
The organism and organism name are filters for the query. Does it make the query slower, even if no filter parameter is set by the user?
Ok, thanks @danidi. So the optionals are for the filters. It does make it slower if no filters are set but no real way to avoid that without a bit of a code re-write. It's not massively slower though.
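For what the re-write might look like, here is a very rough sketch in Python of only emitting an OPTIONAL block when its filter is actually set. It is not how the API is implemented: the parameter names `target_organism_uri` and `target_organism` and the helper `optional_blocks_for` are hypothetical, and only the two OPTIONAL blocks themselves are copied from the query above.

```python
# OPTIONAL blocks copied from the query above, keyed by a hypothetical
# filter-parameter name (the real API parameter names may differ).
OPTIONAL_BLOCKS = {
    "target_organism_uri": (
        "OPTIONAL { ?item uniprot:organism ?uniprot_organism\n"
        "  BIND (?item AS ?uniprot_target) }"
    ),
    "target_organism": (
        "OPTIONAL { ?item chembl:organismName ?chembl_organism\n"
        "  BIND (?item AS ?chembl_target) }"
    ),
}


def optional_blocks_for(params):
    """Return only the OPTIONAL blocks whose filter parameter has a value."""
    return "\n".join(
        block for name, block in OPTIONAL_BLOCKS.items() if params.get(name)
    )


if __name__ == "__main__":
    # No filters set: nothing extra is added to the query.
    print(repr(optional_blocks_for({})))
    # Only the organism-name filter set: only the ChEMBL block is added.
    print(optional_blocks_for({"target_organism": "Homo sapiens"}))
```

The returned text would be spliced into the WHERE clause before the closing brace, so an unfiltered call pays nothing for the organism lookups.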
For http://purl.uniprot.org/enzyme/6.2.-.- the count is 1371:
http://alpha.openphacts.org:3002/target/members/count?uri=http%3A%2F%2Fpurl.uniprot.org%2Fenzyme%2F6.2.-.-&app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1
But the target class members list does not retrieve as much data. It fails at page 2:
http://alpha.openphacts.org:3002/target/members/pages?uri=http%3A%2F%2Fpurl.uniprot.org%2Fenzyme%2F6.2.-.-&app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1&_page=1&_pageSize=500
http://alpha.openphacts.org:3002/target/members/pages?uri=http%3A%2F%2Fpurl.uniprot.org%2Fenzyme%2F6.2.-.-&app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1&_page=2&_pageSize=500
404: page not found
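For reference, the paging arithmetic one would expect from those numbers can be sketched as follows. It assumes pages are 1-based, which matches the `LIMIT 500 OFFSET 500` in the query above for `_page=2`; `page_count` and `page_offset` are illustrative helpers, not API functions.

```python
import math


def page_count(total_items, page_size):
    """Number of pages needed to list total_items."""
    return math.ceil(total_items / page_size)


def page_offset(page, page_size):
    """SPARQL OFFSET for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * page_size


if __name__ == "__main__":
    total, size = 1371, 500
    for page in range(1, page_count(total, size) + 1):
        print(page, page_offset(page, size))
```

With a count of 1371 and a page size of 500 that gives three pages, so a 404 at page 2 suggests the pages query is seeing fewer items than the count query, though that is not confirmed here.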