ncbo / goo

Graph Oriented Objects (GOO) for Ruby. A RDF/SPARQL based ORM.
http://ncbo.github.io/goo/

Paging call returns wrong results in AllegroGraph #103

Closed mdorf closed 4 years ago

mdorf commented 4 years ago

One of our most heavily used API calls that returns paged data yields incorrect results in AllegroGraph:

paging = LinkedData::Models::Class.in(sub).include(:prefLabel).page(page, size)
paging = LinkedData::Models::Class.in(sub).include(:prefLabel, :synonym).page(page, size)

The first call results in the following SPARQL query:

SELECT DISTINCT ?id ?prefLabel
FROM <http://data.bioontology.org/ontologies/CSV_TEST_BRO/submissions/1>
    WHERE {
      ?id a <http://www.w3.org/2002/07/owl#Class> . 
      OPTIONAL {
        ?id ?rewrite0 ?prefLabel .
      }
      FILTER(?id = <http://bioontology.org/ontologies/Activity.owl#Activity> || ?id = <http://bioontology.org/ontologies/Activity.owl#Biospecimen_Management> || ?id = <http://bioontology.org/ontologies/Activity.owl#Community_Engagement> || ...
    }

The second call results in this SPARQL query:

SELECT DISTINCT ?id ?prefLabel ?synonym
FROM <http://data.bioontology.org/ontologies/CSV_TEST_BRO/submissions/1>
    WHERE {
      ?id a <http://www.w3.org/2002/07/owl#Class> .
      OPTIONAL {
        ?id ?rewrite0 ?prefLabel .
      }
      OPTIONAL {
        ?id ?rewrite1 ?synonym .
      }
      FILTER(?id = <http://bioontology.org/ontologies/Activity.owl#Activity> || ?id = <http://bioontology.org/ontologies/Activity.owl#Biospecimen_Management> || ?id = <http://bioontology.org/ontologies/Activity.owl#Community_Engagement> || ...
    }

In 4store, both of these queries return an identical number of rows, with the difference contained only in the selected attributes for each record. In SQL terms, that would equate to an OUTER JOIN query.

Unfortunately, AllegroGraph treats this as an INNER JOIN query, so the results vary depending on which OPTIONAL attributes are selected. In the second case, it returns only classes that have both a prefLabel and a synonym. I am not sure which back end is at fault here, but the presence of the OPTIONAL construct suggests that 4store is doing the right thing.

mdorf commented 4 years ago

Another sample query, as sent to 4store when executing the call:

paging = LinkedData::Models::Class.in(sub).include(:prefLabel, :synonym).page(page, size)

SELECT DISTINCT ?id ?prefLabel ?synonym
FROM <http://data.bioontology.org/ontologies/CSV_TEST_BRO/submissions/1>
WHERE {
?id a <http://www.w3.org/2002/07/owl#Class> .
OPTIONAL { ?id ?rewrite0 ?prefLabel . } 
OPTIONAL { ?id ?rewrite1 ?synonym . } 
FILTER(?id = <http://bioontology.org/ontologies/Activity.owl#Activity> || 
?id = <http://bioontology.org/ontologies/Activity.owl#Biospecimen_Management> || ...) 
FILTER(?rewrite0 = <http://data.bioontology.org/metadata/def/prefLabel> || 
?rewrite0 = <http://www.w3.org/2004/02/skos/core#prefLabel>) 
FILTER(?rewrite1 = <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> || 
?rewrite1 = <http://purl.obolibrary.org/obo/synonym> || 
?rewrite1 = <http://www.geneontology.org/formats/oboInOwl#hasBroadSynonym> || 
?rewrite1 = <http://www.geneontology.org/formats/oboInOwl#hasNarrowSynonym> || 
?rewrite1 = <http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym> || 
?rewrite1 = <http://www.w3.org/2004/02/skos/core#altLabel>) }

mdorf commented 4 years ago

According to Gary King, a developer from AllegroGraph, the above query is constructed incorrectly. Explanation below:

The query looks like:

SELECT DISTINCT ?id ?prefLabel ?synonym 
FROM <http://data.bioontology.org/ontologies/CSV_TEST_BRO/submissions/1> WHERE { 
?id a <http://www.w3.org/2002/07/owl#Class> . 
OPTIONAL { ?id ?rewrite0 ?prefLabel . } 
OPTIONAL { ?id ?rewrite1 ?synonym . } 
FILTER(
?id = <http://bioontology.org/ontologies/Activity.owl#Activity> ||
##
## -- snip --
## 
?id = <http://bioontology.org/ontologies/BiomedicalResourceOntology.owl#Biomedical_Supply_Resource>) 
FILTER(?rewrite0 = <http://data.bioontology.org/metadata/def/prefLabel> ||
?rewrite0 = <http://www.w3.org/2004/02/skos/core#prefLabel>) 
FILTER(?rewrite1 = <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> ||
##
## -- snip --
## 
?rewrite1 = <http://www.w3.org/2004/02/skos/core#altLabel>) 
}

Because the FILTERs are outside the OPTIONALs, they are applied to every row returned. That is, only rows where ?rewrite0 is in its list and ?rewrite1 is in its list will be returned, so the query will return NO results where ?rewrite0 or ?rewrite1 is NULL.
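
To make the difference concrete, here is a minimal Ruby sketch that mimics the two evaluation orders on in-memory data. It does not use Goo or a real triple store. The class ids, labels, predicate lists, and helper names (`optional`, `filter_outside`, `filter_inside`) are all invented for illustration, and `nil` stands in for an unbound (NULL) variable.

```ruby
# Toy data (hypothetical): class id => list of [predicate, value] pairs.
TRIPLES = {
  "ClassA" => [["prefLabel", "A label"], ["altLabel", "A synonym"]],
  "ClassB" => [["prefLabel", "B label"]], # has a label but no synonym
  "ClassC" => []                          # has neither
}.freeze

LABEL_PREDICATES   = %w[prefLabel].freeze
SYNONYM_PREDICATES = %w[altLabel hasExactSynonym].freeze

# OPTIONAL { ?id ?p ?v }: one row per matching pair, or a single row
# with nil bindings when nothing matches (left outer join behaviour).
def optional(pairs)
  pairs.empty? ? [[nil, nil]] : pairs
end

# FILTERs outside the OPTIONALs: join first, then test every joined row.
# A nil (unbound) predicate never passes the test, so the row is dropped.
def filter_outside(triples)
  triples.flat_map do |id, pairs|
    rows = optional(pairs).product(optional(pairs))
    kept = rows.select do |(p0, _), (p1, _)|
      LABEL_PREDICATES.include?(p0) && SYNONYM_PREDICATES.include?(p1)
    end
    kept.map { |(_, label), (_, syn)| [id, label, syn] }
  end
end

# FILTERs inside each OPTIONAL: restrict the pairs first, then join.
# A class with no matching pair still yields a row with nil in that column.
def filter_inside(triples)
  triples.flat_map do |id, pairs|
    labels = optional(pairs.select { |p, _| LABEL_PREDICATES.include?(p) })
    syns   = optional(pairs.select { |p, _| SYNONYM_PREDICATES.include?(p) })
    labels.product(syns).map { |(_, label), (_, syn)| [id, label, syn] }
  end
end

puts filter_outside(TRIPLES).length # 1 row: only ClassA survives
puts filter_inside(TRIPLES).length  # 3 rows: ClassB and ClassC keep nil columns
```

In this toy model, the first function reproduces the INNER JOIN-like result reported for AllegroGraph, and the second reproduces the OUTER JOIN-like result that the paging call expects.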

What you need to do is make sure that the FILTERs are applied only inside each OPTIONAL. For example, this query will do what you want:

SELECT DISTINCT ?id ?prefLabel ?synonym 
FROM <http://data.bioontology.org/ontologies/CSV_TEST_BRO/submissions/1> WHERE { 
?id a <http://www.w3.org/2002/07/owl#Class> . 
OPTIONAL { 
?id ?rewrite0 ?prefLabel . 
FILTER(?rewrite0 = <http://data.bioontology.org/metadata/def/prefLabel> ||
?rewrite0 = <http://www.w3.org/2004/02/skos/core#prefLabel>) 
} 
OPTIONAL { 
?id ?rewrite1 ?synonym . 
FILTER(?rewrite1 = <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> ||
##
## -- snip --
## 
?rewrite1 = <http://www.w3.org/2004/02/skos/core#altLabel>) 
} 
FILTER(
?id = <http://bioontology.org/ontologies/Activity.owl#Activity> ||
##
## -- snip --
## 
?id = <http://bioontology.org/ontologies/BiomedicalResourceOntology.owl#Biomedical_Supply_Resource>) 
}

mdorf commented 4 years ago

This prompted changes in both Goo and Sparql-client projects. Extensive testing for backward compatibility is required. See: https://github.com/ncbo/goo/commit/8e88ac4bf79a66f1c1cdd66101e2e0070b547342#diff-0ce3c3d4c71d49a8d57dd6864ef8ca4f and https://github.com/ncbo/sparql-client/compare/master...ncbo:allegrograph_testing#diff-372c8098811915fcf8c2ac7020553f8d
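
For readers who do not want to dig through the diffs, the shape of the fix can be sketched in a few lines of Ruby. This is not the code from the linked commits. `optional_with_filter` is a hypothetical helper written only to show an OPTIONAL block being generated with its predicate FILTER inside the braces, as recommended above.

```ruby
# Hypothetical helper (not part of Goo or sparql-client): builds one
# OPTIONAL block with the predicate FILTER placed inside the block.
def optional_with_filter(subject_var, predicate_var, value_var, predicate_uris)
  conditions = predicate_uris
    .map { |uri| "?#{predicate_var} = <#{uri}>" }
    .join(" || ")

  <<~SPARQL.strip
    OPTIONAL {
      ?#{subject_var} ?#{predicate_var} ?#{value_var} .
      FILTER(#{conditions})
    }
  SPARQL
end

# Predicate URIs taken from the prefLabel filter in the query above.
block = optional_with_filter(
  "id", "rewrite0", "prefLabel",
  ["http://data.bioontology.org/metadata/def/prefLabel",
   "http://www.w3.org/2004/02/skos/core#prefLabel"]
)
puts block
```

A generator built this way would still emit the ?id FILTER at the top level of the WHERE clause, since that filter is meant to restrict every row.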