ncbo / goo

Graph Oriented Objects (GOO) for Ruby. An RDF/SPARQL-based ORM.
http://ncbo.github.io/goo/

SPARQL queries that return Integer 0 fail in AllegroGraph #104

Closed: mdorf closed this issue 4 years ago

mdorf commented 4 years ago

Queries that use GROUP BY with COUNT (or any other aggregate returning an Integer) fail in AllegroGraph when they match no rows.

mdorf commented 4 years ago

Format of an Integer 0 result as returned by each data store:

AllegroGraph: 
{"head":{"vars":["g", "c"]},"results":{"bindings":[ 
  {"c":{"type":"literal","datatype":"http://www.w3.org/2001/XMLSchema#integer", "value":"0"}}
]}}

4store: 
{"head":{"vars":["g","c"]},"results": {"bindings":[]}}
mdorf commented 4 years ago

Implemented a somewhat ugly fix that intercepts the results and sets the bindings to [] if the AllegroGraph format is encountered. May need to revisit this at a later time.

mdorf commented 4 years ago

The current fix is to handle this case at the sparql-client layer. In client.rb/parse_response, we compare the raw JSON string with the known-case string and set result_data to [] if there is a match. The code snippet with this solution, as well as possible alternative implementations (commented out), is included below:

# Workaround: AllegroGraph returns one row with c = 0 where 4store returns no rows,
# so treat that exact response body as an empty result.
# Alternative conditions considered (either could replace the one below):
# if result_data.length == 1 && !result_data[0][:c].nil? && result_data[0][:c].is_a?(RDF::Literal::Integer) && result_data[0][:c].value == "0"
# if response.body.to_s.include? "\"type\":\"literal\",\"datatype\":\"http://www.w3.org/2001/XMLSchema#integer\", \"value\":\"0\""
if response.body == "{\"head\":{\"vars\":[\"g\", \"c\"]},\"results\":{\"bindings\":[\n {\"c\":{\"type\":\"literal\",\"datatype\":\"http://www.w3.org/2001/XMLSchema#integer\", \"value\":\"0\"}}]}}"
  result_data = []
end
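Comparing the raw body would stop matching if AllegroGraph changed its whitespace or key order. A minimal sketch of a parsed-JSON check in plain Ruby, assuming the body is in the SPARQL JSON results format and the grouping and count variables are named `g` and `c` as in these queries; `allegrograph_empty_count_row?` and `normalized_bindings` are hypothetical helpers, not part of sparql-client:

```ruby
require "json"

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical helper: true when the parsed SPARQL JSON result is the single
# "grouping variable unbound, count 0" row that AllegroGraph returned for
# GROUP BY queries matching no data. Assumes variables named g and c by default.
def allegrograph_empty_count_row?(parsed, group_var: "g", count_var: "c")
  vars = parsed.dig("head", "vars") || []
  bindings = parsed.dig("results", "bindings")
  # Only results that declare the grouping variable and have exactly one row qualify.
  return false unless vars.include?(group_var) && bindings.is_a?(Array) && bindings.length == 1

  row = bindings.first
  count = row[count_var]
  !row.key?(group_var) &&
    count.is_a?(Hash) &&
    count["type"] == "literal" &&
    count["datatype"] == XSD_INTEGER &&
    count["value"] == "0"
end

# Hypothetical helper: return the bindings array, treating that AllegroGraph
# row as "no results" (the 4store behaviour).
def normalized_bindings(body)
  parsed = JSON.parse(body)
  return [] if allegrograph_empty_count_row?(parsed)

  parsed.dig("results", "bindings") || []
end
```

Checking that `g` is declared in `head.vars` keeps a legitimate ungrouped `SELECT (count(?s) as ?c)` result of 0 from being discarded.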
mdorf commented 4 years ago

May revisit this solution at a later time.

mdorf commented 4 years ago

It appears that in the Integer 0 case, the AllegroGraph result bindings do not match the variables: the variable "g" is missing from the result.

4store:

{
  "head": {
    "vars": ["g", "c"]
  },
  "results": {
    "bindings": [
      {
        "g": {"type":"uri","value":"http://data.bioontology.org/ontologies/MAPPING_TEST2/submissions/22"},
        "c":{"type":"literal","value":"3","datatype":"http://www.w3.org/2001/XMLSchema#integer"}
      },
      {
        "g": {"type":"uri","value":"http://data.bioontology.org/ontologies/MAPPING_TEST4/submissions/44"},
        "c": {"type":"literal","value":"2","datatype":"http://www.w3.org/2001/XMLSchema#integer"}
      }
    ]
  }
}

{
  "head": {
    "vars": ["g", "c"]
  },
  "results": {
    "bindings": []
  }
}

AG:

{
  "head": {
    "vars": ["g", "c"]
  },
  "results": {
    "bindings": [
      {
        "g": {"type":"uri","value":"http://data.bioontology.org/ontologies/MAPPING_TEST2/submissions/22"},
        "c":{"type":"literal","datatype":"http://www.w3.org/2001/XMLSchema#integer", "value":"3"}
      },
      {
        "g": {"type":"uri","value":"http://data.bioontology.org/ontologies/MAPPING_TEST4/submissions/44"},
        "c":{"type":"literal","datatype":"http://www.w3.org/2001/XMLSchema#integer", "value":"2"}
      }
    ]
  }
}

{
  "head": {
    "vars": ["g", "c"]
  },
  "results": {
    "bindings": [
      {
        "c": {"type":"literal","datatype":"http://www.w3.org/2001/XMLSchema#integer", "value":"0"}
      }
    ]
  }
}
mdorf commented 4 years ago

This appears to be a general difference between 4store and AllegroGraph. 4store ALWAYS matches the variables in "vars" with the keys of each entry in "bindings", even if the binding for a given variable is empty. For example:

4store:

{
  "head": {
    "vars": ["id", "classType"]
  },
  "results": {
    "bindings":[
      {
        "id": {"type":"uri","value":"http://data.bioontology.org/ontologies/MAPPING_TEST4/submissions/44"},
        "classType": {}
      }
    ]
  }
}

AllegroGraph, on the other hand, does not match vars with bindings one-to-one. If a binding is empty (or NULL), its key is simply omitted from the corresponding entry in the "bindings" array:

AG:

{
  "head": {
    "vars": ["id", "classType"]
  },
  "results": {
    "bindings": [
      {
        "id": {"type":"uri","value":"http://data.bioontology.org/ontologies/MAPPING_TEST2/submissions/22"}
      }
    ]
  }
}

This is problematic because client code always has to check whether a binding exists, even though a corresponding variable is declared in "vars".
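One way to avoid those per-binding existence checks is to normalize AllegroGraph results into the 4store shape right after parsing. A minimal sketch, assuming the SPARQL JSON results format shown above; `fill_unbound_vars` is a hypothetical helper, and using `{}` for unbound variables simply mirrors the 4store output in this thread:

```ruby
require "json"

# Hypothetical helper (not part of goo or sparql-client): return the solutions
# with one entry per variable in head.vars, in head.vars order, using {} for
# unbound variables. This mirrors the shape 4store returns.
def fill_unbound_vars(parsed)
  vars = parsed.dig("head", "vars") || []
  bindings = parsed.dig("results", "bindings") || []
  bindings.map do |row|
    vars.each_with_object({}) { |var, filled| filled[var] = row.fetch(var, {}) }
  end
end

# Example: the AllegroGraph response above, where "classType" is absent.
ag_body = '{"head":{"vars":["id","classType"]},"results":{"bindings":[' \
          '{"id":{"type":"uri","value":"http://data.bioontology.org/ontologies/MAPPING_TEST2/submissions/22"}}]}}'
rows = fill_unbound_vars(JSON.parse(ag_body))
rows.first["classType"].empty? # the unbound variable is now present as {}
```

Because every row then has a key for each entry in `head.vars`, callers can test `row[var].empty?` instead of first checking `key?`.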

mdorf commented 4 years ago

During a run of the Mappings test suite, a number of queries return a single result in AllegroGraph consisting of a NULL (unbound) value for the graph and a "0" value for the corresponding COUNT. The equivalent queries in 4store return NO RESULTS. Here is a screenshot of the AllegroGraph response:

[Screenshot of the AllegroGraph response, taken 2020-04-30 at 6:52:34 PM]

This produces the following response:

{
  "head": {
    "vars": ["g", "c"]
  },
  "results": {
    "bindings": [
      {
        "c": {"type":"literal","datatype":"http://www.w3.org/2001/XMLSchema#integer", "value":"0"}
      }
    ]
  }
}

The equivalent query in 4store returns NO RESULTS and the following response:

{
  "head": {
    "vars": ["g", "c"]
  },
  "results": {
    "bindings": []
  }
}
mdorf commented 4 years ago

The queries that behave differently in AllegroGraph vs 4store:

SELECT ?g (count(?s1) as ?c)
WHERE {
  {
    GRAPH <http://data.bioontology.org/ontologies/MAPPING_TEST4/submissions/44> {
      ?s1 <http://bioportal.bioontology.org/ontologies/umls/cui> ?o .
    }
    GRAPH ?g {
      ?s2 <http://bioportal.bioontology.org/ontologies/umls/cui> ?o .
    }
  }
  FILTER (?s1 != ?s2)
  FILTER (!STRSTARTS(str(?g),'http://data.bioontology.org/ontologies/MAPPING_TEST4'))
} GROUP BY ?g

SELECT ?g (count(?s1) as ?c)
WHERE {
  {
    GRAPH <http://data.bioontology.org/ontologies/MAPPING_TEST4/submissions/44> {
      ?s1 <http://data.bioontology.org/metadata/def/mappingSameURI> ?o .
    }
    GRAPH ?g {
      ?s2 <http://data.bioontology.org/metadata/def/mappingSameURI> ?o .
    }
  }
  FILTER (!STRSTARTS(str(?g),'http://data.bioontology.org/ontologies/MAPPING_TEST4'))
} GROUP BY ?g

SELECT ?g (count(?s1) as ?c)
WHERE {
  {
    GRAPH <http://data.bioontology.org/ontologies/MAPPING_TEST4/submissions/44> {
      ?s1 <http://data.bioontology.org/metadata/def/mappingLoom> ?o .
    }
    GRAPH ?g {
      ?s2 <http://data.bioontology.org/metadata/def/mappingLoom> ?o .
    }
  }
  FILTER (?s1 != ?s2)
  FILTER (!STRSTARTS(str(?g),'http://data.bioontology.org/ontologies/MAPPING_TEST4'))
} GROUP BY ?g

SELECT ?g (count(?s1) as ?c)
WHERE {
  {
    GRAPH <http://data.bioontology.org/ontologies/MAPPING_TEST4/submissions/44> {
      ?s1 <http://data.bioontology.org/metadata/def/mappingRest> ?o .
    }
    GRAPH ?g {
      ?s2 <http://data.bioontology.org/metadata/def/mappingRest> ?o .
    }
  }
  FILTER (?s1 != ?s2)
  FILTER (!STRSTARTS(str(?g),'http://data.bioontology.org/ontologies/MAPPING_TEST4'))
} GROUP BY ?g
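The four queries share one template: they differ only in the predicate and in whether `FILTER (?s1 != ?s2)` is present (it is absent from the `mappingSameURI` query). A hypothetical Ruby helper that rebuilds them, which may be handy for reproducing the AllegroGraph/4store difference outside the test suite; the method name and keyword argument are illustrative, not goo's API:

```ruby
# Hypothetical helper (not part of goo): build the per-graph mapping-count
# query used above for a source graph, a predicate and a graph-name prefix
# to exclude.
def mapping_count_query(source_graph, predicate, exclude_prefix, distinct_subjects: true)
  lines = [
    "SELECT ?g (count(?s1) as ?c)",
    "WHERE {",
    "  {",
    "    GRAPH <#{source_graph}> {",
    "      ?s1 <#{predicate}> ?o .",
    "    }",
    "    GRAPH ?g {",
    "      ?s2 <#{predicate}> ?o .",
    "    }",
    "  }"
  ]
  # Three of the four queries also require two distinct subjects.
  lines << "  FILTER (?s1 != ?s2)" if distinct_subjects
  lines << "  FILTER (!STRSTARTS(str(?g),'#{exclude_prefix}'))"
  lines << "} GROUP BY ?g"
  lines.join("\n")
end

source = "http://data.bioontology.org/ontologies/MAPPING_TEST4/submissions/44"
prefix = "http://data.bioontology.org/ontologies/MAPPING_TEST4"
cui_query = mapping_count_query(source, "http://bioportal.bioontology.org/ontologies/umls/cui", prefix)
same_uri_query = mapping_count_query(source, "http://data.bioontology.org/metadata/def/mappingSameURI",
                                     prefix, distinct_subjects: false)
```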
mdorf commented 4 years ago

The issue still occurs in AllegroGraph v7.0.0:

[Screenshot of the AllegroGraph v7.0.0 response, taken 2020-05-04 at 4:18:36 PM]
mdorf commented 4 years ago

I also observed that AllegroGraph returns the same single result even when bogus graph names are used:

[Screenshot of the AllegroGraph response with bogus graph names, taken 2020-05-04 at 4:40:26 PM]
mdorf commented 4 years ago

Here is a response from AllegroGraph support on this:

Regarding issue 2, the query can be simplified to:

select ?s (count(?o) AS ?count) where { ?s ?o } group by ?s

and if there are no matching triples, the result in AllegroGraph is one row with ?s unbound and ?count 0, whereas you might expect zero rows.

It turns out the SPARQL 1.1 standard is ambiguous, making either of them acceptable, and it is up to the implementation. However I found that the consensus now is that the desired output is zero rows, similar to SQL. For example Andy Seaborne, who edited the SPARQL 1.1 specification, has written about exactly this: https://afs.github.io/sparql-agg-group-empty.html#group-agg-no-rows

We aim to follow the SPARQL 1.1 standard in text and spirit, so we'll follow the consensus and modify AllegroGraph's behaviour.

Thanks for bringing this to our attention.

mdorf commented 4 years ago

Another response from AllegroGraph support:

We plan to do this in the next two weeks. Meanwhile you could probably rewrite the query by introducing outer select and filter, like:

select * { { select ?s (count(?o) as ?c) { ?s ?o } group by ?s } filter (bound(?s)) }
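Support's workaround can be applied mechanically to any of the GROUP BY queries above. A minimal sketch, assuming the inner query has no `PREFIX`/`BASE` prologue (SPARQL only allows those at the top level, not inside a subquery); `with_bound_group_filter` is a hypothetical helper:

```ruby
# Hypothetical helper: wrap a GROUP BY query in an outer SELECT that keeps only
# rows whose grouping variable is bound, so a query matching no data yields
# zero rows instead of one row with an unbound group key and a count of 0.
# Assumes the inner query has no PREFIX/BASE declarations.
def with_bound_group_filter(query, group_var)
  var = group_var.delete_prefix("?")
  "SELECT * { { #{query} } FILTER (bound(?#{var})) }"
end

wrapped = with_bound_group_filter("select ?s (count(?o) as ?c) { ?s ?p ?o } group by ?s", "?s")
```

For the mapping queries the grouping variable is `?g`, so the wrapper keeps only rows where `?g` is bound.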

mdorf commented 4 years ago

This appears to be fixed in AllegroGraph v7.0.1

mdorf commented 4 years ago

The code that handles this case has been removed.