ruby-rdf / sparql

Ruby SPARQL library
http://rubygems.org/gems/sparql
The Unlicense

multiple aggregates return wrong value #46

Closed white-gecko closed 1 year ago

white-gecko commented 1 year ago

When I execute a query with two aggregates, both result variables are bound to the value of the first aggregate.

Replacing MIN with SAMPLE everywhere in the example gives the same result.

require 'sparql'
require 'rdf/turtle'

data = %(
PREFIX ex: <http://example.org/>

ex:e1 ex:a 1 ;
      ex:b 10 .

ex:e2 ex:a 1 ;
      ex:b 20 .

ex:e3 ex:a 2 ;
      ex:b 20 .
)

query_string = %(
PREFIX ex: <http://example.org/>

SELECT ?ev (MIN(?a) as ?a_min) (MIN(?b) as ?b_min)
WHERE {
  ?ev ex:a ?a ;
      ex:b ?b .
}
GROUP BY ?ev
)

queryable = RDF::Graph.new do |graph|
  RDF::Turtle::Reader.new(data) {|reader| graph << reader}
end

query = SPARQL.parse(query_string)
queryable.query(query) do |result|
  puts result.inspect
end

The result is:

$ ruby mwe.rb
#<RDF::Query::Solution:0x7d0({:ev=>#<RDF::URI:0x64 URI:http://example.org/e1>, :a_min=>#<RDF::Literal::Integer:0x7e4("1"^^<http://www.w3.org/2001/XMLSchema#integer>)>, :b_min=>#<RDF::Literal::Integer:0x7e4("1"^^<http://www.w3.org/2001/XMLSchema#integer>)>})>
#<RDF::Query::Solution:0x80c({:ev=>#<RDF::URI:0x4b0 URI:http://example.org/e2>, :a_min=>#<RDF::Literal::Integer:0x820("1"^^<http://www.w3.org/2001/XMLSchema#integer>)>, :b_min=>#<RDF::Literal::Integer:0x820("1"^^<http://www.w3.org/2001/XMLSchema#integer>)>})>
#<RDF::Query::Solution:0x834({:ev=>#<RDF::URI:0x4c4 URI:http://example.org/e3>, :a_min=>#<RDF::Literal::Integer:0x848("2"^^<http://www.w3.org/2001/XMLSchema#integer>)>, :b_min=>#<RDF::Literal::Integer:0x848("2"^^<http://www.w3.org/2001/XMLSchema#integer>)>})>

gkellogg commented 1 year ago

Translating the query into SPARQL algebra gives the following. Note that both ?a_min and ?b_min are bound to ??.0, and the group contains only (min ?a):

(prefix ((ex: <http://example.org/>))
 (project (?ev ?a_min ?b_min)
  (extend ((?a_min ??.0) (?b_min ??.0))
   (group (?ev)
    ((??.0 (min ?a)))
    (bgp
      (triple ?ev ex:a ?a)
      (triple ?ev ex:b ?b))))))

Meanwhile, Jena produces a similar algebra, but with a separate variable for each aggregate (from sparql.org):

(base <http://example/base/>
 (prefix ((ex: <http://example.org/>))
   (project (?ev ?a_min ?b_min)
     (extend ((?b_min ?.1))
       (extend ((?a_min ?.0))
         (group (?ev) ((?.0 (min ?a)) (?.1 (min ?b)))
           (bgp
             (triple ?ev ex:a ?a)
             (triple ?ev ex:b ?b)
           )))))))

If you parse that using SPARQL::Algebra.parse and execute it, you do indeed get different (and correct) results:

#<RDF::Query::Solution:0x1a5b30({:ev=>#<RDF::URI:0x807dc URI:http://example.org/e1>, :a_min=>#<RDF::Literal::Integer:0x823c0("1"^^<http://www.w3.org/2001/XMLSchema#integer>)>, :b_min=>#<RDF::Literal::Integer:0x823e8("10"^^<http://www.w3.org/2001/XMLSchema#integer>)>})>
#<RDF::Query::Solution:0x1a5b44({:ev=>#<RDF::URI:0x80818 URI:http://example.org/e2>, :a_min=>#<RDF::Literal::Integer:0x82410("1"^^<http://www.w3.org/2001/XMLSchema#integer>)>, :b_min=>#<RDF::Literal::Integer:0x82424("20"^^<http://www.w3.org/2001/XMLSchema#integer>)>})>
#<RDF::Query::Solution:0x1a5b58({:ev=>#<RDF::URI:0x8082c URI:http://example.org/e3>, :a_min=>#<RDF::Literal::Integer:0x8244c("2"^^<http://www.w3.org/2001/XMLSchema#integer>)>, :b_min=>#<RDF::Literal::Integer:0x82460("20"^^<http://www.w3.org/2001/XMLSchema#integer>)>})>

It seems that the code that consolidates the extend operators is changing the semantics of the query.
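
To illustrate the difference between the two algebras, here is a minimal plain-Ruby sketch that evaluates the group and extend steps by hand on the sample data. It does not use the sparql gem and is not how the library implements these operators; the group_and_extend helper and the :v0/:v1 names (standing in for ??.0 and ?.1) are hypothetical, chosen only for this illustration.

```ruby
# Sample data from the report: one row per ?ev with its ?a and ?b values.
ROWS = [
  { ev: 'e1', a: 1, b: 10 },
  { ev: 'e2', a: 1, b: 20 },
  { ev: 'e3', a: 2, b: 20 }
].freeze

# Hypothetical helper: group rows by :ev, compute MIN for each internal
# aggregate variable, then copy internal variables to output variables
# the way an extend operator would.
def group_and_extend(rows, aggregates, bindings)
  rows.group_by { |row| row[:ev] }.map do |ev, members|
    # aggregates maps an internal variable to the column it takes MIN over
    internal = aggregates.transform_values { |col| members.map { |m| m[col] }.min }
    # bindings maps an output variable to the internal variable it copies
    bindings.each_with_object({ ev: ev }) { |(out, var), sol| sol[out] = internal[var] }
  end
end

# Shape of the first algebra: a single aggregate, referenced by both outputs.
buggy = group_and_extend(ROWS, { v0: :a }, { a_min: :v0, b_min: :v0 })

# Shape of the Jena algebra: one internal variable per aggregate.
fixed = group_and_extend(ROWS, { v0: :a, v1: :b }, { a_min: :v0, b_min: :v1 })

buggy.each { |solution| puts solution.inspect }
fixed.each { |solution| puts solution.inspect }
```

With a single internal variable, b_min can only ever repeat a_min, which matches the reported output. With one variable per aggregate, b_min comes out as 10, 20, 20, matching the results from the Jena algebra.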

Aside: for a slightly easier to understand result, try the following:

results = queryable.query(query)
SXP::Generator.print(results.to_sxp_bin)

This gives you the following:

(
 ((ev <http://example.org/e1>) (a_min 1) (b_min 10))
 ((ev <http://example.org/e2>) (a_min 1) (b_min 20))
 ((ev <http://example.org/e3>) (a_min 2) (b_min 20)))