ncbo / goo

Graph Oriented Objects (GOO) for Ruby. A RDF/SPARQL based ORM.
http://ncbo.github.io/goo/

SPARQL query yields different results in 4store and AllegroGraph (AG) #135

Closed mdorf closed 3 months ago

mdorf commented 1 year ago

The following SPARQL query is executed against test data in the GOO test goo/test/test_where/test_aggregated:

SELECT DISTINCT ?id ( COUNT(DISTINCT ?category_agg_count) AS ?category_agg_count_projection )
FROM <http://goo.org/default/Student> 
FROM <http://goo.org/default/Program>
WHERE { ?id a <http://goo.org/default/Student> .
    OPTIONAL { ?id <http://goo.org/default/enrolled> ?enrolled_agg_count .  }        
    OPTIONAL { ?enrolled_agg_count <http://goo.org/default/category> ?category_agg_count .  }    
}
GROUP BY ?id

In 4store, it returns the expected results:

[Screenshot: 4store query results]

In AllegroGraph, the results are wrong:

[Screenshot: AllegroGraph query results]

If I remove the OPTIONAL clauses from the query in AllegroGraph, the query runs correctly:

SELECT DISTINCT ?id ( COUNT(DISTINCT ?category_agg_count) AS ?category_agg_count_projection )
FROM <http://goo.org/default/Student> 
FROM <http://goo.org/default/Program>
WHERE { ?id a <http://goo.org/default/Student> .
    ?id <http://goo.org/default/enrolled> ?enrolled_agg_count .  
    ?enrolled_agg_count <http://goo.org/default/category> ?category_agg_count .
}
GROUP BY ?id

[Screenshot: AllegroGraph query results without the OPTIONAL clauses]
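
One caveat about the variant without OPTIONAL: in general it is not equivalent to the original query, because both triple patterns become required and any student with no enrolled program drops out of the result entirely. A minimal sketch in plain Ruby, assuming hypothetical data (the `ENROLLED` and `CATEGORY` hashes and the `required_counts` helper are illustrative and are not the GOO test fixtures or the GOO API):

```ruby
# Hypothetical data, not the GOO test fixtures.
# "s2" is a student with no enrolled program.
ENROLLED = { "s1" => ["p1", "p2"], "s2" => [] }
CATEGORY = { "p1" => ["c1"], "p2" => ["c1", "c2"] }

# Models the query without OPTIONAL: both patterns are required,
# so only (student, category) pairs reachable through an enrolled
# program produce rows.
def required_counts
  rows = ENROLLED.flat_map do |id, programs|
    programs.flat_map { |program| CATEGORY.fetch(program, []).map { |c| [id, c] } }
  end
  # GROUP BY ?id with COUNT(DISTINCT ?category)
  rows.group_by(&:first).transform_values { |rs| rs.map(&:last).uniq.size }
end

required_counts # "s1" maps to 2; "s2" is absent from the result
```

If the test data contains only students with at least one enrolled program, the two forms of the query should agree, which may be why the simplified query looks correct here.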

This issue was not detected earlier because we had not enabled running the GOO unit tests against AllegroGraph until recently.

mdorf commented 1 year ago

This issue is causing the test goo/test/test_where/test_aggregated to fail, preventing us from deploying the code to production.

mdorf commented 1 year ago

It's worth noting that this query cannot be easily modified in our code, since it's auto-generated via this GOO API call:

# Categories per student program categories
sts = Student.where.include(:name).aggregate(:count, enrolled: [:category]).all

jonquet commented 1 year ago

Hello. My feeling is that 4store is not right and AllegroGraph is correct in requiring the removal of the OPTIONAL clauses. See: https://jena.apache.org/tutorials/sparql_optionals.html

In particular: "If the first optional binds ?name and ?x to some values, the second OPTIONAL is an attempt to match the ground triples (?x and ?name have values). If the first optional did not match the optional part, then the second one is an attempt to match its triple with two variables."

From what I understand, it is a bit dangerous to chain multiple OPTIONAL clauses like this, and we should certainly keep an eye on queries of this kind in our code when moving to AllegroGraph.
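
To make the quoted behaviour concrete, here is a minimal sketch in plain Ruby that evaluates the two chained OPTIONAL patterns as successive left joins and then applies GROUP BY with COUNT(DISTINCT). Everything in it is an assumption for illustration: the triples are hypothetical, the `match`, `optional`, and `category_counts` helpers are not part of GOO, and the FROM graph scoping of the real query is ignored.

```ruby
# Hypothetical triples, not the GOO test fixtures.
# "s2" has no enrolled program; "p3" is a program nobody is enrolled in.
TRIPLES = [
  ["s1", "type", "Student"],
  ["s2", "type", "Student"],
  ["s1", "enrolled", "p1"],
  ["s1", "enrolled", "p2"],
  ["p1", "category", "c1"],
  ["p2", "category", "c1"],
  ["p2", "category", "c2"],
  ["p3", "category", "c3"]
].freeze

# Match one triple pattern against a partial solution (a Hash of bindings).
# Symbols are variables, strings are constants.
def match(pattern, binding)
  TRIPLES.filter_map do |triple|
    candidate = binding.dup
    ok = pattern.zip(triple).all? do |term, value|
      if term.is_a?(Symbol)
        candidate[term] ||= value   # bind the variable if it is still unbound
        candidate[term] == value    # otherwise require the same value
      else
        term == value
      end
    end
    candidate if ok
  end
end

# OPTIONAL as a left join: keep the solution unchanged when nothing matches.
def optional(solutions, pattern)
  solutions.flat_map do |solution|
    extended = match(pattern, solution)
    extended.empty? ? [solution] : extended
  end
end

def category_counts
  rows = match([:id, "type", "Student"], {})
  rows = optional(rows, [:id, "enrolled", :enrolled])
  rows = optional(rows, [:enrolled, "category", :category])
  # GROUP BY ?id with COUNT(DISTINCT ?category); unbound values are not counted
  rows.group_by { |row| row[:id] }
      .transform_values { |rs| rs.filter_map { |row| row[:category] }.uniq.size }
end

category_counts # "s1" maps to 2, "s2" maps to 3
```

In this model the student with no enrolled program ends up with a count of 3: the first OPTIONAL leaves the program variable unbound, so the second OPTIONAL matches every category triple in the data. That is the case the quoted passage describes, and it is the kind of result to watch for in queries with sibling OPTIONAL clauses.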

mdorf commented 1 year ago

The solution to this issue will be rolled into the AllegroGraph v7.4 release. It is not available as a standalone patch.

syphax-bouazzouni commented 9 months ago

> The solution to this issue will be rolled into the AllegroGraph v7.4 release. It is not available as a standalone patch.

The test is now running using franzinc/agraph:v8.0.0.rc1