The count query will trigger heavy looping

spaziocodice / SolRDF

An RDF plugin for Solr

Apache License 2.0


The count query will trigger heavy looping #89

Closed akkabin closed 9 years ago

akkabin commented 9 years ago

The following count query appears to loop endlessly:

SELECT (COUNT(*) as ?count) WHERE {?s ?p ?o .}

agazzarini commented 9 years ago

Hi @brianchen2012, I have a few questions, as the test suite already contains a test [1] that successfully executes the following query:

SELECT ?p (COUNT(?p) AS ?pTotal)
WHERE
{ ?s ?p ?o . }
GROUP BY ?p

which is very similar to your query.

Are you running SolRDF in standalone or Cloud mode?
What is the size of the underlying dataset? In other words: what is the expected result of the query?

Thanks for entering this,
Andrea

[1] org.gazzax.labs.solrdf.integration.sparql.LearningSparql_SELECT_ITCase.countFunction()

agazzarini commented 9 years ago

@brianchen2012 please forget my questions: I just reproduced it.

agazzarini commented 9 years ago

A short update on this: that is not an endless loop. I know it looks like one, but it isn't.

Evaluating the COUNT keyword triggers a scan across every entry in the target model defined by the WHERE condition (in the example, the whole dataset). So if you have, for instance, 1,000,000 triples, the query above performs 1,000,000 iterations, which takes a lot of time. That is the problem.
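To make the cost described above concrete, here is a minimal sketch in Python. It is not SolRDF or Jena code: `TinyTripleStore`, `match`, and `count_by_iteration` are hypothetical names invented for illustration, and the store is just an in-memory list. It only shows that a count obtained by consuming every binding of `{ ?s ?p ?o }` touches each triple once.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]


class TinyTripleStore:
    """Hypothetical in-memory store for illustration; not SolRDF or Jena code."""

    def __init__(self, triples: List[Triple]) -> None:
        self.triples = triples
        self.visited = 0  # running total of triples touched by match()

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield every triple matching the pattern; None acts as a variable."""
        for triple in self.triples:
            self.visited += 1
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


def count_by_iteration(store: TinyTripleStore) -> int:
    """COUNT(*) over { ?s ?p ?o } evaluated by consuming every binding."""
    return sum(1 for _ in store.match())


# 10,000 synthetic triples stand in for the dataset.
store = TinyTripleStore([(f"s{i}", f"p{i % 3}", f"o{i}") for i in range(10_000)])
total = count_by_iteration(store)
print(total, store.visited)
```

In this sketch the number of triples visited equals the count returned, so the work grows linearly with the size of the matched model. On a large dataset that is enough to make the query look stuck even though it will eventually finish.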

akkabin commented 9 years ago

Yes, it will iterate through the whole result set.

agazzarini commented 9 years ago

@brianchen2012 I'm still fighting with this issue. It seems that a deeper look into Jena internals is needed, so I think it will take me a while.

agazzarini commented 9 years ago

Closed, as it is not a bug. I opened issue #96 for the general optimization topic.