openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Duplicate tuples with inner SELECT #388

Open knoan opened 9 years ago

knoan commented 9 years ago

Many tuples returned by the following query on the http://dbpedia.org/sparql endpoint (Virtuoso 7.20) seem to be duplicated.

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select
  ?link ?label ?count
where {
  { select ?link (count(?term) as ?count) {
    <http://dbpedia.org/resource/Fiat_500> ?link ?term
  } group by ?link having (count(?term) > 0 ) }
  ?link rdfs:label ?label filter ( lang(?label) ='en' )
}

HughWilliams commented 9 years ago

I don't see any duplicates when running the query against DBpedia:

http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&qtxt=prefix+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0A%0D%0Aselect%0D%0A%0D%0A++%3Flink++%3Flabel+%3Fcount%0D%0A%0D%0Awhere+%7B%0D%0A%0D%0A++%7B+select+%3Flink+%28count%28%3Fterm%29+as+%3Fcount%29+%7B%0D%0A%0D%0A++++%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FFiat_500%3E+%3Flink+%3Fterm%0D%0A%0D%0A++%7D+group+by+%3Flink+having+%28count%28%3Fterm%29+%3E+0+%29+%7D%0D%0A%0D%0A++%3Flink+rdfs%3Alabel+%3Flabel+filter+%28+lang%28%3Flabel%29+%3D%27en%27+%29+%0D%0A%0D%0A%7D&format=text%2Fhtml&timeout=30000&debug=on

knoan commented 9 years ago

Puzzling one…

The linked query, run in @yasgui, returns duplicated matches, as seen in the attached screenshot; the same query in the query editor at @DBpedia seems to work correctly. I have noticed this odd behaviour before: in what ways could the execution paths of the Virtuoso query editor and third-party tools possibly differ?

The issue is confirmed by running the query manually over HTTP using curl as:

curl --data-urlencode "query=prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select
  ?link ?label ?count
where {
  { select ?link (count(?term) as ?count) {
    <http://dbpedia.org/resource/Fiat_500> ?link ?term
  } group by ?link having (count(?term) > 0 ) }
  ?link rdfs:label ?label filter ( lang(?label) ='en' )
}" http://dbpedia.org/sparql

screenshot
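
For anyone who wants to check for the duplicates without relying on a screenshot, here is a minimal sketch that counts repeated rows in a saved response. It assumes the response was requested in the W3C SPARQL JSON results format (for example by adding -H "Accept: application/sparql-results+json" to the curl command above); the function name duplicate_rows is only illustrative and is not part of Virtuoso or yasgui.

```python
import json
from collections import Counter


def term_key(term):
    """Reduce one RDF term from a SPARQL JSON binding to a hashable key."""
    if term is None:
        return None  # variable is unbound in this row
    return (term.get("type"), term.get("value"),
            term.get("xml:lang"), term.get("datatype"))


def duplicate_rows(results_json):
    """Return {row: occurrences} for every result row seen more than once.

    `results_json` is the text of a SPARQL JSON results document
    (assumption: head.vars plus results.bindings, as in the W3C format).
    """
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    counts = Counter(
        tuple(term_key(binding.get(var)) for var in variables)
        for binding in doc["results"]["bindings"]
    )
    return {row: n for row, n in counts.items() if n > 1}
```

Calling duplicate_rows(open("result.json").read()) on a saved response (the file name is hypothetical) gives an empty dict when every row is unique, so a non-empty result would confirm what the screenshot shows.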

HughWilliams commented 9 years ago

@knoan: I see what you mean, as the curl command returns the duplicates as well, but if you add a "distinct" to the select then the duplicates are removed ...

Is yasgui using curl to perform its queries against remote endpoints ?
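
The "distinct" workaround mentioned above could be tried outside the browser roughly as follows. This is a sketch under assumptions: it places the distinct keyword on the outer select, posts the query as a form-encoded body like the curl command does, and asks for JSON results via the Accept header. The names build_request and run are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

# Same query as in the report, with "distinct" added to the outer select.
QUERY_DISTINCT = """\
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?link ?label ?count
where {
  { select ?link (count(?term) as ?count) {
    <http://dbpedia.org/resource/Fiat_500> ?link ?term
  } group by ?link having (count(?term) > 0) }
  ?link rdfs:label ?label filter ( lang(?label) = 'en' )
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a form-encoded POST request carrying the query."""
    body = urlencode({"query": query}).encode("ascii")
    headers = {
        "Accept": "application/sparql-results+json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(endpoint, data=body, headers=headers)


def run(query=QUERY_DISTINCT, endpoint=ENDPOINT):
    """Send the query and return the raw response text (needs network access)."""
    with urlopen(build_request(query, endpoint), timeout=30) as response:
        return response.read().decode("utf-8")
```

This only hides the duplicates on the client-visible side; it does not explain why the query without distinct returns them.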

knoan commented 9 years ago

As far as I can tell from the network activity, it performs its queries directly from the browser using XHR objects.

HughWilliams commented 9 years ago

@knoan: Ok, we are going to check this behaviour ...