openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com
Other
857 stars 210 forks

DISTINCT changes order of results #680

Open RicardoUsbeck opened 7 years ago

RicardoUsbeck commented 7 years ago

Hi,

for the query below, the DISTINCT operator changes the result set (try leaving it out), although the semantics of the answer should be the same:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?uri
WHERE {
        ?x rdf:type dbo:Album .
        ?x dbo:artist res:Elvis_Presley .
        ?x dbo:releaseDate ?y .
        ?x dbo:recordLabel ?uri .
}
ORDER BY ASC(?y) 
LIMIT 1

Obviously the DISTINCT operator is applied before the LIMIT operator, but it also changes the order. I am not sure whether this is a bug or a feature.

kidehen commented 7 years ago



Virtuoso is a quad store, so please repeat the query with a GRAPH clause:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?uri ?g
WHERE {
        GRAPH ?g {
                ?x rdf:type dbo:Album .
                ?x dbo:artist res:Elvis_Presley .
                ?x dbo:releaseDate ?y .
                ?x dbo:recordLabel ?uri .
        }
}
ORDER BY ASC(?y)
LIMIT 1


RicardoUsbeck commented 6 years ago

Still, if I use this query with DISTINCT, the result is http://dbpedia.org/resource/Pickwick_Records. If I use it without DISTINCT, the result is http://dbpedia.org/resource/RCA_Victor.

RCA Victor is the semantically correct answer.

JervenBolleman commented 6 years ago

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?uri
WHERE {
        ?x rdf:type dbo:Album .
        ?x dbo:artist res:Elvis_Presley .
        ?x dbo:releaseDate ?y .
        ?x dbo:recordLabel ?uri .
}
GROUP BY ?y ORDER BY ASC(?y) 
LIMIT 1

This will give you the results you expect. It seems that in this case the query engine does not honor the order of operations required by the standard: it must first order the solutions, then project the variables, and only then remove duplicates.
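To make the order of operations concrete, here is a minimal Python sketch. It is not Virtuoso code and makes no claim about how Virtuoso plans the query; the rows, dates, and function names are made up for illustration. It compares the sequence described above (order, project, deduplicate, limit) with one hypothetical faulty plan that deduplicates on ?uri before sorting.

```python
RCA = "http://dbpedia.org/resource/RCA_Victor"
PICKWICK = "http://dbpedia.org/resource/Pickwick_Records"

# Hypothetical (?y, ?uri) bindings; the dates are invented for this example.
rows = [
    ("1956-03-23", RCA),
    ("1975-01-01", PICKWICK),
    ("1977-06-01", RCA),
]


def standard_order(rows, limit):
    """ORDER BY, then project, then DISTINCT, then LIMIT."""
    ordered = sorted(rows, key=lambda r: r[0])     # ORDER BY ASC(?y)
    projected = [uri for _, uri in ordered]        # SELECT ?uri
    deduped = list(dict.fromkeys(projected))       # DISTINCT, keeps first-seen order
    return deduped[:limit]                         # LIMIT


def distinct_before_order(rows, limit):
    """Assumed faulty plan: deduplicate on ?uri first, then sort.

    Each ?uri keeps whichever ?y happened to be seen last, so the
    sort key no longer reflects the earliest release date.
    """
    picked = {}
    for y, uri in rows:
        picked[uri] = y                            # later rows overwrite earlier ones
    ordered = sorted(picked.items(), key=lambda kv: kv[1])
    return [uri for uri, _ in ordered][:limit]


print(standard_order(rows, 1))
print(distinct_before_order(rows, 1))
```

With this made-up data the first function returns RCA Victor and the second returns Pickwick Records, which mirrors the difference reported in this issue.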