openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

SPARQL: Using a VALUES clause with > 1 values speeds query up #400

Open joernhees opened 9 years ago

joernhees commented 9 years ago

If I run this query on a local Virtuoso 7.2.0 endpoint, it takes ~20 seconds:

SELECT ?source count(?target) WHERE {
 VALUES (?source) { (dbpedia:Alchemy) }
 ?source <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Arabic_language> .
 ?vcb0 <http://dbpedia.org/ontology/wikiPageWikiLink> ?source .
 ?vcb0 <http://dbpedia.org/ontology/wikiPageWikiLink> ?target .
 <http://dbpedia.org/resource/The_Upside_Down_Show> <http://dbpedia.org/ontology/wikiPageWikiLink> ?target .
}

Running this query, which differs only in a second, dummy entry in the VALUES clause, takes ~0.1 seconds:

SELECT ?source count(?target) WHERE {
 VALUES (?source) { (dbpedia:Alchemy) (dbpedia:ThisJustMakesTheQueryFaster) }
 ?source <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Arabic_language> .
 ?vcb0 <http://dbpedia.org/ontology/wikiPageWikiLink> ?source .
 ?vcb0 <http://dbpedia.org/ontology/wikiPageWikiLink> ?target .
 <http://dbpedia.org/resource/The_Upside_Down_Show> <http://dbpedia.org/ontology/wikiPageWikiLink> ?target .
}

The same query without VALUES seems to be slow as well (~20 seconds):

SELECT ?source count(?target) WHERE {
 dbpedia:Alchemy <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Arabic_language> .
 ?vcb0 <http://dbpedia.org/ontology/wikiPageWikiLink> dbpedia:Alchemy .
 ?vcb0 <http://dbpedia.org/ontology/wikiPageWikiLink> ?target .
 <http://dbpedia.org/resource/The_Upside_Down_Show> <http://dbpedia.org/ontology/wikiPageWikiLink> ?target .
}

(I don't think this is a caching problem, as I ran many queries like this in different orders with the same results.)
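
One way to back up the "not a caching problem" observation is to run the variants several times in interleaved order and compare median timings. The following is a minimal sketch, not part of the original report. It assumes a local Virtuoso SPARQL endpoint at http://localhost:8890/sparql (adjust as needed); the helper names `build_request_url`, `run_query`, and `time_variants` are hypothetical.

```python
import time
import urllib.parse
import urllib.request
from statistics import median

# Assumption: a local Virtuoso instance serving SPARQL at this URL.
ENDPOINT = "http://localhost:8890/sparql"


def build_request_url(endpoint, query):
    """Encode the query as a SPARQL-protocol GET request URL."""
    return f"{endpoint}?{urllib.parse.urlencode({'query': query})}"


def run_query(query, endpoint=ENDPOINT):
    """Send the query and read the full response, discarding the result."""
    request = urllib.request.Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()


def time_variants(variants, runner, rounds=3):
    """Run each named query once per round, interleaved, and return the
    median wall-clock time in seconds per variant."""
    timings = {name: [] for name in variants}
    for _ in range(rounds):
        for name, query in variants.items():
            start = time.perf_counter()
            runner(query)
            timings[name].append(time.perf_counter() - start)
    return {name: median(values) for name, values in timings.items()}
```

Calling `time_variants({"one_value": q1, "two_values": q2, "no_values": q3}, run_query)` with the three queries above as strings would then give one median per variant. Interleaving the rounds means a warm cache benefits all variants roughly equally.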

kidehen commented 9 years ago

In reply to Jörn Hees's report of 5/6/15 10:58 AM:

It's best to accompany these reports with profile information, if possible. It simply speeds up resolution.

Regards,

Kingsley Idehen
Founder & CEO, OpenLink Software

joernhees commented 9 years ago

Profile information: https://gist.github.com/joernhees/4bc048f38732516c54da

Gathered by following this guide: http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksAanalyzingSPARQLQuery

joernhees commented 9 years ago

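Reading the SQL translation, I think the problem is related to the following join, whose ON (1) condition is always true and so pairs every row of one side with every row of the other: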

  FROM DB.DBA.RDF_QUAD AS "s_1_8_t0"
    INNER JOIN DB.DBA.RDF_QUAD AS "s_1_8_t1"
    ON (1)
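
To see why a join condition of `ON (1)` is suspicious, note that it is always true, so the join degenerates into a Cartesian product. The sketch below illustrates this with Python's built-in sqlite3 module and a toy three-row table named `quad`. The table and its data are hypothetical stand-ins, not Virtuoso's actual RDF_QUAD layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for a quad store table; columns and rows are made up.
conn.execute("CREATE TABLE quad (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO quad VALUES (?, ?, ?)",
    [("a", "link", "b"), ("b", "link", "c"), ("c", "link", "a")],
)

# ON (1) is always true: every row pairs with every row (3 * 3 = 9).
cross = conn.execute(
    "SELECT count(*) FROM quad AS t0 INNER JOIN quad AS t1 ON (1)"
).fetchone()[0]

# A real join condition keeps only the rows that chain together.
joined = conn.execute(
    "SELECT count(*) FROM quad AS t0 INNER JOIN quad AS t1 ON (t0.o = t1.s)"
).fetchone()[0]

print(cross, joined)  # prints: 9 3
```

On a table the size of a DBpedia quad store, the row count of such an unconstrained self-join grows with the square of the table size unless later conditions prune it early.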

HughWilliams commented 9 years ago

@joernhees: Setting the "Enable_joins_only = 1" flag in the INI file may help here, as it causes the query optimizer to consider only next plan candidates that are connected by a join to the existing partial plan. In other words, no Cartesian products will be considered. You would need to add the following entries to the INI file and restart Virtuoso:

[Flags]
Enable_joins_only = 1

See http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtQueryOptDiagnostic

Should the problem persist, please provide the database statistics export (stat.dv) as indicated in the Wiki doc.

joernhees commented 9 years ago

Is that setting safe for all kinds of queries? It sounds like it could cause problems for queries that need a Cartesian product.

HughWilliams commented 9 years ago

@joernhees: Cartesian products would be avoided where possible, but if a Cartesian product is the only option, it would be used.
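
The behaviour described in this exchange (prefer join-connected candidates, fall back to a Cartesian product only when nothing is connected) can be illustrated with a toy greedy planner. This is only a sketch of the idea and not Virtuoso's optimizer: the `Pattern` type, the `plan_order` function, and the row estimates are all hypothetical, and the estimates are invented for illustration.

```python
from typing import NamedTuple


class Pattern(NamedTuple):
    name: str
    variables: frozenset
    est_rows: int  # made-up cardinality estimate, for illustration only


def plan_order(patterns, joins_only):
    """Greedily order patterns by estimated rows. With joins_only, a
    pattern sharing a variable with the partial plan is always preferred;
    a disconnected one is taken only if no connected candidate exists."""
    remaining = sorted(patterns, key=lambda p: p.est_rows)
    first = remaining.pop(0)
    bound = set(first.variables)
    steps = [(first.name, "start")]
    while remaining:
        connected = [p for p in remaining if p.variables & bound]
        candidates = connected if (joins_only and connected) else remaining
        chosen = min(candidates, key=lambda p: p.est_rows)
        kind = "join" if chosen.variables & bound else "cartesian"
        steps.append((chosen.name, kind))
        bound |= chosen.variables
        remaining.remove(chosen)
    return steps


# The four triple patterns of the reported query, by their variables.
QUERY = [
    Pattern("p0", frozenset({"source"}), 1),
    Pattern("p1", frozenset({"vcb0", "source"}), 500),
    Pattern("p2", frozenset({"vcb0", "target"}), 100_000),
    Pattern("p3", frozenset({"target"}), 50),
]

print(plan_order(QUERY, joins_only=False))
print(plan_order(QUERY, joins_only=True))
```

With these invented estimates, the unrestricted run picks the cheap but disconnected `p3` second and therefore introduces a Cartesian step, while the restricted run chains `p0`, `p1`, `p2`, `p3` using joins only. A query whose patterns share no variables at all still gets planned in the restricted mode, just with a Cartesian step.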