openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

SPARQL 1.1 VALUES clause takes very long #28

Open joernhees opened 11 years ago

joernhees commented 11 years ago

It seems as if the VALUES clause in Virtuoso isn't really working yet. The following query takes forever to complete on DBpedia:

SELECT *
WHERE {
  ?uri rdfs:label ?label.
  VALUES (?uri) { (dbpedia:Berlin) (dbpedia:Kaiserslautern) }
}

If you run the same query with only a single entity, it works:

SELECT *
WHERE {
  ?uri rdfs:label ?label.
  VALUES (?uri) { (dbpedia:Berlin) }
}

A workaround is to construct a FILTER clause, which runs as fast as usual:

SELECT *
WHERE {
  ?uri rdfs:label ?label.
  FILTER (?uri=dbpedia:Berlin || ?uri=dbpedia:Kaiserslautern)
}
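
For longer entity lists, the FILTER expression above can be generated instead of written by hand. A minimal Python sketch, assuming the terms are prefixed names (or <IRI>s) whose prefixes the endpoint already knows; the helper name filter_query is hypothetical and not part of Virtuoso or any library:

```python
def filter_query(var, terms, pattern="?uri rdfs:label ?label."):
    """Build a SELECT query that restricts `var` with an OR-chained FILTER.

    `terms` are inserted verbatim (no escaping), so they must come from
    a trusted source. This only builds the query string; it sends nothing.
    """
    if not terms:
        raise ValueError("need at least one term")
    # Produces e.g. "?uri=dbpedia:Berlin || ?uri=dbpedia:Kaiserslautern"
    condition = " || ".join(f"{var}={t}" for t in terms)
    return f"SELECT *\nWHERE {{\n  {pattern}\n  FILTER ({condition})\n}}"


print(filter_query("?uri", ["dbpedia:Berlin", "dbpedia:Kaiserslautern"]))
```

SPARQL 1.1 also allows the shorter FILTER (?uri IN (dbpedia:Berlin, dbpedia:Kaiserslautern)); whether Virtuoso plans that form as well as the OR chain was not checked here.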
joernhees commented 11 years ago

Also note that very restrictive queries seem to cause huge execution time estimates:

select * where { ?s ?p ?o. VALUES (?s) { (dbpedia:Berlin) (dbpedia:Kaiserslautern) } VALUES (?p) { (rdfs:label) (skos:refLabel) } }
Virtuoso 42000 Error The estimated execution time 82966 (sec) exceeds the limit of 3000 (sec).
openlink commented 11 years ago

We will be looking into this shortly.

bcoughlan commented 11 years ago

As a workaround, putting the VALUES statement at the top of the query body seems to execute much faster.

barcou@deri$ echo "SELECT * WHERE { VALUES (?uri) { (dbpedia:Berlin) (dbpedia:Kaiserslautern) } . ?uri rdfs:label ?label. }" > /tmp/b
barcou@deri$ time curl "http://dbpedia.org/sparql" --form "query=@/tmp/b"
<snip>
real    0m1.477s
<snip>

barcou@deri$ echo "SELECT * WHERE { ?uri rdfs:label ?label. VALUES (?uri) { (dbpedia:Berlin) (dbpedia:Kaiserslautern) } }" > /tmp/a
barcou@deri$ time curl "http://dbpedia.org/sparql" --form "query=@/tmp/a"
... 504 gateway timeout after 10 mins
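
If queries are generated by a script, the ordering workaround can be baked into the generator. A rough Python sketch, assuming a single variable and trusted prefixed names; values_first_query is a made-up helper name, and it only builds the string (sending it to the endpoint is left out):

```python
def values_first_query(var, terms, pattern):
    """Build a SELECT query with the VALUES block placed before the pattern.

    Mirrors the fast variant from the shell session above; `terms` and
    `pattern` are inserted verbatim without escaping.
    """
    if not terms:
        raise ValueError("need at least one term")
    # One parenthesised row per term, e.g. "(dbpedia:Berlin)"
    rows = " ".join(f"({t})" for t in terms)
    return f"SELECT * WHERE {{ VALUES ({var}) {{ {rows} }} . {pattern} }}"


print(values_first_query(
    "?uri",
    ["dbpedia:Berlin", "dbpedia:Kaiserslautern"],
    "?uri rdfs:label ?label.",
))
```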
joernhees commented 11 years ago

@bcoughlan cool, thanks for the hint

bcoughlan commented 11 years ago

Some more investigation... with SPARQL DELETE, VALUES is also extremely slow and the workaround I mentioned above does not work.

Example:

Time: 15.08 seconds

DELETE { 
    GRAPH <http://example.com/> { ?node ?p ?o . }
} WHERE {
    GRAPH <http://example.com/> {
        VALUES ?node { :a :b :c :d :e :f :g :h :i }
        ?node ?p ?o .
    }
}

Running them as separate queries (i.e. one query with :a ?p ?o, another with :b ?p ?o, and so on) takes a total of 0.082 seconds on my machine. That is a huge difference, and it only gets worse with more VALUES entries.
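
For reference, the per-node variant can be produced mechanically. A small Python sketch, assuming the default prefix ":" is declared elsewhere and the node names are trusted; per_node_deletes is a hypothetical helper, not something Virtuoso provides:

```python
def per_node_deletes(graph, nodes):
    """Return one DELETE/WHERE update string per node instead of one VALUES query.

    `graph` is a bare IRI (no angle brackets); `nodes` are prefixed names
    or <IRI>s, inserted verbatim without escaping.
    """
    template = (
        "DELETE {{ GRAPH <{g}> {{ {n} ?p ?o . }} }} "
        "WHERE {{ GRAPH <{g}> {{ {n} ?p ?o . }} }}"
    )
    return [template.format(g=graph, n=n) for n in nodes]


for update in per_node_deletes("http://example.com/", [":a", ":b", ":c"]):
    print(update)
```

Each string is meant to be sent as its own update request, which is what the 0.082 second figure above measured.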

Another observation: this seems to depend on the amount of data already in the store. There is no difference between the two when the store has little or no data in it.