Evaluate entity queries via a single scan

Currently, entity queries, e.g,. 

?x knows ?y .
?x name "x" .
?x age "18" .

are evaluated via joins along their subject (in the above example: ?x). That 
is, one would need to compute bindings for each triple patten, and join them 
using two equi-joins. 

However, this (probably) could be done much more efficiently with a single 
scan. That is, one would start with a scan of the pattern with the least 
matches (e.g., ?x age "18"):

x1 age "18" --> scan for x1 ?p ?o
x2 age "18" --> scan for x2 ?p ?o
x3 age "18" --> scan for x3 ?p ?o
...

Each such scan (x1 ?p ?o) would result in additional property/object pairs - 
these could be pushed to subsequent triple pattern accesses. For instance,  

"x1 ?p ?o" could find "x1 knows y1", "x1 knows y2", "x1 name "x"" ... The 
former two triples could be pushed to access ?x knows ?y, the latter triple 
("x1 name "x") to pattern access for ?x name "x".

The key advantage is really that scans (sorted accesses) are fairly cheap, in 
comparison to random access probes. Thus, when finding the first potential 
result entity (e.g, x1), we could just scan over (all) its associated triples 
...

- Andreas

Original issue reported on code.google.com by andreas.josef.wagner on 29 Jan 2014 at 9:33

mgaare / cumulusrdf

Evaluate entity queries via a single scan #27