kjetilk / p5-atteanx-query-cache

Experimental prefetching SPARQL query cacher, take 2
0 stars 1 forks source link

Plan with LDF for cartesian isn't generated #24

Closed kjetilk closed 8 years ago

kjetilk commented 8 years ago

There's a final test case that still fails.

I expect this plan:

- Hash Join { s } (cost: 2685)
-   NestedLoop Join (cost: 2480)
-     LDFTriple { ?a, <http://example.org/m/b>, <http://example.org/m/c> } (cost: 62)
-     Hash Join { s } (cost: 4)
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>} (cost: 2)
-       Table (?s, ?o)
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>} (cost: 2)
-   SPARQLBGP (cost: 205)
-     Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-     Quad { ?s, <http://example.org/m/q>, <http://example.org/m/a>, <http://test.invalid/graph> }

but I get this plan:

- Hash Join { s } (cost: 40245)
-   NestedLoop Join (cost: 40040)
-     SPARQLBGP (cost: 1001)
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-     Hash Join { s } (cost: 4)
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>} (cost: 2)
-       Table (?s, ?o)
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>} (cost: 2)
-   SPARQLBGP (cost: 205)
-     Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-     Quad { ?s, <http://example.org/m/q>, <http://example.org/m/a>, <http://test.invalid/graph> }

I have done various things recently that could influence this: Recently, I have implemented penalizing plans where there are more than one BGP in the plan, penalize single-quad BGPs, as they should always be replaced by an LDF, penalize plans that have more than one LDFs (since they are better done with remote joins, I assume), and penalize plans if a BGP has a common variable with an LDF for the same reason.

So, basically, it should have been pretty straightforward: This triple has no common variable with anything, so it stands alone, and it is not in the cache, so it should be an LDF. I see it estimated to a cost of 62 elsewhere, which is far less than the 1001 I assign to a BGP when it has a single quad.

The first thing I checked was whether both the SPARQLBGP plan and the LDFTriple plan was generated. They are both implemented by an "around access_plan". That appears good to me.

Then, I looked through the trace, and it appears that the expected plan is actually never generated, so it appears it is not the cost model that is at fault.

CC @kasei

kjetilk commented 8 years ago

These two subplans are generated:

- NestedLoop Join
-   Hash Join { s }
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-   LDFTriple { ?a, <http://example.org/m/b>, <http://example.org/m/c> }

- NestedLoop Join
-   LDFTriple { ?a, <http://example.org/m/b>, <http://example.org/m/c> }
-   Hash Join { s }
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
kjetilk commented 8 years ago

So, there, the LDFTriple { ?a ... is in near the root, but then, the SPARQLBGP Quad { ?s becomes a grandchild, and when another SPARQLBGP Quad { ?s it must be rotated to be coalesced, and then, that can't happen...

kjetilk commented 8 years ago

Therefore, we end up with:

- Hash Join { s }
-   NestedLoop Join
-     LDFTriple { ?a, <http://example.org/m/b>, <http://example.org/m/c> }
-     Hash Join { s }
-       Hash Join { s }
-         Table (?s)
-           {s=<http://example.org/foo>}
-           {s=<http://example.org/bar>}
-         Table (?s, ?o)
-           {o=<http://example.org/bar>, s=<http://example.org/foo>}
-           {o=<http://example.org/baz>, s=<http://example.com/foo>}
-           {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-       SPARQLBGP
-         Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-   SPARQLBGP
-     Quad { ?s, <http://example.org/m/q>, <http://example.org/m/a>, <http://test.invalid/graph> }
kjetilk commented 8 years ago

Now, to the behaviour when there's a SPARQLBGP. We first get the completely analogous plans:

- NestedLoop Join
-   Hash Join { s }
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-   SPARQLBGP
-     Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }

- NestedLoop Join
-   SPARQLBGP
-     Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-   Hash Join { s }
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
kjetilk commented 8 years ago

But after rotation, we also get these plans:


- Hash Join { s }
-   Hash Join { s }
-     Table (?s)
-       {s=<http://example.org/foo>}
-       {s=<http://example.org/bar>}
-     Table (?s, ?o)
-       {o=<http://example.org/baz>, s=<http://example.com/foo>}
-       {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-       {o=<http://example.org/bar>, s=<http://example.org/foo>}
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }

- Hash Join { s }
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-   Hash Join { s }
-     Table (?s)
-       {s=<http://example.org/foo>}
-       {s=<http://example.org/bar>}
-     Table (?s, ?o)
-       {o=<http://example.org/baz>, s=<http://example.com/foo>}
-       {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-       {o=<http://example.org/bar>, s=<http://example.org/foo>}

- NestedLoop Join
-   Hash Join { s }
-     Table (?s)
-       {s=<http://example.org/foo>}
-       {s=<http://example.org/bar>}
-     Table (?s, ?o)
-       {o=<http://example.org/baz>, s=<http://example.com/foo>}
-       {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-       {o=<http://example.org/bar>, s=<http://example.org/foo>}
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }

- NestedLoop Join
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-   Hash Join { s }
-     Table (?s)
-       {s=<http://example.org/foo>}
-       {s=<http://example.org/bar>}
-     Table (?s, ?o)
-       {o=<http://example.org/baz>, s=<http://example.com/foo>}
-       {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-       {o=<http://example.org/bar>, s=<http://example.org/foo>}

- Hash Join { s }
-   Hash Join { s }
-     Table (?s)
-       {s=<http://example.org/foo>}
-       {s=<http://example.org/bar>}
-     Table (?s, ?o)
-       {o=<http://example.org/baz>, s=<http://example.com/foo>}
-       {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-       {o=<http://example.org/bar>, s=<http://example.org/foo>}
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }

- Hash Join { s }
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-   Hash Join { s }
-     Table (?s)
-       {s=<http://example.org/foo>}
-       {s=<http://example.org/bar>}
-     Table (?s, ?o)
-       {o=<http://example.org/baz>, s=<http://example.com/foo>}
-       {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-       {o=<http://example.org/bar>, s=<http://example.org/foo>}

- NestedLoop Join
-   Hash Join { s }
-     Table (?s)
-       {s=<http://example.org/foo>}
-       {s=<http://example.org/bar>}
-     Table (?s, ?o)
-       {o=<http://example.org/baz>, s=<http://example.com/foo>}
-       {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-       {o=<http://example.org/bar>, s=<http://example.org/foo>}
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }

- NestedLoop Join
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-   Hash Join { s }
-     Table (?s)
-       {s=<http://example.org/foo>}
-       {s=<http://example.org/bar>}
-     Table (?s, ?o)
-       {o=<http://example.org/baz>, s=<http://example.com/foo>}
-       {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-       {o=<http://example.org/bar>, s=<http://example.org/foo>}

- NestedLoop Join
-   Hash Join { s }
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-   SPARQLBGP
-     Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }

- Hash Join { s }
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-   SPARQLBGP
-     Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }

- Hash Join { s }
-   SPARQLBGP
-     Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}

- NestedLoop Join
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-   SPARQLBGP
-     Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }

- NestedLoop Join
-   SPARQLBGP
-     Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-   NestedLoop Join
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}

- Hash Join { s }
-   NestedLoop Join
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-   SPARQLBGP
-     Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }

- Hash Join { s }
-   SPARQLBGP
-     Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-   NestedLoop Join
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }

- NestedLoop Join
-   NestedLoop Join
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-   SPARQLBGP
-     Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }

- NestedLoop Join
-   SPARQLBGP
-     Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }
-   NestedLoop Join
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-     SPARQLBGP
-       Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }

- NestedLoop Join
-   SPARQLBGP
-     Quad { ?a, <http://example.org/m/b>, <http://example.org/m/c>, <http://test.invalid/graph> }
-   Hash Join { s }
-     Hash Join { s }
-       Table (?s)
-         {s=<http://example.org/foo>}
-         {s=<http://example.org/bar>}
-       Table (?s, ?o)
-         {o=<http://example.org/baz>, s=<http://example.com/foo>}
-         {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-         {o=<http://example.org/bar>, s=<http://example.org/foo>}
-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }

which we don't get for the LDFTriple, it seems. In the LDFTriple case, we only get those two unrotated plans

kjetilk commented 8 years ago

and therefore, when the next subplan comes:

- SPARQLBGP
-   Quad { ?s, <http://example.org/m/q>, <http://example.org/m/a>, <http://test.invalid/graph> }

it has to be coalesced with the other SPARQLBGP it shares a variable ?s with. It can happen in when there's are only SPARQLBGPs, since there other

-     SPARQLBGP
-       Quad { ?s, <http://example.org/m/q>, ?.blank-0, <http://test.invalid/graph> }

can be found as a grandchild of the root, but not in the LDFTriple case, because there, that plan is a great-grandchild, and the rotation doesn't look that deep...

kjetilk commented 8 years ago

Non-optimal solution with a bit of code duplication, but all tests pass!