kasei / attean

A Perl Semantic Web Framework

Undefined costs when using IDPQueryPlanner #74

Closed kjetilk closed 8 years ago

kjetilk commented 8 years ago

I'm seeing some problems when using Attean::IDPQueryPlanner. I have AtteanX::QueryPlanner::Cache, which extends it but does not implement cost_for_plan. Instead, cost_for_plan is implemented by AtteanX::Model::SPARQLCache. That method does call the planner's cost_for_plan in several places, though, for example in this line. However, that call occasionally returns undef, but only for the more complex plans. There, I inserted a die:

my $rcost = $planner->cost_for_plan($children[1], $self);
die $children[1]->as_string unless defined($rcost);
$cost = ($lcost + $rcost);

and it returned:

- Hash Join { s } (cost: 4)
-   Table (?s, ?o)
-     {o=<http://example.org/baz>, s=<http://example.com/foo>}
-     {o=<http://example.org/foobar>, s=<http://example.com/foo>}
-     {o=<http://example.org/bar>, s=<http://example.org/foo>} (cost: 2)
-   Table (?s)
-     {s=<http://example.org/foo>}
-     {s=<http://example.org/bar>} (cost: 2)

so, it seems it should just return a cached cost...

I have so far managed to reduce it to this:

use v5.14;
use autodie;
use utf8;
use Test::Modern;

use CHI;

use Attean;
use Attean::RDF;
use AtteanX::QueryPlanner::Cache;
#use Carp::Always;
use Data::Dumper;
use AtteanX::Store::SPARQL;
use AtteanX::Model::SPARQLCache;
use Log::Any::Adapter;
Log::Any::Adapter->set($ENV{LOG_ADAPTER} || 'Stderr') if ($ENV{TEST_VERBOSE});

my $cache = CHI->new( driver => 'Memory', global => 1 );

my $p   = AtteanX::QueryPlanner::Cache->new;

# These tests do not actually look up anything in a real store; they just simulate it
my $store   = Attean->get_store('SPARQL')->new('endpoint_url' => iri('http://test.invalid/'));
my $model   = AtteanX::Model::SPARQLCache->new( store => $store, cache => $cache );
my $graph = iri('http://test.invalid/graph');
my $t       = triplepattern(variable('s'), iri('p'), literal('1'));
my $u       = triplepattern(variable('s'), iri('p'), variable('o'));
my $v       = triplepattern(variable('s'), iri('q'), blank('xyz'));
my $w       = triplepattern(variable('a'), iri('b'), iri('c'));
my $x       = triplepattern(variable('s'), iri('q'), iri('a'));

$cache->set('?v001 <p> "1" .', ['<http://example.org/foo>', '<http://example.org/bar>']);
$cache->set('?v002 <p> ?v001 .', {'<http://example.org/foo>' => ['<http://example.org/bar>'],
                                             '<http://example.com/foo>' => ['<http://example.org/baz>', '<http://example.org/foobar>']});

my $bgp     = Attean::Algebra::BGP->new(triples => [$t, $u, $v, $w, $x]);
my @plans   = $p->plans_for_algebra($bgp, $model, [$graph]);

ok(@plans);
done_testing;

kjetilk commented 8 years ago

There are some things that are just weird... Probably me making assumptions again...

I've created a branch, try-idp, and inserted some debug warns, dies and $DB::single = 1 statements in there. The latter makes execution stop in the debugger when it gets to a join of two tables, which is where the problem occurs. First, the line

my $lcost       = $planner->cost_for_plan($children[0], $self);

jumps into the Model's cost_for_plan, but returns OK. The next line

my $rcost       = $planner->cost_for_plan($children[1], $self);

returns undef. But the really weird thing is that when using the debugger on this, and single stepping it, with 's', it is pretty clear that it just jumps into the Model's cost_for_plan, i.e. $self->cost_for_plan, not $planner->cost_for_plan. I tried to insert some warns at the start of $planner->cost_for_plan too, but it didn't seem to print that...
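The $DB::single approach described above can be narrowed so that the debugger only stops when the cost is actually missing. The following is a minimal sketch, not code from the try-idp branch; join_cost is a hypothetical helper that stands in for the join-cost arithmetic quoted earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: sums two child costs, as the join branch does.
sub join_cost {
    my ($lcost, $rcost) = @_;

    # Under `perl -d`, setting $DB::single to a true value makes the
    # debugger stop before the next statement. Outside the debugger the
    # assignment has no effect on control flow.
    $DB::single = 1 unless defined $rcost;

    return unless defined $lcost && defined $rcost;
    return $lcost + $rcost;
}

my $good = join_cost(2, 2);        # both costs known
my $bad  = join_cost(2, undef);    # would trigger the breakpoint under -d

print "good: $good\n";
print "bad is ", (defined $bad ? $bad : 'undef'), "\n";
```

Run under perl -d and continue with 'c'; execution should only pause on the second call.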

kjetilk commented 8 years ago

Managed to get a stack dump:

@ = DB::DB called from file '/home/kjetil/dev/p5-atteanx-query-cache/lib/AtteanX/Model/SPARQLCache.pm' line 27
$ = AtteanX::Model::SPARQLCache::cost_for_plan(ref(AtteanX::Model::SPARQLCache), ref(Attean::Plan::HashJoin), ref(AtteanX::Model::SPARQLCache)) called from file '/home/kjetil/dev/p5-atteanx-query-cache/lib/AtteanX/Model/SPARQLCache.pm' line 89
$ = AtteanX::Model::SPARQLCache::cost_for_plan(ref(AtteanX::Model::SPARQLCache), ref(Attean::Plan::HashJoin), ref(AtteanX::Model::SPARQLCache)) called from file '/home/kjetil/dev/attean/lib/Attean/API/QueryPlanner.pm' line 346
@ = Attean::API::IDPJoinPlanner::cost_for_plan(ref(AtteanX::QueryPlanner::Cache), ref(Attean::Plan::HashJoin), ref(AtteanX::Model::SPARQLCache)) called from file '/home/kjetil/dev/attean/lib/Attean/API/QueryPlanner.pm' line 323
@ = Attean::API::IDPJoinPlanner::prune_plans(ref(AtteanX::QueryPlanner::Cache), ref(AtteanX::Model::SPARQLCache), ref(ARRAY), ref(ARRAY)) called from file '/home/kjetil/dev/attean/lib/Attean/API/QueryPlanner.pm' line 257
@ = Attean::API::IDPJoinPlanner::joins_for_plan_alternatives(ref(AtteanX::QueryPlanner::Cache), ref(AtteanX::Model::SPARQLCache), ref(ARRAY), undef, ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY)) called from file '/home/kjetil/dev/attean/lib/Attean/API/QueryPlanner.pm' line 112
@ = Attean::API::SimpleCostPlanner::__ANON__[/home/kjetil/dev/attean/lib/Attean/API/QueryPlanner.pm:114](ref(CODE), ref(AtteanX::QueryPlanner::Cache), ref(AtteanX::Model::SPARQLCache), ref(ARRAY), undef, ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY)) called from file '(eval 764)[/usr/share/perl5/Class/Method/Modifiers.pm:93]' line 1
@ = Attean::IDPQueryPlanner::__ANON__[(eval 764)[/usr/share/perl5/Class/Method/Modifiers.pm:93]:1](ref(AtteanX::QueryPlanner::Cache), ref(AtteanX::Model::SPARQLCache), ref(ARRAY), undef, ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY)) called from file '(eval 766)[/usr/share/perl5/Class/Method/Modifiers.pm:152]' line 2
@ = Attean::IDPQueryPlanner::joins_for_plan_alternatives(ref(AtteanX::QueryPlanner::Cache), ref(AtteanX::Model::SPARQLCache), ref(ARRAY), undef, ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY)) called from file '/home/kjetil/dev/attean/lib/Attean/QueryPlanner.pm' line 752
@ = Attean::QueryPlanner::bgp_join_plans(ref(AtteanX::QueryPlanner::Cache), ref(Attean::Algebra::BGP), ref(AtteanX::Model::SPARQLCache), ref(ARRAY), undef, ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY), ref(ARRAY)) called from file '/home/kjetil/dev/attean/lib/Attean/QueryPlanner.pm' line 149
@ = Attean::QueryPlanner::plans_for_algebra(ref(AtteanX::QueryPlanner::Cache), ref(Attean::Algebra::BGP), ref(AtteanX::Model::SPARQLCache), ref(ARRAY)) called from file '/home/kjetil/dev/p5-atteanx-query-cache/t/undef-test.t' line 38

So, there is a call from the planner, but it seems to confirm that the latest call was from the model.
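A comparable trace can also be produced without the debugger by replacing the plain die with Carp's confess, which appends the full call chain to the message. This is a small sketch under assumed names: cost_or_confess, inner and outer are hypothetical and only mimic a nested call path.

```perl
use strict;
use warnings;
use Carp qw(confess);

# Hypothetical guard: returns the cost, or dies with a full stack trace.
sub cost_or_confess {
    my ($cost, $label) = @_;
    confess "undefined cost for $label" unless defined $cost;
    return $cost;
}

# Two nested callers, so that the trace has something to show.
sub inner { return cost_or_confess(undef, 'right child') }
sub outer { return inner() }

my $ok    = eval { outer(); 1 };
my $trace = $ok ? '' : $@;

# The message is followed by one line per calling frame.
print $trace;
```

Unlike the debugger's stack dump, this works in an ordinary test run, so it can stay in place while the test case is being reduced.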

kjetilk commented 8 years ago

It seems the immediate cause is that cost_for_plan, when called on a planner object, takes the model as its second argument, whereas on a model object it takes the planner as its second argument, but the code above passes the model in both cases. So the question is: how do I solve that? Possibly something should also be throwing an error here...
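The two calling conventions, and the kind of error that could catch a mix-up, can be illustrated with toy classes. This is only a sketch: Toy::Planner, Toy::Model, the hash-based plans and the flat fallback cost of 2 are all assumptions for illustration, not Attean's classes and not the change made in the actual fix.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Toy stand-ins (hypothetical; not Attean's real classes or cost model).
package Toy::Planner {
    sub new { return bless {}, shift }

    # Planner convention: cost_for_plan($plan, $model)
    sub cost_for_plan {
        my ($self, $plan, $model) = @_;
        die "planner expects a model as its second argument\n"
            unless Scalar::Util::blessed($model) && $model->isa('Toy::Model');
        # Ask the model first, handing it the planner (not the model itself).
        my $cost = $model->cost_for_plan($plan, $self);
        return $cost // 2;    # assumed flat fallback cost for leaf plans
    }
}

package Toy::Model {
    sub new { return bless {}, shift }

    # Model convention: cost_for_plan($plan, $planner)
    sub cost_for_plan {
        my ($self, $plan, $planner) = @_;
        die "model expects a planner as its second argument\n"
            unless Scalar::Util::blessed($planner) && $planner->isa('Toy::Planner');
        return unless $plan->{type} eq 'join';    # only joins are costed here
        my $total = 0;
        # Child costs come from the planner, which gets the model back.
        $total += $planner->cost_for_plan($_, $self) for @{ $plan->{children} };
        return $total;
    }
}

my $planner = Toy::Planner->new;
my $model   = Toy::Model->new;
my $join    = { type => 'join', children => [ { type => 'table' }, { type => 'table' } ] };

# Correct usage: each side receives the other object.
my $cost = $planner->cost_for_plan($join, $model);
print "join cost: $cost\n";

# The mix-up from the stack dump: the model is handed to the model.
my $ok    = eval { $model->cost_for_plan($join, $model); 1 };
my $error = $ok ? '' : $@;
print "mix-up caught: $error";
```

Without the type checks, the second call would recurse into the model's own cost_for_plan for each child and come back with undef, which resembles the behaviour seen in the stack dump.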

kasei commented 8 years ago

Oops. I forgot to tag the commit (f6a200e), but I just made a change that hopefully fixes this.

kjetilk commented 8 years ago

Yeah, that seems to work, thanks a lot!