kasei / attean

A Perl Semantic Web Framework
19 stars 10 forks source link

Algebra equivalence or superset #79

Open kjetilk opened 8 years ago

kjetilk commented 8 years ago

@RubenVerborgh and I had a discussion today about different abstraction levels, and their consequences further down the stack, and I came to the conclusion that it would be interesting to support something motivated from the following.

Say that you had some kind of a service that could efficiently answer things like

?s dct:identifier "965363" ;
    foaf:name ?name .

or

?s dct:title ?title .
FILTER regex(str(?title), "^Dahuts")

The main point being that there is a clear pattern that can be detected and that a plan can be generated for just this pattern. Of course, we can do that now, but I was thinking of ways to do it in less code and more concisely.

My idea was then that even though query containment is complex, it is likely to be used in very constrained ways, so a simple method to check if two patterns are the same, so that I could do stuff in plans_for_algebra like

my $bgp = Attean::Algebra::BGP->new(triples => [triplepattern(variable('s') ... , ...), ...]);
if ($algebra->isa('Attean::Algebra::BGP') && $algebra->equiv($bgp)) {
  push(@plans, AtteanX::Plan::FooBar->new(...));
}

Does this sound like a good idea?

kasei commented 8 years ago

Yeah, that sounds good. The one challenge to making this really useful in planning is that you won't always get the algebra you want, even if it's equivalent to what you want. Expanding your second example a bit:

?s a ?type ; dct:title ?title .
FILTER regex(str(?title), "^Dahuts")

Let's say your system can handle just the dct:title triple and the FILTER, but not the rdf:type triple. Your planner is going to get a 2-triple BGP wrapped in a filter instead of the equivalent plan that you might prefer, a 1-triple BGP wrapped in a filter joined with another 1-triple BGP.

This is a general problem and one that's hard to solve. There are just so many equivalent rewritings of a query. I'm not sure if we should be looking towards letting extension code try to express a preference for what patterns they want and try to rewrite the query to support that, or if the extension might pick apart the algebra, handle what it can, and hand back the remaining partial algebra for re-planning.

Doing the simple stuff should be doable, though. I'll think about it and try to hack something together.

kjetilk commented 8 years ago

Right!

So, wrapping it in curly brackets, like:

{ 
  ?s a ?type . 
  { 
    ?s dct:title ?title .
    FILTER regex(str(?title), "^Dahuts")
  }
}

might that help? It would require the query author to understand what backends would be available, though.

kasei commented 8 years ago

Yeah, that would probably work, but not in all cases, and is exactly the sort of thing I want to avoid (requiring users to understand the implementation choices).

kasei commented 8 years ago

How "equal" do you want this equality test to be? Identical structures? Canonical term values are equal? Equal after canonical variable naming and triple pattern ordering?

kjetilk commented 8 years ago

It sounds to me that canonical variable naming and triple pattern ordering is the easiest to implement, so we can go for that, I think…

kasei commented 8 years ago

Well... the easiest would be identical structures (triples in the same order, variables named the same). That probably isn't super useful, though...

kjetilk commented 8 years ago

Ah, right! No, using the same variable names wouldn't be terribly useful ;-)

kjetilk commented 6 years ago

Just a little loudthinking around this subject:

Since the main problem with the curly brackets solution is that the query author has to understand what the query engine can do, I was thinking about heuristics that might often help on the engine. Then, I thought about some heuristics, say that we had a dictionary that recorded parts of the query that could be answered efficiently, like for example, for

    ?s dct:title ?title .
    FILTER regex(str(?title), "^Dahuts")

we would record

    ?s dct:title ?title .

and

    FILTER regex(str(?title), "^Dahuts")

separately in the dictionary. Then, as early as possible, for example at parse time, the parser would look up in the dictionary and see if any other part of the query matches the rest of the pattern, and then construct the algebra appropriately, to make it easier for the planner to construct the wanted query plan.

Might that work?

RubenVerborgh commented 6 years ago

Makes me think of the Bucket Algorithm by Levy et al., where you also consider other constraints instead of only relations.

kjetilk commented 5 years ago

We've started to talk about shapes now, and I'm wondering if using shapes to group patterns in such a way that this becomes feasible is an option... Like, the shapes could be used to choose certain algebras over others?

kasei commented 5 years ago

@kjetilk I think shapes probably have a use here. Also possibly relevant is the framing work that's been going on with JSON-LD. I'm not very swapped in on that, but I wonder if there might be applications of framing on query patterns. That could allow a general solution for the problem of planners expressing how they expect/prefer the query patterns.