kasei / perlrdf

Deprecated in favor of the Attean package
26 stars 25 forks source link

HSP-based heuristics are producing bad BGP orderings #123

Open kasei opened 9 years ago

kasei commented 9 years ago

The HSP BGP ordering introduced in pull request #114 seem to be producing bad triple orderings. In this query:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?server (count(?server) AS ?count)
WHERE {
    GRAPH ?g {
        ?site <urn:app:freshtime:hard> ?hard ;
            <urn:app:hasrequest> ?request .
        ?request <http://www.w3.org/2007/ont/http#hasResponse> ?response .
        ?response <http://www.w3.org/2007/ont/httph#server> ?server .
        FILTER (xsd:integer(?hard) > 0)
    }
}
GROUP BY ?server
ORDER BY ?count

the triple ordering produces a cartesian product between (1⋈2) and 3:

(1) ?site <urn:app:freshtime:hard> ?hard
(2) ?site <urn:app:hasrequest> ?request
(3) ?response <http://www.w3.org/2007/ont/httph#server> ?server
(4) ?request <http://www.w3.org/2007/ont/http#hasResponse> ?response
kasei commented 9 years ago

I think it might be good for the built-in BGP planner to do simple analysis and to either error or emit loud warnings when a cartesian product occurs when it's possible to avoid it.

kjetilk commented 9 years ago

Ouch, my bad!

Could you please write the warnings, and then revert the HSP code?

I'll see if I can find some time to make a real fix. I have an idea on how to do it, but don't know when...

kjetilk commented 9 years ago

I tried as a quick-fix to reintroduce the code that was in sort_for_join_variables into merge_patterns. It seems harder to test, because it produces different plans for every invocation, it seems. But also, it seems to remove some of the benefit of the heuristics, as some of the plans are rather bushy, and it moves some of the single-variable patterns around.

It does produce the right plan for my case though. I don't know if you want to merge this, or just revert for now? I fear it may take some before I can find time to implement the last heuristic.

kjetilk commented 9 years ago

I'm thinking about simplifying the problem further. First, it should be the case that the first block of triple patterns the merge_pattern method gets all have a common variable. There may be more than one such block. The problem may be that between these blocks, the common variables aren't adjacent, in the sense that the triple pattern that has a common variable between the blocks may not be the last in the first block and first in the second block.

So, the question is how this fact can be used in a quick-fix...

At the very least, it should be possible to just take the triple patterns with less than 2 variables, and not touch them in the first pattern, I suppose. But is it possible to do something more clever than that for a quick-fix?