terrajobst / nquery-vnext

A Roslyn inspired rewrite of NQuery
MIT License
72 stars 16 forks source link

Fix OuterJoinRemover so it does not optimize too aggressively #27

Open dallmair opened 5 years ago

dallmair commented 5 years ago

Here's a query that exhibits different behavior:

SELECT  o.OrderID
FROM    Orders o
WHERE   (
            SELECT  TOP 1 od2.ShipVia
            FROM    Orders od2
            WHERE   od2.OrderID > o.OrderID
        ) IS NOT NULL

Previously, this failed with a KeyNotFoundException when evaluated, because the OuterJoinRemover erroneously changed the left semi join to an inner join.

Reasoning: The OuterJoinRemover checked for any null-rejecting column anywhere in the query, but it needs to check for only those columns that are null-rejecting in the respective subtree, i.e. children of the join node.

terrajobst commented 5 years ago

@dallmair

Thanks! The outer join definitely shouldn't be replaced by an inner join but there is a larger issue here: the code shouldn't crash with a key not found exception.

I think the problem is the join orderer which seems to mess up cases where outer references are included.

Outer references are only allowed to appear on the right hand side of a join (i.e. right can reference left, but not the other way around). However, the orderer swaps left and right in an illegal fashion, causing the outer reference to become unresolvable later.

I'm not entirely sure what the fix for this should be yet. I could either disallow join ordering entirely for joins with outer references or try to handle them when reassembling the join graph. Skipping joins with outer references might be the easiest fix but my concern is that it might stop optimizing joins in certain queries entirely. Alternatively I might ignore the only-left-can-reference-right constraint during optimization entirely and introduce a late phase that enforces it by swapping left and right when necessary.