Open dallmair opened 5 years ago
@dallmair
Thanks! The outer join definitely shouldn't be replaced by an inner join but there is a larger issue here: the code shouldn't crash with a key not found exception.
I think the problem is the join orderer which seems to mess up cases where outer references are included.
Outer references are only allowed to appear on the right hand side of a join (i.e. right can reference left, but not the other way around). However, the orderer swaps left and right in an illegal fashion, causing the outer reference to become unresolvable later.
I'm not entirely sure what the fix for this should be yet. I could either disallow join ordering entirely for joins with outer references or try to handle them when reassembling the join graph. Skipping joins with outer references might be the easiest fix but my concern is that it might stop optimizing joins in certain queries entirely. Alternatively I might ignore the only-left-can-reference-right constraint during optimization entirely and introduce a late phase that enforces it by swapping left and right when necessary.
Here's a query that exhibits different behavior:
Previously, this failed with a KeyNotFoundException when evaluated, because the OuterJoinRemover erroneously changed the left semi join to an inner join.
Reasoning: The OuterJoinRemover checked for any null-rejecting column anywhere in the query, but it needs to check for only those columns that are null-rejecting in the respective subtree, i.e. children of the join node.