westonpace closed this 7 months ago
@westonpace this would be a breaking change for existing APIs?
It would not be. If a plan contains the old left_keys / right_keys field then the "equal" equality method should be used. I believe this is documented.
I'm happy with this so +1 from me. @EpsilonPrime's items feel like minor changes so I'm fine with or without them before merge.
Regarding the questions I had about the meaning of consistency, it could be worth adding that to the PR as well, since it wasn't immediately clear what "consistent" meant.
I'll open a follow-up PR.
This PR (hopefully) concludes various discussions around flags such as null_equals_null (Datafusion) and null_aware (Velox). The goal of these flags is to slightly tweak the definition of "equality" in an equijoin relation.
This PR introduces a new EquiJoinKey message that can be used by physical join relations to define how keys should be compared.
These custom equality functions are needed in a variety of scenarios:
Optimizing set operations
Set operations (e.g. set difference) can sometimes be satisfied by an equi-join. When this happens the user typically wants the equality comparison to be "is not distinct from", which treats two nulls as equal.
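As a rough illustration of why the comparison matters here, the sketch below models a set difference as an anti-join over single-column keys, with Python's None standing in for null. The function names (equal, is_not_distinct_from, anti_join) are hypothetical helpers for this example and are not part of the spec or of any engine's API.

```python
def equal(a, b):
    # Plain SQL-style equality: a null on either side never matches.
    return a is not None and b is not None and a == b


def is_not_distinct_from(a, b):
    # Null-safe equality: two nulls match, a null and a value do not.
    if a is None or b is None:
        return a is None and b is None
    return a == b


def anti_join(left, right, matches):
    # Keep each left key that has no matching right key.
    return [l for l in left if not any(matches(l, r) for r in right)]


left = [1, None, 3]
right = [None, 3]

# With plain equality the null survives, which is wrong for a set difference.
with_equal = anti_join(left, right, equal)

# With "is not distinct from" the null on the left is removed by the null on the right.
with_null_safe = anti_join(left, right, is_not_distinct_from)
```

Under these assumptions, only the null-safe variant gives the result a user would expect from a set difference, because set operations treat nulls as the same value.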
Flattening correlated subqueries
Some kinds of correlated subqueries can be removed during optimization and replaced with an anti-join. Depending on the original query ("not in" vs. "where not exists"), the behavior with respect to nulls differs slightly, and we may want to use "might equals" as the comparison.
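The difference can be sketched in a few lines. This example assumes that "might equals" means "true if the values are equal or if either side is null", and it models the anti-join over single-column keys with None as null. The helper names are hypothetical and chosen only for this illustration.

```python
def equal(a, b):
    # Plain equality: a null on either side never matches.
    return a is not None and b is not None and a == b


def might_equal(a, b):
    # Assumed semantics: a null on either side counts as a possible match.
    return a is None or b is None or a == b


def anti_join(left, right, matches):
    # Keep each left key that has no matching right key.
    return [l for l in left if not any(matches(l, r) for r in right)]


left = [1, 2, None]

# "where not exists (... r.k = l.k)": nulls never match, so null keys survive.
not_exists = anti_join(left, [2, None], equal)

# "not in (...)": a null on the right might equal anything, so no row survives.
not_in_with_null = anti_join(left, [2, None], might_equal)

# "not in (...)" without a null on the right: only the null left key is dropped.
not_in_without_null = anti_join(left, [2], might_equal)
```

The same anti-join operator covers both query shapes; only the comparison function changes, which is the kind of per-key choice the PR is describing.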
String collations
Collations define the ordering and equality of a column. Different columns can have different collations. The equi-join must use the comparison function defined by the collation.
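To make the collation case concrete, here is a minimal sketch of a hash join whose key equality is decided by a normalization function. It assumes a case-insensitive collation can be approximated with str.casefold, which is a simplification of real collation rules; hash_join and its key_fn parameter are hypothetical names for this example.

```python
from collections import defaultdict


def hash_join(left, right, key_fn):
    # Build a hash table on the right side, keyed by the normalized value.
    index = defaultdict(list)
    for r in right:
        index[key_fn(r)].append(r)
    # Probe with each left value, normalized the same way.
    return [(l, r) for l in left for r in index.get(key_fn(l), [])]


left = ["Apple", "pear"]
right = ["apple", "PEAR", "fig"]

# Binary comparison: strings differing only in case do not match.
binary_matches = hash_join(left, right, lambda s: s)

# Case-insensitive comparison (assumed collation): the same rows now join.
case_insensitive_matches = hash_join(left, right, str.casefold)
```

The join algorithm is unchanged between the two calls; the collation only changes which keys are considered equal.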