substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0
1.14k stars 148 forks source link

feat: add PhysicalJoinType #572

Closed danepitkin closed 2 weeks ago

danepitkin commented 9 months ago

BREAKING CHANGE: the JoinType for physical operators HashJoin, MergeJoin, and NestedLoopJoin has been refactored into a shared enum called PhysicalJoinType. It is wire compatible and a new is_null_aware field is added to the physical join rels.

Closes https://github.com/substrait-io/substrait/issues/563

danepitkin commented 9 months ago

Question: Is it better for the original ANTI_JOIN enums to default to NO_NULLS or WITH_NULLs?

danepitkin commented 9 months ago

Is null handling part of the type or is it a separate setting/option?

This Velox documentation makes me think that these are considered different types of anti joins (rather than a separate config): https://facebookincubator.github.io/velox/develop/anti-join.html

danepitkin commented 9 months ago

Hmm, although they did implement it slightly differently, using JoinType::kAnti and JoinType::kNullAwareAnti, presumably with the NULL config separate.

Edit: actually, this is similar to this PR, but the left/right option is configured elsewhere it seems.

github-actions[bot] commented 9 months ago

ACTION NEEDED

Substrait follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

danepitkin commented 9 months ago

I see @westonpace 's old PR here does use a null_is_match bool instead of expanding the jointype enum https://github.com/substrait-io/substrait/pull/467

danepitkin commented 9 months ago

Hmm, maybe it would be better to have bool null_aware in each physical join so that other join types can use it.

jacques-n commented 9 months ago

This shouldn't be a breaking binary change, right?

danepitkin commented 9 months ago

This shouldn't be a breaking binary change, right?

Yes, good call!

Edit: The "Breaking Changes Check" does flag this, though.

westonpace commented 7 months ago

I've created a new PR specifically for dealing with the null_is_match / null_aware stuff at https://github.com/substrait-io/substrait/pull/585

Do you want to rescope this PR to just dealing with the join type?

jacques-n commented 2 weeks ago

No progress has been made in more than six months. Closing without prejudice.