microsoft / hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
https://aka.ms/hyperspace
Apache License 2.0

[WIP] Add join v2 rule #501

Open sezruby opened 3 years ago

sezruby commented 3 years ago

What is the context for this pull request?

What changes were proposed in this pull request?

Add a new rule for join queries.

Currently, Hyperspace applies indexes to a join only if indexes are available for both the left and right plans (and their indexed columns must be the same). However, applying a single index can also be beneficial when it removes the shuffle for either the left or the right child.
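To make the difference concrete, here is a minimal toy model in Python. It is not Hyperspace code: `JoinSide`, `index_usable`, and `shuffles_avoided` are hypothetical names, and the model assumes that each join child without a usable index costs one shuffle and that an index is usable when its indexed columns equal the join keys of that child.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class JoinSide:
    """One child of a join (hypothetical model, not a Hyperspace class)."""
    join_keys: Tuple[str, ...]
    # Indexed columns of a candidate index, or None if no index exists.
    index_keys: Optional[Tuple[str, ...]] = None


def index_usable(side: JoinSide) -> bool:
    # Assumption: an index helps only if it is keyed on exactly the join keys.
    return side.index_keys is not None and side.index_keys == side.join_keys


def shuffles_avoided(left: JoinSide, right: JoinSide, v2_enabled: bool) -> int:
    """Number of child shuffles the index rules remove in this toy model."""
    left_ok, right_ok = index_usable(left), index_usable(right)
    if left_ok and right_ok:
        # Existing behavior: both sides are indexed, so both shuffles go away.
        return 2
    if v2_enabled and (left_ok or right_ok):
        # Proposed behavior: a single usable index still removes one shuffle.
        return 1
    return 0


orders = JoinSide(join_keys=("order_id",), index_keys=("order_id",))
items = JoinSide(join_keys=("order_id",), index_keys=None)

print(shuffles_avoided(orders, items, v2_enabled=False))  # existing rule only
print(shuffles_avoided(orders, items, v2_enabled=True))   # with the v2 rule
```

In this sketch, the existing rule leaves the `orders`/`items` join untouched because only one side is indexed, while the v2 rule still saves the shuffle on the indexed side.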

I created a separate rule instead of merging it into the existing JoinIndexRule for two reasons: 1) to reduce possible side effects and regressions from the join v2 rule, and 2) to make the v2 rule easier to extend with its own conditions and restrictions.

The rule is disabled by default and can be enabled with a Spark config:

spark.hyperspace.index.joinv2.enabled = true

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Unit tests

TODO: