microsoft / hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
https://aka.ms/hyperspace
Apache License 2.0

Add plan signature validation for Hybrid Scan #158

Open sezruby opened 4 years ago

sezruby commented 4 years ago

Describe the issue

Currently, Hyperspace only supports creating indexes on a logical relation node. To support arbitrary logical plans, candidate indexes must be selected by comparing plan signatures; that work is done in #76. For hybrid scan, we cannot use the "index" signature value in IndexLogEntry, which is a composite of FileBasedSignature + PlanSignature, because

#153 handles this issue by checking the metadata of the source files directly, without using the index signature.

As a follow-up, we need to add plan signature validation for hybrid scan accordingly. This could be done as follows:

1) Store the plan signature of the source plan separately in IndexLogEntry
2) Compare plan signatures in getCandidateIndex
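
The two steps above could be sketched roughly as follows. This is only an illustration in Python, not Hyperspace code: the `IndexLogEntry` dataclass, its `file_signature` and `plan_signature` fields, the `plan_signature` helper, and `get_candidate_indexes` are all hypothetical stand-ins, and hashing the plan's node names is an assumed signature scheme chosen to keep the example small.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class IndexLogEntry:
    """Hypothetical, simplified stand-in for an index log entry.

    The plan signature is kept in its own field (step 1) instead of
    being folded into one composite signature with the file signature.
    """
    name: str
    file_signature: str
    plan_signature: str


def plan_signature(node_names: Sequence[str]) -> str:
    """Assumed scheme: hash the ordered node names of a logical plan."""
    joined = "\x00".join(node_names)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def get_candidate_indexes(
    entries: Sequence[IndexLogEntry],
    query_plan_nodes: Sequence[str],
    hybrid_scan_enabled: bool,
    query_file_signature: str = "",
) -> List[IndexLogEntry]:
    """Step 2: keep only entries whose plan signature matches the query.

    With hybrid scan, the file signature is not compared here, since the
    source files are validated separately. Without hybrid scan, both the
    plan signature and the file signature must match.
    """
    query_sig = plan_signature(query_plan_nodes)
    candidates = []
    for entry in entries:
        if entry.plan_signature != query_sig:
            continue  # different plan shape: never a candidate
        if hybrid_scan_enabled or entry.file_signature == query_file_signature:
            candidates.append(entry)
    return candidates
```

With the signatures stored separately, an index built over a plain relation stays a candidate under hybrid scan even after its source files change, while an index built over a different plan shape is still rejected.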

To Reproduce

Expected behavior

Environment

imback82 commented 3 years ago

@sezruby Can you check if this issue can be closed?

sezruby commented 3 years ago

Since we only allow one relation, and the plan signature for non-hybrid scan currently uses only the "Relation" node, I think we don't need to consider the query plan (Relation) for Hybrid Scan until we support #95.