Closed by clee704 3 years ago
What changes were proposed in this pull request?
Implement the data skipping index application rule.
Does this PR introduce any user-facing change?
Yes, users can create data skipping indexes that can be applied to filter queries.
import com.microsoft.hyperspace.Hyperspace
import com.microsoft.hyperspace.index.dataskipping.DataSkippingIndexConfig
import com.microsoft.hyperspace.index.dataskipping.sketches.MinMaxSketch

spark.range(100).toDF("A").write.parquet("X")
val df = spark.read.parquet("X")
val hs = Hyperspace()
hs.createIndex(df, DataSkippingIndexConfig("myind", MinMaxSketch("A")))
hs.explain(df.filter("A = 1"))
=============================================================
Plan with indexes:
=============================================================
Filter (isnotnull(A#271L) AND (A#271L = 1))
+- ColumnarToRow
   +- FileScan Hyperspace(Type: DS, Name: myind, LogVersion: 1) [A#271L] Batched: true, DataFilters: [isnotnull(A#271L), (A#271L = 1)], Format: Parquet, Location: DataSkippingFileIndex[file:/home/chungmin/Repos/spark3.1/X], PartitionFilters: [], PushedFilters: [IsNotNull(A), EqualTo(A,1)], ReadSchema: struct<A:bigint>

=============================================================
Plan without indexes:
=============================================================
Filter (isnotnull(A#271L) AND (A#271L = 1))
+- ColumnarToRow
   +- FileScan parquet [A#271L] Batched: true, DataFilters: [isnotnull(A#271L), (A#271L = 1)], Format: Parquet, Location: InMemoryFileIndex[file:/home/chungmin/Repos/spark3.1/X], PartitionFilters: [], PushedFilters: [IsNotNull(A), EqualTo(A,1)], ReadSchema: struct<A:bigint>

=============================================================
Indexes used:
=============================================================
myind:file:/home/chungmin/Repos/spark3.1/spark-warehouse/indexes/myind/v__=0
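To make the effect of the rule more concrete, here is a minimal sketch of the idea behind a min/max sketch: a file can be skipped for `A = 1` when 1 falls outside the file's recorded [min, max] range for `A`. This is an illustration only, not Hyperspace's implementation. `FileStats`, `MinMaxSkipping`, the file names, and the assumption that the 100 rows are split evenly across four files are all hypothetical.

```scala
// Hypothetical model of per-file min/max statistics for column A.
// These are illustrative types, not classes from Hyperspace.
final case class FileStats(path: String, min: Long, max: Long)

object MinMaxSkipping {
  // A file may contain rows with A = value only if min <= value <= max.
  def mayContain(stats: FileStats, value: Long): Boolean =
    stats.min <= value && value <= stats.max

  // Keep only the files whose [min, max] range could satisfy the equality predicate.
  def filesToScan(index: Seq[FileStats], value: Long): Seq[String] =
    index.filter(mayContain(_, value)).map(_.path)
}

object MinMaxSkippingExample {
  // Assumed layout: values 0..99 split evenly across four files (made-up names).
  val index: Seq[FileStats] = Seq(
    FileStats("part-0.parquet", 0L, 24L),
    FileStats("part-1.parquet", 25L, 49L),
    FileStats("part-2.parquet", 50L, 74L),
    FileStats("part-3.parquet", 75L, 99L)
  )

  // For the filter A = 1, only the first file's range contains the value.
  val selected: Seq[String] = MinMaxSkipping.filesToScan(index, 1L)
}
```

Under these assumptions, only `part-0.parquet` would need to be scanned. In the plan above, the corresponding file pruning is presumably what `DataSkippingFileIndex` does in place of `InMemoryFileIndex`.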
How was this patch tested?
Unit test
Could you split the PR, e.g. into part 3-1 (utils) and part 3-2 (apply)?
Thanks for the detailed review!