Description
This PR introduces new abstractions for the covering index query rewriter so that matching and rewriting can be supported for different kinds of source table relations. This paves the way for future support of Iceberg table relations.
PR Planned
[x] Add logging or a whyNot API to explain why an index is or is not applied [logging added in this PR]
[ ] Support SQL hint for Spark conf, e.g. CV rewrite, hybrid scan [TBD]
[ ] Support partial covering index rewrite [TBD]
Changes
Added two new abstractions, FlintSparkSourceRelationProvider and FlintSparkSourceRelation. See the Scaladoc for their responsibilities in detail. In brief:
FlintSparkSourceRelationProvider: determines whether a given logical relation is supported by the Flint optimizer.
FlintSparkSourceRelation: provides all the information that query rewriting requires for a specific source relation.
ApplyFlintSparkSkippingIndex and FlintSparkValidationHelper.isTableProviderSupported will be refactored on top of these abstractions in the future.
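A minimal sketch of how the two abstractions could fit together. It uses toy plan classes instead of Spark's LogicalPlan so that it is self-contained. Every name here (Plan, FileRelation, IcebergRelation, SourceRelation, SimpleRelation, SourceRelationProvider, Providers) and every method signature is a hypothetical stand-in, not the actual Flint API, which is documented in the Scaladoc.

```scala
// Toy stand-ins for Spark logical plan nodes (hypothetical, illustration only).
sealed trait Plan
case class FileRelation(table: String, columns: Seq[String]) extends Plan
case class IcebergRelation(table: String, columns: Seq[String]) extends Plan

// Assumed shape: what the rewriter needs to know about a matched source relation.
trait SourceRelation {
  def tableName: String
  def output: Seq[String]
}
case class SimpleRelation(tableName: String, output: Seq[String]) extends SourceRelation

// Assumed shape: decides whether a plan node is supported and wraps it if so.
trait SourceRelationProvider {
  def name: String
  def isSupported(plan: Plan): Boolean
  def getRelation(plan: Plan): SourceRelation
}

// A provider that only recognizes file-based relations.
object FileProvider extends SourceRelationProvider {
  val name: String = "file"

  def isSupported(plan: Plan): Boolean = plan.isInstanceOf[FileRelation]

  def getRelation(plan: Plan): SourceRelation = plan match {
    case FileRelation(table, columns) => SimpleRelation(table, columns)
    case other => throw new IllegalArgumentException(s"Unsupported plan: $other")
  }
}

object Providers {
  // Only the file provider is registered in this sketch.
  val all: Seq[SourceRelationProvider] = Seq(FileProvider)

  // Returns the wrapped relation from the first provider that supports the plan.
  def find(plan: Plan): Option[SourceRelation] =
    all.find(_.isSupported(plan)).map(_.getRelation(plan))
}
```

In this sketch, Providers.find returns a relation for a FileRelation and None for an IcebergRelation. Supporting the latter would only require registering another provider; the rewriter itself would not need to change.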
Testing
spark-sql> CREATE INDEX all ON myglue.ds_tables.http_logs
> (
> `@timestamp`,
> clientip,
> request,
> status,
> size
> );
scala> sc.setLogLevel("INFO")
scala> sql("EXPLAIN SELECT clientip FROM myglue.ds_tables.http_logs WHERE status != 200").show
# Logging explains whether and why the index is applied
24/05/03 17:51:17 INFO FlintSparkSourceRelationProvider: Loaded source relation providers [file]
24/05/03 17:51:17 INFO ApplyFlintSparkCoveringIndex: Provider [file] can match sub plan LogicalRelation
24/05/03 17:51:18 INFO ApplyFlintSparkCoveringIndex: Found covering index
[flint_myglue_ds_tables_http_logs_all_index] on table myglue.ds_tables.http_logs
24/05/03 17:51:18 INFO ApplyFlintSparkCoveringIndex:
Is covering index flint_myglue_ds_tables_http_logs_all_index applicable: true
Index state: Some(active)
Index filter condition: None
Columns required: Set(clientip, status)
Columns indexed: Set(@timestamp, request, size, clientip, status)
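The fields in the log above suggest that applicability depends on the index state, the index filter condition, and whether the indexed columns cover the required columns. The following sketch encodes that reading. The exact conditions are assumptions inferred from the log output and from the partial-index item still marked TBD, and CoveringIndexInfo and CoveringIndexCheck are hypothetical names, not Flint classes.

```scala
// Hypothetical summary of a covering index, mirroring the fields printed in the log.
case class CoveringIndexInfo(
    name: String,
    state: Option[String],
    filterCondition: Option[String],
    indexedColumns: Set[String])

object CoveringIndexCheck {
  // Assumed rule: the index must be active, must not be a partial (filtered) index,
  // and must contain every column that the query requires.
  def isApplicable(index: CoveringIndexInfo, requiredColumns: Set[String]): Boolean =
    index.state.contains("active") &&
      index.filterCondition.isEmpty &&
      requiredColumns.subsetOf(index.indexedColumns)
}
```

With the values from the log (required columns clientip and status, all five columns indexed, state active, no filter condition), this check evaluates to true. A query that also needs a column outside the index would evaluate to false.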
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Issues Resolved
https://github.com/opensearch-project/opensearch-spark/issues/298