opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.
Apache License 2.0
22 stars 33 forks

[FEATURE] Enable covering index acceleration for Iceberg tables #719

Open dai-chen opened 1 month ago

dai-chen commented 1 month ago

Is your feature request related to a problem?

Currently, the query rewrite optimization introduced in issue #298 is available only for the Spark file data source. It does not support Iceberg tables, so queries against Iceberg tables cannot benefit from an existing covering index.

What solution would you like?

I would like to enable covering index acceleration for Iceberg tables, similar to the optimization introduced in issue #298. This would involve rewriting queries to leverage the covering index where applicable, allowing more efficient access to the indexed columns without scanning the full table data.
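To make the intended rewrite concrete, here is a minimal sketch of the coverage decision in Python. It is a simplified model, not Flint's actual optimizer rule: the `CoveringIndex` class, the `choose_scan` function, and the table and index names are all hypothetical, and the real rule works on Spark logical plans rather than plain column sets.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class CoveringIndex:
    """Hypothetical stand-in for a covering index definition."""
    name: str
    table: str
    columns: FrozenSet[str]


def choose_scan(table: str, query_columns: Set[str], indexes: List[CoveringIndex]) -> str:
    """Return the name of an index that covers the query, else the table itself.

    An index "covers" a query when every column the query references
    is stored in the index, so the source table need not be scanned.
    """
    for index in indexes:
        if index.table == table and query_columns <= index.columns:
            return index.name
    return table


# Hypothetical Iceberg table with one covering index on (status, ts).
idx = CoveringIndex("idx_status", "iceberg.db.logs", frozenset({"status", "ts"}))

covered = choose_scan("iceberg.db.logs", {"status"}, [idx])
not_covered = choose_scan("iceberg.db.logs", {"status", "message"}, [idx])
```

In this model the first query is answered from the index, while the second falls back to the table because `message` is not indexed.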

What alternatives have you considered?

N/A

Do you have any additional context?

To enable covering index acceleration for Iceberg tables, we specifically need to implement the FlintSparkSourceRelationProvider abstraction, introduced in PR https://github.com/opensearch-project/opensearch-spark/pull/325, for Iceberg.
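As a rough illustration of why a provider abstraction helps here, the sketch below models the idea in Python. It is not the actual Flint API, which is a Scala trait operating on Spark logical plans: the class names, method names, and the dictionary used in place of a plan node are all assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class SourceRelationProvider(ABC):
    """Hypothetical analogue of a source relation provider."""

    @abstractmethod
    def is_supported(self, plan: dict) -> bool:
        """Whether this provider recognizes the given relation node."""

    @abstractmethod
    def table_name(self, plan: dict) -> str:
        """Extract the qualified table name from a supported node."""


class FileSourceProvider(SourceRelationProvider):
    def is_supported(self, plan: dict) -> bool:
        return plan.get("kind") == "file"

    def table_name(self, plan: dict) -> str:
        return plan["catalog_table"]


class IcebergProvider(SourceRelationProvider):
    def is_supported(self, plan: dict) -> bool:
        return plan.get("kind") == "iceberg"

    def table_name(self, plan: dict) -> str:
        return plan["identifier"]


def resolve_table(plan: dict, providers: List[SourceRelationProvider]) -> Optional[str]:
    """Ask each provider in turn; None means no rewrite is possible."""
    for provider in providers:
        if provider.is_supported(plan):
            return provider.table_name(plan)
    return None


iceberg_plan = {"kind": "iceberg", "identifier": "db.logs"}
without_iceberg = resolve_table(iceberg_plan, [FileSourceProvider()])
with_iceberg = resolve_table(iceberg_plan, [FileSourceProvider(), IcebergProvider()])
```

With only the file provider registered, the Iceberg relation is not recognized and no rewrite can happen. Registering an Iceberg provider is the only change needed, and the rewrite logic itself stays untouched.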

Additionally, it is important to verify that the partial index optimization and the spark.flint.optimizer.covering.enabled configuration are functioning as expected for Iceberg tables.

dblock commented 1 month ago

[Catch All Triage - 1, 2]