Abstracting source relations for enhanced covering index rewriting

Description

This PR introduces new abstractions in the covering index query rewriter so that different kinds of source table relations can be matched and rewritten. This lays the groundwork for supporting Iceberg table relations in the future.

Planned PRs

[x] https://github.com/opensearch-project/opensearch-spark/pull/318
[x] https://github.com/opensearch-project/opensearch-spark/pull/325 [Track in Iceberg project separately]
[ ] https://github.com/opensearch-project/opensearch-spark/pull/391 [Current PR]
[x] Add logging or a whyNot API to explain why an index is or is not applied [Logging added in this PR]
[ ] Support SQL hints for Spark configuration, e.g. CV rewrite, hybrid scan [TBD]
[ ] Support partial covering index rewrite [TBD]

Changes

Added two new abstractions, FlintSparkSourceRelationProvider and FlintSparkSourceRelation. See their Scaladoc for details on each one's responsibilities. In short:

FlintSparkSourceRelationProvider: determines whether a given logical relation can be supported by the Flint optimizer.
FlintSparkSourceRelation: provides all the information that query rewriting requires for a specific source relation.
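To make the split of responsibilities concrete, here is a minimal, self-contained Scala sketch. It is not the code in this PR: Spark's LogicalPlan is replaced by hypothetical stand-in types (Plan, FileRelation, IcebergRelation, Filter), and the member names (name, isSupported, getRelation, tableName, output) as well as the FileSourceRelation, FileSourceRelationProvider and RelationMatcher helpers are assumptions chosen for illustration. Only the two trait names come from this PR.

```scala
// Hypothetical stand-ins for Spark logical plan nodes; the real rewriter
// works on Spark's LogicalPlan, not on these types.
sealed trait Plan
final case class FileRelation(tableName: String, columns: Seq[String]) extends Plan
final case class IcebergRelation(tableName: String, columns: Seq[String]) extends Plan
final case class Filter(condition: String, child: Plan) extends Plan

// Sketch of the relation abstraction: what the rewriter needs to know about
// one matched source relation. Member names are assumptions.
trait FlintSparkSourceRelation {
  def plan: Plan
  def tableName: String
  def output: Seq[String]
}

// Sketch of the provider abstraction: decides whether a plan node is a
// relation the optimizer can handle and, if so, wraps it.
trait FlintSparkSourceRelationProvider {
  def name: String
  def isSupported(plan: Plan): Boolean
  def getRelation(plan: Plan): FlintSparkSourceRelation
}

// Hypothetical file-based implementation, analogous to the "file" provider in the log.
final case class FileSourceRelation(plan: FileRelation) extends FlintSparkSourceRelation {
  def tableName: String = plan.tableName
  def output: Seq[String] = plan.columns
}

object FileSourceRelationProvider extends FlintSparkSourceRelationProvider {
  val name = "file"
  def isSupported(plan: Plan): Boolean = plan.isInstanceOf[FileRelation]
  def getRelation(plan: Plan): FlintSparkSourceRelation = plan match {
    case relation: FileRelation => FileSourceRelation(relation)
    case other => throw new IllegalArgumentException(s"Unsupported plan: $other")
  }
}

object RelationMatcher {
  // Registered providers; supporting a new table format means adding one here.
  val providers: Seq[FlintSparkSourceRelationProvider] = Seq(FileSourceRelationProvider)

  // Ask each provider in turn; the first one that supports the node wins.
  def matchRelation(plan: Plan): Option[FlintSparkSourceRelation] =
    providers.collectFirst { case p if p.isSupported(plan) => p.getRelation(plan) }

  // Walk down through filters to find supported leaf relations (sub plans).
  def findRelations(plan: Plan): Seq[FlintSparkSourceRelation] = plan match {
    case Filter(_, child) => findRelations(child)
    case leaf => matchRelation(leaf).toSeq
  }
}
```

For example, RelationMatcher.findRelations(Filter("status != 200", FileRelation("myglue.ds_tables.http_logs", Seq("clientip", "status")))) yields one relation for myglue.ds_tables.http_logs, while an IcebergRelation is not matched because no provider supports it yet. Under these assumptions, adding Iceberg support would mean registering one more provider rather than changing the matching logic.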

ApplyFlintSparkSkippingIndex and FlintSparkValidationHelper.isTableProviderSupported will be refactored on top of these abstractions in the future.

Testing

spark-sql> CREATE INDEX all ON myglue.ds_tables.http_logs
         > (
         >   `@timestamp`,
         >   clientip,
         >   request,
         >   status,
         >   size
         > );

scala> sc.setLogLevel("INFO")
scala> sql("EXPLAIN SELECT clientip FROM myglue.ds_tables.http_logs WHERE status != 200").show

# Logging explains whether and why the index is applied
24/05/03 17:51:17 INFO FlintSparkSourceRelationProvider: Loaded source relation providers [file]
24/05/03 17:51:17 INFO ApplyFlintSparkCoveringIndex: Provider [file] can match sub plan LogicalRelation
24/05/03 17:51:18 INFO ApplyFlintSparkCoveringIndex: Found covering index [flint_myglue_ds_tables_http_logs_all_index] on table myglue.ds_tables.http_logs
24/05/03 17:51:18 INFO ApplyFlintSparkCoveringIndex:
 Is covering index flint_myglue_ds_tables_http_logs_all_index applicable: true
   Index state: Some(active)
   Index filter condition: None
   Columns required: Set(clientip, status)
   Columns indexed: Set(@timestamp, request, size, clientip, status)
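The applicability decision reported in the log above can be sketched as a single predicate over the logged fields. This is a hypothetical simplification: CoveringIndexInfo and CoveringIndexApplicability are illustrative names, not the rewriter's actual types, and the rule that a filtered (partial) index is not eligible is an assumption consistent with partial covering index rewrite still being unchecked in the planned PRs.

```scala
// Hypothetical, simplified view of the covering index metadata shown in the log.
final case class CoveringIndexInfo(
    name: String,
    state: Option[String],
    filterCondition: Option[String],
    indexedColumns: Set[String])

object CoveringIndexApplicability {
  // Applicable only if the index is active, is not a filtered (partial) index
  // (an assumption, see above), and indexes every column the query references.
  def isApplicable(index: CoveringIndexInfo, requiredColumns: Set[String]): Boolean =
    index.state.contains("active") &&
      index.filterCondition.isEmpty &&
      requiredColumns.subsetOf(index.indexedColumns)

  // Builds a message shaped like the INFO log above.
  def explain(index: CoveringIndexInfo, requiredColumns: Set[String]): String =
    s"""Is covering index ${index.name} applicable: ${isApplicable(index, requiredColumns)}
       |  Index state: ${index.state}
       |  Index filter condition: ${index.filterCondition}
       |  Columns required: $requiredColumns
       |  Columns indexed: ${index.indexedColumns}""".stripMargin
}
```

With the values from the log (state Some(active), no filter condition, required columns clientip and status), the predicate returns true; requiring a column outside the indexed set, such as one not listed in CREATE INDEX, would make it false.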

Issues Resolved

https://github.com/opensearch-project/opensearch-spark/issues/298

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.
