opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.
Apache License 2.0
12 stars, 18 forks

[FEATURE] Enhance Flint covering index with additional OpenSearch field type support #384

Open dai-chen opened 2 weeks ago

dai-chen commented 2 weeks ago

Is your feature request related to a problem?

Currently, the Flint covering index maps column types from the source table to only very basic types in OpenSearch. There is no way to create a covering index with other types such as text, IP, or vector. This limitation restricts the flexibility and full potential of using Flint with OpenSearch.

What solution would you like?

The Flint covering index should either automatically create an underlying OpenSearch index with the most suitable types or provide a way for users to configure these types.

Possible solutions include:

  1. Flint registers its own types (UDTs in Spark), allowing users to create source tables that use them. However, this solution does not help tables created outside the Flint extension.
  2. Users can provide type information in the CREATE INDEX statement, e.g., CREATE INDEX all ON test (message TEXT ...), though this is not standard in the SQL world.
  3. Users can achieve this through a materialized view, e.g., CREATE MATERIALIZED VIEW test AS SELECT CAST(message AS TEXT).
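To make option 2 more concrete, here is a minimal sketch of how per-column type hints could be merged with default type inference to produce an OpenSearch mapping. It is not Flint's implementation: `DEFAULT_TYPES`, `ALLOWED_OVERRIDES`, and `build_mapping` are hypothetical names, and the default table is illustrative rather than copied from FlintDataType.scala. Vector types are left out because they need extra parameters beyond a type name.

```python
# Hypothetical defaults from Spark SQL type names to OpenSearch field types.
# Illustrative only; the real table lives in FlintDataType.scala.
DEFAULT_TYPES = {
    "string": "keyword",
    "int": "integer",
    "bigint": "long",
    "boolean": "boolean",
    "double": "double",
    "timestamp": "date",
}

# Subset of OpenSearch field types a user could request explicitly (assumption).
ALLOWED_OVERRIDES = {"text", "keyword", "ip", "geo_point"}


def build_mapping(columns, overrides=None):
    """Build an OpenSearch mapping body from (name, spark_type) pairs.

    `overrides` maps a column name to a user-requested OpenSearch field type,
    e.g. the TEXT in `message TEXT` from a CREATE INDEX statement.
    """
    overrides = overrides or {}
    column_names = {name for name, _ in columns}
    unknown = set(overrides) - column_names
    if unknown:
        raise ValueError(f"override for unknown column(s): {sorted(unknown)}")

    properties = {}
    for name, spark_type in columns:
        if name in overrides:
            os_type = overrides[name]
            if os_type not in ALLOWED_OVERRIDES:
                raise ValueError(f"unsupported field type for {name}: {os_type}")
        elif spark_type in DEFAULT_TYPES:
            os_type = DEFAULT_TYPES[spark_type]
        else:
            raise ValueError(f"no default mapping for Spark type: {spark_type}")
        properties[name] = {"type": os_type}
    return {"properties": properties}


if __name__ == "__main__":
    cols = [("message", "string"), ("client", "string"), ("status", "int")]
    print(build_mapping(cols, {"message": "text", "client": "ip"}))
```

Columns without a hint fall back to the default table, so existing index definitions would keep their current behavior under this scheme.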

What alternatives have you considered?

Users can manually create an OpenSearch index with the desired field types, but this approach is cumbersome and prone to errors if there is a mismatch between the OpenSearch index and the covering index definition.
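The mismatch risk in the manual approach can be illustrated with a small check that compares the field types a covering index expects against those of a hand-created index. This is only a sketch: `find_mismatches` is a hypothetical helper, and both inputs are assumed to be plain dictionaries shaped like the `properties` section of an OpenSearch mapping with flat (non-nested) fields.

```python
def find_mismatches(expected, actual):
    """Return (field, expected_type, actual_type) for every differing field.

    A type of None means the field is missing on that side.
    """
    exp = expected.get("properties", {})
    act = actual.get("properties", {})
    mismatches = []
    for field in sorted(set(exp) | set(act)):
        exp_type = exp.get(field, {}).get("type")
        act_type = act.get(field, {}).get("type")
        if exp_type != act_type:
            mismatches.append((field, exp_type, act_type))
    return mismatches


if __name__ == "__main__":
    # What the covering index definition would produce (assumed).
    expected = {"properties": {"message": {"type": "keyword"},
                               "status": {"type": "integer"}}}
    # What the user created by hand in OpenSearch.
    actual = {"properties": {"message": {"type": "text"},
                             "client": {"type": "ip"}}}
    for field, exp_type, act_type in find_mismatches(expected, actual):
        print(f"{field}: expected {exp_type}, found {act_type}")
```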

Do you have any additional context?

Here is the relevant Flint source code that handles data type mappings: FlintDataType.scala#L125.