opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.
Apache License 2.0
12 stars, 18 forks

[FEATURE] Create MV index target using existing OpenSearch index #358

Open YANG-DB opened 1 month ago

YANG-DB commented 1 month ago

Is your feature request related to a problem? In some cases, existing indices or index templates need to be used as the ingestion (sync) target of the MV query.

For example:

What solution would you like? The MV creation specification (the WITH clause) would accept an additional option: using_existing_index = 'opensearch_index_name'

WITH (
  auto_refresh = true,
  refresh_interval = '15 Minute',
  checkpoint_location = '{s3_checkpoint_location}',
  watermark_delay = '1 Minute',
  using_existing_index = 'opensearch_index_name',
  extra_options = '{ "{table_name}": { "maxFilesPerTrigger": "10" }}'
)

To keep the index name compliant with Flint's naming convention, the MV name would be added to the existing index as an index alias.
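The alias step could be sketched as follows. This is only an illustration, not Flint's implementation: the helper names `flint_mv_alias` and `alias_actions` are hypothetical, the MV name `spark_catalog.default.my_mv` is made up, and the naming rule (prefix `flint_`, lowercase, dots replaced by underscores) is an assumption about Flint's convention. The request body follows the shape accepted by OpenSearch's `POST /_aliases` endpoint; the sketch only builds the body and does not send it.

```python
import json


def flint_mv_alias(mv_name: str) -> str:
    """Hypothetical helper: derive a Flint-style index name from a fully
    qualified MV name. ASSUMPTION: 'flint_' prefix, lowercase, dots -> '_'."""
    return "flint_" + mv_name.lower().replace(".", "_")


def alias_actions(existing_index: str, mv_name: str) -> dict:
    """Build the body for POST /_aliases that adds the Flint-style name
    as an alias on the pre-existing OpenSearch index."""
    return {
        "actions": [
            {"add": {"index": existing_index, "alias": flint_mv_alias(mv_name)}}
        ]
    }


# Example values: the index name comes from the proposal above,
# the MV name is invented for illustration.
body = alias_actions("opensearch_index_name", "spark_catalog.default.my_mv")
print(json.dumps(body, indent=2))
```

With an alias in place, anything that looks up the MV by its Flint-style name would resolve to the existing index, while the index itself keeps its original name and mappings.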

Do you have any additional context?

noCharger commented 1 month ago

Although not explicitly stated in https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#flint-index-specification, the CREATE MATERIALIZED VIEW command does not indicate that any existing materialized view can be utilized. Altering the MV definition in the WITH clause seems achievable.

penghuo commented 1 month ago

Trying to understand the use case for creating an MV with an existing index. What is the expected workflow?

> existing index template contains specific data types supported only in OpenSearch

We can extend the Spark data types to support OpenSearch data types.

> existing index contains data that needs to be appended with the MV query results

UNION can solve this problem.
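A minimal sketch of the UNION idea, using Python's built-in `sqlite3` as a stand-in for Spark SQL so that it runs anywhere. The table names `existing_index` and `source_logs`, their columns, and the sample rows are all hypothetical. Whether the existing OpenSearch index can be read as a table inside the MV query, and whether this shape works with `auto_refresh`, are assumptions that would need checking against Flint; the sketch only shows the shape of the query.

```python
import sqlite3

# In-memory database standing in for the Spark catalog (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Rows that already live in the existing index (hypothetical schema).
    CREATE TABLE existing_index (window_start TEXT, status INTEGER, cnt INTEGER);
    -- Raw source data that the MV query aggregates (hypothetical schema).
    CREATE TABLE source_logs (ts TEXT, status INTEGER);

    INSERT INTO existing_index VALUES ('2023-01-01', 200, 5);
    INSERT INTO source_logs VALUES
        ('2023-01-02', 200), ('2023-01-02', 200), ('2023-01-02', 500);
    """
)

# MV query: existing rows combined with freshly aggregated results.
mv_query = """
SELECT window_start, status, cnt FROM existing_index
UNION ALL
SELECT ts, status, COUNT(*) FROM source_logs GROUP BY ts, status
ORDER BY 1, 2
"""

rows = conn.execute(mv_query).fetchall()
for row in rows:
    print(row)
```

`UNION ALL` is used here so that existing rows are kept as-is; plain `UNION` would additionally drop duplicate rows across the two sides.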