opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices for remote data stores.
Apache License 2.0

[FEATURE] Support existing index usage in Flint #72

Status: Open. YANG-DB opened this issue 1 year ago.

YANG-DB commented 1 year ago

Is your feature request related to a problem?
As a Flint user, I'd like to use existing indices or index templates as the target indices of Flint-accelerated tables.

What solution would you like?
Use an existing index name when creating the acceleration. This would not create a new index; instead, the given name would serve as the target store of the acceleration ETL.

The following SQL syntax is suggested:

CREATE (SKIPPING/COVERING/MV) INDEX
ON alb_logs USING ss4o_logs-elb-prod
WITH (
  auto_refresh = true,
  refresh_interval = '1 minute',
  checkpoint_location = 's3://test/'
)

This statement would start the acceleration ETL sync process without creating a new index in OpenSearch; instead, it would use the ss4o_logs-elb-prod index (index template) as the data store for the acceleration content.

It may validate the following:

Do you have any additional context?
Using the existing SS4O schema definitions.

Swiddis commented 1 year ago

Should it also support something like USING IF NOT EXISTS, to distinguish cases where the OpenSearch index already exists from cases where we simply want Flint to create an index as usual, just under a new name? I think naming the new destination index with USING IF NOT EXISTS would be much less error-prone than creating the Flint index and the OpenSearch index separately and unifying them at the end, while still covering most use cases.
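To make the two variants concrete, here is a sketch of how they might sit side by side. This is hypothetical syntax taken from the discussion above, not something Flint is shown to support in this thread. SKIPPING is picked arbitrarily from the three index kinds, the statement shape is copied from the original proposal, and the assumption that USING IF NOT EXISTS reuses an index that already exists (rather than failing) follows the usual SQL IF NOT EXISTS convention, not anything stated in the thread.

```sql
-- HYPOTHETICAL syntax from this issue; not an implemented Flint statement.

-- Plain USING: ss4o_logs-elb-prod is expected to exist already
-- (index or index template); no new OpenSearch index is created.
CREATE SKIPPING INDEX
ON alb_logs USING ss4o_logs-elb-prod
WITH (
  auto_refresh = true,
  refresh_interval = '1 minute',
  checkpoint_location = 's3://test/'
);

-- USING IF NOT EXISTS: if ss4o_logs-elb-prod is missing, create the index
-- as usual but under this name; if it exists, reuse it (assumed behavior).
CREATE SKIPPING INDEX
ON alb_logs USING IF NOT EXISTS ss4o_logs-elb-prod
WITH (
  auto_refresh = true,
  refresh_interval = '1 minute',
  checkpoint_location = 's3://test/'
);
```

Keeping the choice in the statement itself means the user states up front whether a missing target index is an error or should be created, instead of Flint reconciling two separately created indices afterwards.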