opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices on remote data stores.
Apache License 2.0

[BUG] Create Skipping Index does not support multiple part identifier #353

Closed: penghuo closed this issue 1 week ago

penghuo commented 1 month ago

What is the bug? The CREATE SKIPPING INDEX statement does not support multi-part column identifiers, i.e. dotted names of nested fields such as src_endpoint.ip.

How can one reproduce the bug?

24/05/21 18:31:54 INFO FlintREPL: command complete: FlintCommand(state=failed, query=CREATE SKIPPING INDEX ON glue.default.amazon_vpc_flow ( accountid BLOOM_FILTER, region VALUE_SET, severity_id VALUE_SET, src_endpoint.ip BLOOM_FILTER, dst_endpoint.ip BLOOM_FILTER, src_endpoint.svc_name VALUE_SET, dst_endpoint.svc_name VALUE_SET, request_processing_time MIN_MAX, traffic.bytes MIN_MAX ) WITH ( auto_refresh = true, refresh_interval = '15 Minutes', checkpoint_location = 's3://bucket/', watermark_delay = '1 Minute' ) , statementId=N01wQlZUbkQ1U2ZsaW50X2FsbF9wZXJtaXNzaW9ucw==, queryId=N01wQlZUbkQ1U2ZsaW50X2FsbF9wZXJtaXNzaW9ucw==, submitTime=1716316289390, error=Some({"Message":"Syntax error: \nSyntax error at or near 'SKIPPING'(line 1, pos 7)\n\n== SQL ==\nCREATE SKIPPING INDEX ON glue.default.amazon_vpc_flow ( accountid BLOOM_FILTER, region VALUE_SET, severity_id VALUE_SET, src_endpoint.ip BLOOM_FILTER, dst_endpoint.ip BLOOM_FILTER, src_endpoint.svc_name VALUE_SET, dst_endpoint.svc_name VALUE_SET, request_processing_time MIN_MAX, traffic.bytes MIN_MAX ) WITH ( auto_refresh = true, refresh_interval = '15 Minutes', checkpoint_location = 's3://glue, watermark_delay = '1 Minute' ) \n-------^^^\n"}))

What is the expected behavior? The skipping index is created successfully.

What is your host/environment?

dai-chen commented 1 month ago

As a workaround, users can wrap the entire dotted column name in backticks. I quickly verified that this works for both skipping index building and query rewrite. Here is a sample query:

CREATE SKIPPING INDEX ON glue.default.amazon_vpc_flow (
  accountid BLOOM_FILTER,
  region VALUE_SET,
  severity_id VALUE_SET,
  `src_endpoint.ip` BLOOM_FILTER,
  `dst_endpoint.ip` BLOOM_FILTER,
  ...
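When the index DDL is generated programmatically, the workaround can be applied automatically. The sketch below is a hypothetical helper, not part of opensearch-spark. The function names `quote_column` and `build_skipping_index_ddl` are illustrative assumptions. It wraps every column name that contains a dot in backticks as a single identifier, just as in the sample query. It also escapes any embedded backtick by doubling it, following Spark SQL's quoted-identifier convention. The WITH (...) options are omitted for brevity.

```python
def quote_column(name: str) -> str:
    """Wrap a (possibly dotted) column name in backticks as one identifier.

    Embedded backticks are escaped by doubling them, following Spark SQL's
    quoted-identifier convention.
    """
    return "`" + name.replace("`", "``") + "`"


def build_skipping_index_ddl(table: str, columns: list[tuple[str, str]]) -> str:
    """Build a CREATE SKIPPING INDEX statement (hypothetical helper).

    Only column names containing a dot are backtick-quoted, which is the
    workaround described in this issue; simple names are left untouched.
    The table name is emitted unchanged.
    """
    items = []
    for column, skip_type in columns:
        col = quote_column(column) if "." in column else column
        items.append(f"  {col} {skip_type}")
    body = ",\n".join(items)
    return f"CREATE SKIPPING INDEX ON {table} (\n{body}\n)"


ddl = build_skipping_index_ddl(
    "glue.default.amazon_vpc_flow",
    [
        ("accountid", "BLOOM_FILTER"),
        ("src_endpoint.ip", "BLOOM_FILTER"),
        ("traffic.bytes", "MIN_MAX"),
    ],
)
print(ddl)
```

Only the column list is quoted because the failing statement in this issue used an unquoted multi-part table name (glue.default.amazon_vpc_flow), and that name was also left unquoted in the verified workaround.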