opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices on remote data stores.
Apache License 2.0
22 stars 33 forks

[FEATURE] Incremental refresh index on Hive source table #91

Closed dai-chen closed 8 months ago

dai-chen commented 1 year ago

Is your feature request related to a problem?

Currently, Spark Structured Streaming only supports Spark data source tables. For a Hive table, it throws an exception when the streaming job starts.

What solution would you like?

Opt-1: Support streaming on Hive tables, either by enhancing the stream source operator or by converting the Hive table internally (not sure if this is possible).
Opt-2: Simply give the user a clear error message, or a guide on how to create a Spark DS table, when the error occurs.

Related to https://github.com/opensearch-project/opensearch-spark/issues/65
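
To make Opt-2 concrete, here is a minimal sketch in plain Python of what such a check could look like. It is not code from this repository: the function name `streaming_source_error`, the suggested `_ds` table suffix, and the message wording are all hypothetical. It assumes the table's provider string is available from the catalog and that Hive tables report the provider `hive`. The `CREATE TABLE ... USING parquet AS SELECT` hint is standard Spark SQL for creating a data source table.

```python
from typing import Optional

# Assumption: the catalog reports provider "hive" for Hive tables,
# and a data source name (e.g. "parquet", "csv") for Spark DS tables.
HIVE_PROVIDER = "hive"


def streaming_source_error(table_name: str, provider: Optional[str]) -> Optional[str]:
    """Return a user-facing message if the table is a Hive table, else None.

    Hypothetical helper: a real implementation would run before the
    streaming job starts, so the user sees this instead of a raw exception.
    """
    if provider is None or provider.lower() != HIVE_PROVIDER:
        # Not a Hive table (or provider unknown): nothing to report here.
        return None
    return (
        f"Table {table_name} is a Hive table, which is not supported as a "
        "streaming source for incremental refresh. Recreate it as a Spark "
        "data source table, for example: "
        f"CREATE TABLE {table_name}_ds USING parquet AS SELECT * FROM {table_name}"
    )


if __name__ == "__main__":
    print(streaming_source_error("db.events", "hive"))
```

The check is deliberately separate from any Spark call so it can run up front, before the streaming query is started.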

dai-chen commented 1 year ago

Another option: this may be supported by the Lazy Build idea in https://github.com/opensearch-project/opensearch-spark/issues/118, if that proposal is feasible.

dai-chen commented 11 months ago

Need to double-check whether all Hive tables, or only certain ones, cannot be supported by Spark Structured Streaming.

penghuo commented 9 months ago

No plan to support Hive tables as a streaming source in the short term. See https://github.com/opensearch-project/opensearch-spark/issues/65#issuecomment-1908974123 for more context on the Hive table issue with Spark Structured Streaming queries.