Is your feature request related to a problem?
One of the key technical challenges in https://github.com/opensearch-project/sql/issues/719 is how to keep the base table (S3 data) consistent with the derived table (OpenSearch index or materialized view).
What solution would you like?
One solution is to incrementally refresh new data from S3 into OpenSearch. We propose enhancing our query engine by unifying batch processing and stream processing in a single architecture, as Apache Flink and Spark already do. In particular, the enhancement involves changes to query planning, the query execution engine, and the query plan itself.
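The incremental refresh idea can be illustrated with a minimal sketch. This is not the PoC implementation: `DerivedTable`, `full_refresh`, and `incremental_refresh` are hypothetical names, the base table is modeled as an in-memory mapping from file name to rows, and the sketch assumes the S3 data is append-only (new files arrive, existing files are never changed or deleted).

```python
from dataclasses import dataclass, field


@dataclass
class DerivedTable:
    """Hypothetical stand-in for an OpenSearch index or materialized view."""
    rows: list = field(default_factory=list)
    # Checkpoint: names of base-table files that have already been ingested.
    seen_files: set = field(default_factory=set)


def full_refresh(base_files: dict) -> tuple:
    """Rebuild the derived table from scratch; every file is read again."""
    table = DerivedTable()
    for name in sorted(base_files):
        table.rows.extend(base_files[name])
        table.seen_files.add(name)
    return table, len(base_files)  # (new table, number of files read)


def incremental_refresh(table: DerivedTable, base_files: dict) -> int:
    """Ingest only files missing from the checkpoint; return files read."""
    new_files = sorted(set(base_files) - table.seen_files)
    for name in new_files:
        table.rows.extend(base_files[name])
        table.seen_files.add(name)
    return len(new_files)
```

In this sketch, calling `incremental_refresh` after one new file lands reads only that file, while `full_refresh` reads the whole dataset each time. Both produce the same rows under the append-only assumption.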
What alternatives have you considered?
The alternative is to rebuild the derived table (full refresh), either on user demand or on a regular schedule. The current batch processing architecture can already do this, but it would introduce significant overhead for large S3 datasets.
PoC branch: https://github.com/opensearch-project/sql/tree/poc/maximus-m1. A detailed user manual and design doc will be published later, as planned below.
Do you have any additional context?
Phase 1
Goal:
Tasks
Phase 2
Goal:
Tasks
Phase 3
Goal:
Tasks