Open dai-chen opened 2 years ago
Hi @dai-chen the design is excellent, always appreciate it that people like me in the community has the opportunity to read such great design like this! But I have a couple of questions after reading it.
thanks @chloe-zh for the comment. there is a long plan in the works, we will share the RFCs here shortly.. in short for now/ 1.) the SQL core will be a more formal federated query acceleration engine with support for autonomous materialized views. A lot to unpack in the prior statement, but thats where we are headed. The SQL core will run as an independent distributed engine and also part of opensearch; as the use-cases need it. We may probably need to rename the SQL core to a better name :)
2) we are researching and thinking about spark / hyperspace or we build our own tier. The priority will be open source solutions first.
-> OpenSearch will be used for join and intermediate materialized views.. -> no in memory stuff. OpenSearch data structures are awesome.
hope this answers some questions , we will post more details shortly here with some demo videos :)
Is your feature request related to a problem?
In a database engine, there are different ways to optimize query performance. For instance, rule-based/cost-based optimizer and distributed execution layer tries to find best execution plan by cost estimate and equivalent transformation of query plan. Here we're proposing an alternative approach which is to accelerate query execution by materialized view for time-space tradeoff.
What solution would you like?
Architecture
Here is a reference architecture that illustrates components and the entire workflow which essentially is a workload-driven feedback loop:
Basically, feedback is referring to various materialized view prebuilt (either online or offline) which hints acceleration opportunity to query optimizer.
There are 2 areas and paths moving forward for both of which lack open source solutions:
General Acceleration Workflow
1.Workload Telemetry Collecting
Collect query plan telemetry generated in query execution and emit it as feedback generation input.
2.Workload Telemetry Preprocessing
Preprocess query plan telemetry into uniform workload representation.
3.View Selection
Analyze workload data and select sub-query as materialization candidate according to view selection algorithm.
4.View Materialization and Maintenance
Materialize selected view and maintain the consistency between source data and materialized view data, by incrementally refreshing for example.
5.Query Plan Optimization
At last, query optimizer checks the existing materialized view and replace original operator with scan on materialized view.