risinglightdb / risinglight

An educational OLAP database system.
Apache License 2.0

feat: support view in memory #837

Closed wangrunji0408 closed 5 months ago

wangrunji0408 commented 5 months ago

This PR is part of #796. It adds support for creating, querying, and dropping in-memory views.

The key implementations are:

  1. When creating a view, bind the query and store the resulting logical plan with the view in the catalog.
  2. When querying from a view, build executors for all views and then build other plan nodes on top of them. Given that a view can be consumed by multiple downstream nodes, we introduce StreamSubscriber to allow multiple consumers of a stream.
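
To illustrate the second point, here is a minimal sketch of fanning one stream out to several consumers. It is not RisingLight's actual `StreamSubscriber`: the `Broadcast` type, the `Chunk` alias, and the use of `std::sync::mpsc` channels are all assumptions made for the example. The only idea it is meant to show is that the upstream (the view's executor) runs once while every downstream node gets its own copy of each chunk.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for a batch of rows produced by an executor.
type Chunk = Vec<i64>;

/// Fans one upstream out to several consumers.
/// Illustrative only; not RisingLight's actual `StreamSubscriber`.
struct Broadcast {
    senders: Vec<Sender<Chunk>>,
}

impl Broadcast {
    fn new() -> Self {
        Broadcast { senders: Vec::new() }
    }

    /// Register one more downstream consumer and return its receiving end.
    fn subscribe(&mut self) -> Receiver<Chunk> {
        let (tx, rx) = channel();
        self.senders.push(tx);
        rx
    }

    /// Drive the upstream exactly once, cloning each chunk to every subscriber.
    fn run(self, upstream: impl Iterator<Item = Chunk>) {
        for chunk in upstream {
            for tx in &self.senders {
                // A subscriber that has already hung up is simply skipped.
                let _ = tx.send(chunk.clone());
            }
        }
        // `self.senders` is dropped here, which closes every channel
        // and lets each consumer's receive loop terminate.
    }
}

/// A "view" executed once and consumed by two downstream nodes:
/// one sums the values, the other counts the rows.
fn run_view_with_two_consumers() -> (i64, usize) {
    let mut view = Broadcast::new();
    let rx_sum = view.subscribe();
    let rx_count = view.subscribe();

    let sum = thread::spawn(move || {
        let total: i64 = rx_sum.iter().flatten().sum();
        total
    });
    let count = thread::spawn(move || {
        let rows: usize = rx_count.iter().map(|chunk| chunk.len()).sum();
        rows
    });

    view.run(vec![vec![1, 2], vec![3]].into_iter());
    (sum.join().unwrap(), count.join().unwrap())
}
```

Cloning each chunk per subscriber keeps the sketch simple. A real executor would more likely share chunks behind a reference-counted pointer and apply backpressure, neither of which is shown here.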

Limitations:

  1. We don't persist views in disk storage.
  2. We don't support inferring schema from the query. Columns must be defined explicitly when creating a view.
  3. We don't maintain dependency relationships between tables and views.
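
As a rough picture of what these limitations imply, the sketch below models an in-memory catalog that stores a view's explicit column list next to its logical plan. All names here (`Catalog`, `View`, `LogicalPlan`, `create_view`, `get_view`, `drop_view`) are hypothetical and do not mirror RisingLight's real catalog API. Note how `create_view` rejects an empty column list instead of inferring a schema, how nothing is written to disk, and how `drop_view` does no dependency checking.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a bound logical plan.
#[derive(Debug, Clone, PartialEq)]
struct LogicalPlan(String);

/// A view as kept in the in-memory catalog: explicit columns plus the plan.
#[derive(Debug, Clone, PartialEq)]
struct View {
    columns: Vec<String>,
    plan: LogicalPlan,
}

/// Illustrative catalog; everything lives in a map and is lost on restart.
#[derive(Default)]
struct Catalog {
    views: HashMap<String, View>,
}

impl Catalog {
    /// Store the bound plan with the view. Columns are never inferred,
    /// so an empty column list is rejected.
    fn create_view(
        &mut self,
        name: &str,
        columns: &[&str],
        plan: LogicalPlan,
    ) -> Result<(), String> {
        if columns.is_empty() {
            return Err(format!("view {}: columns must be listed explicitly", name));
        }
        if self.views.contains_key(name) {
            return Err(format!("view {} already exists", name));
        }
        let view = View {
            columns: columns.iter().map(|c| c.to_string()).collect(),
            plan,
        };
        self.views.insert(name.to_string(), view);
        Ok(())
    }

    /// Look up a view so its stored plan can be turned into executors.
    fn get_view(&self, name: &str) -> Option<&View> {
        self.views.get(name)
    }

    /// Remove a view. No check is made for tables or views that depend on it.
    fn drop_view(&mut self, name: &str) -> bool {
        self.views.remove(name).is_some()
    }
}
```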

skyzh commented 5 months ago

> When querying from a view, build executors for all views and then build other plan nodes on top of them. Given that a view can be consumed by multiple downstream nodes, we introduce StreamSubscriber to allow multiple consumers of a stream.

Well, then we have a DAG in the system 🤪 I thought an easier way would be to just create multiple copies of the plan and execute it multiple times.

wangrunji0408 commented 5 months ago

> > When querying from a view, build executors for all views and then build other plan nodes on top of them. Given that a view can be consumed by multiple downstream nodes, we introduce StreamSubscriber to allow multiple consumers of a stream.
>
> Well, then we have a DAG in the system 🤪 I thought an easier way would be to just create multiple copies of the plan and execute it multiple times.

I think it is natural to have a DAG in data processing pipelines. It is better to reuse results from common upstreams than to recompute them. Another interesting fact is that a query plan in egg's e-graph is also a DAG, even if you don't construct one intentionally, because egg automatically identifies and merges equal nodes. This will make it easier to eliminate common subexpressions and even reuse CTEs in the future.
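
The "equal nodes get merged" point can be seen in a small hash-consing sketch. This does not use egg and is far simpler than an e-graph, which also tracks equivalence classes and rewrites. The `Node` and `Arena` types are made up for illustration. The plan is written out as a tree (the same filtered scan appears twice), but because structurally equal nodes map to the same id, the stored plan ends up as a DAG in which the join points at one shared child.

```rust
use std::collections::HashMap;

/// A plan node whose children are ids of other nodes in the arena.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Node {
    Scan(&'static str),
    Filter(usize),
    Join(usize, usize),
}

/// Minimal hash-consing arena: adding a node that is structurally equal
/// to an existing one returns the existing id instead of a new one.
#[derive(Default)]
struct Arena {
    nodes: Vec<Node>,
    index: HashMap<Node, usize>,
}

impl Arena {
    fn add(&mut self, node: Node) -> usize {
        if let Some(&id) = self.index.get(&node) {
            return id; // already present: reuse it
        }
        let id = self.nodes.len();
        self.nodes.push(node.clone());
        self.index.insert(node, id);
        id
    }
}

/// Build a self-join of the same filtered scan, spelling out both sides.
fn build_self_join() -> (Arena, usize) {
    let mut arena = Arena::default();
    let left_scan = arena.add(Node::Scan("t"));
    let left = arena.add(Node::Filter(left_scan));
    let right_scan = arena.add(Node::Scan("t")); // deduplicated
    let right = arena.add(Node::Filter(right_scan)); // deduplicated
    let root = arena.add(Node::Join(left, right));
    (arena, root)
}
```

After `build_self_join`, the arena holds three nodes rather than five, and the root is `Join(1, 1)`: both inputs refer to the same `Filter`, which is the shared-upstream shape discussed above.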