risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

Make state table visible for creating MV #19031

Open xxchan opened 1 month ago

xxchan commented 1 month ago

Currently we hide both MV and their state tables during backfilling. However, I think there's no need to hide the state tables:

  1. Hiding state tables brings little benefit compared with hiding the MV (e.g., for consistency), since a state table is an internal object.
  2. Backfill executors' state tables are only meaningful during the backfill stage. Their content can also serve as an observability tool for backfilling, so exposing them is beneficial.

related:

BugenZhao commented 1 month ago

Since https://github.com/risingwavelabs/risingwave/pull/17503, we already broadcast all table catalogs (both the internal tables and the MV) to the frontend as soon as meta starts the creation procedure. You can verify this with SHOW INTERNAL TABLES.

Selecting from these internal tables is intentionally disabled: during binding, only tables in the "created" state are resolved.

https://github.com/risingwavelabs/risingwave/blob/33bca3a61e94975a44d73d7a9743bf8e27ace4c7/src/frontend/src/binder/relation/table_or_source.rs#L177-L178
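To illustrate the behavior described above, here is a minimal sketch of a catalog where listing sees every table but binding resolves only created ones. It is not the code behind the link; `JobStatus`, `TableEntry`, `Catalog`, `list_names`, and `resolve_for_select` are hypothetical names introduced for this example only.

```rust
use std::collections::HashMap;

/// Hypothetical job status; not RisingWave's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Creating,
    Created,
}

/// Hypothetical catalog entry for a table (MV or internal state table).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TableEntry {
    pub name: String,
    pub status: JobStatus,
}

/// Hypothetical frontend-side catalog keyed by table name.
#[derive(Debug, Default)]
pub struct Catalog {
    tables: HashMap<String, TableEntry>,
}

impl Catalog {
    /// Registers a table as soon as its catalog is broadcast.
    pub fn insert(&mut self, name: &str, status: JobStatus) {
        let entry = TableEntry {
            name: name.to_string(),
            status,
        };
        self.tables.insert(name.to_string(), entry);
    }

    /// Listing (in the spirit of SHOW INTERNAL TABLES) sees every entry,
    /// whether the job is still creating or already created.
    pub fn list_names(&self) -> Vec<String> {
        let mut names: Vec<String> = self.tables.keys().cloned().collect();
        names.sort();
        names
    }

    /// Binding a SELECT resolves a name only if the table is fully created.
    pub fn resolve_for_select(&self, name: &str) -> Option<&TableEntry> {
        self.tables
            .get(name)
            .filter(|t| t.status == JobStatus::Created)
    }
}
```

With this shape, a state table of an MV that is still backfilling shows up in the listing but cannot be selected from, which matches the situation described in this comment.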

BugenZhao commented 1 month ago

However, as described in #18944, the catalogs received by the frontend during creation are incomplete: fields like fragment_id or vnode_count are not correctly filled. Performing a batch scan during this period may lead to problems.

Perhaps this is the real motivation for the refactor that notifies the frontends only once, with the complete catalogs, which was the original idea of #18944:

image
kwannoel commented 1 month ago

> However, as described in #18944, the catalogs received by the frontend during creation are incomplete, with fields like fragment_id or vnode_count not correctly filled. Performing batch scan during this period may lead to problems.
>
> Perhaps this is the real motivation for the refactor of only notifying the complete catalogs once to the frontends, which is the original idea of #18944:
>
> image

I suppose your proposal is to notify the frontend of the catalogs once the TableFragments are built, since only then will fragment_id and vnode_count be correct. Sounds reasonable to me. WDYT @yezizp2012?

This would allow us to expose the backfill's internal state tables for querying.
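A rough sketch of that idea, under the assumption that a catalog is only safe to send once both fields are known. `PendingCatalog`, `CompleteCatalog`, `complete`, and `catalogs_to_notify` are hypothetical names for illustration, and the field types are guesses rather than RisingWave's actual definitions.

```rust
/// Hypothetical catalog as it exists while the stream job is being built.
/// `fragment_id` and `vnode_count` stay `None` until the table fragments exist.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PendingCatalog {
    pub name: String,
    pub fragment_id: Option<u32>,
    pub vnode_count: Option<usize>,
}

/// Hypothetical catalog that is safe for the frontend to plan a batch scan on.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CompleteCatalog {
    pub name: String,
    pub fragment_id: u32,
    pub vnode_count: usize,
}

impl PendingCatalog {
    /// Returns a complete catalog only when every required field is filled.
    pub fn complete(&self) -> Option<CompleteCatalog> {
        Some(CompleteCatalog {
            name: self.name.clone(),
            fragment_id: self.fragment_id?,
            vnode_count: self.vnode_count?,
        })
    }
}

/// Selects the catalogs that may be sent to the frontends;
/// incomplete ones are held back until their fragments are built.
pub fn catalogs_to_notify(pending: &[PendingCatalog]) -> Vec<CompleteCatalog> {
    pending.iter().filter_map(PendingCatalog::complete).collect()
}
```

Encoding completeness in the type means a frontend that receives a `CompleteCatalog` never has to guess whether fragment_id or vnode_count are valid.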

kwannoel commented 1 month ago

Tracking this issue as part of https://github.com/risingwavelabs/risingwave/issues/19084, to allow better management of stream job creation.