risingwavelabs / risingwave

Decouple `show jobs` from `CreateMviewProgressTracker` #19189

Open kwannoel opened 3 weeks ago

kwannoel commented 3 weeks ago

We can decouple `rw_ddl_progress` from meta's materialized view progress tracker and compute it ad hoc instead:

  1. Make the internal backfill state tables visible. They contain the backfilled row_count and whether the backfill has finished.
     a. Query all the internal backfill state tables for an MV to fetch row_count and the finished status.
     b. Query the internal tables of the MV.
     c. Regex-match the table names to find the backfill tables.
  2. Query the hummock version stats, so we can get the upstream row count.
  3. Calculate the estimated progress.
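
The steps above could be sketched roughly as follows. This is only an illustration, not the actual meta code: `BackfillState`, `is_backfill_table`, and `estimate_progress` are hypothetical names, the `__internal_<mv>_..._scan...` naming pattern is an assumption about how internal tables are named, and a plain substring check stands in for the regex so the sketch needs no external crate.

```rust
/// One row read from an internal backfill state table (hypothetical shape).
#[derive(Debug, Clone, Copy)]
struct BackfillState {
    /// Rows backfilled so far, as recorded in this state row.
    row_count: u64,
    /// Whether this part of the backfill has finished.
    finished: bool,
}

/// Step 1c: decide whether an internal table of `mv_name` is a backfill state table.
/// Assumption: internal tables are named `__internal_<mv>_...` and backfill
/// state tables carry a "scan" marker in their name. The prefix check is
/// approximate: an MV named `mv` would also match tables of `mv_other`.
fn is_backfill_table(mv_name: &str, table_name: &str) -> bool {
    let prefix = format!("__internal_{mv_name}_");
    table_name.starts_with(&prefix) && table_name.contains("scan")
}

/// Step 3: estimated progress in percent, given the state rows (step 1a)
/// and the upstream row count from the hummock version stats (step 2).
fn estimate_progress(states: &[BackfillState], upstream_rows: u64) -> f64 {
    // If every state row reports finished, the backfill is done.
    if !states.is_empty() && states.iter().all(|s| s.finished) {
        return 100.0;
    }
    // No upstream rows known yet: report no progress rather than divide by zero.
    if upstream_rows == 0 {
        return 0.0;
    }
    let backfilled: u64 = states.iter().map(|s| s.row_count).sum();
    // The upstream stats may lag behind the backfill, so cap at 100%.
    (backfilled as f64 / upstream_rows as f64 * 100.0).min(100.0)
}
```

Because nothing here keeps state between calls, the estimate can be recomputed on every `show jobs` query, which is the point of the proposal.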

This will:

  - Simplify the logic of the materialized view progress tracker. We no longer need to maintain state for counts.
  - Allow us to track the backfill progress of created sink jobs.

In general, I think the same pattern can be applied to other forms of backfilling, such as snapshot backfill and shared source backfill.

kwannoel commented 3 weeks ago

Implementation-wise, we need to consider query permissions for the internal tables.