risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
6.98k stars 575 forks source link

support alter add/drop column for target table of sink #14417

Closed shanicky closed 2 months ago

shanicky commented 9 months ago

Edited by @neverchanje

Here is the use case:

Suppose there are multiple tables from a upstream database joined together via CREATE SINK INTO into a target table. When the upstream table adds a column, the target table needs to be altered accordingly.

This is is a crucial ability in data enrichment scenarios. We used to consider ALTER MATERIALIZED VIEW or something similar to tackle this issue, but it'd likely introduce substantial complexity. The semantic of MATERIALIZED VIEW inherently couples both the streaming logic and result table, and therefore being more difficult to alter.

"CREATE SINK INTO + adding columns in both target and source tables" decouples the result table and streaming logic, allowing the logic changes without affecting result serving. "CREATE SINK + altering columns in external systems like Snowflake or StarRocks" has been a widely adopted pattern for ETL pipelines. We can simply view the RisingWave target table as an external table.

Specific steps for this process:

  1. ALTER TABLE ADD COLUMN in the source table.
  2. DROP SINK and the sinking process will be temporarily paused.
  3. ALTER TABLE ADD COLUMN in the target table.
  4. CREATE SINK INTO with the additional column.
BugenZhao commented 1 month ago

Closed by https://github.com/risingwavelabs/risingwave/pull/17203