risingwavelabs / risingwave

SQL engine for event-driven workloads. Perform streaming analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch processing. PostgreSQL compatible.
https://www.risingwave.com/slack
Apache License 2.0
6.71k stars 552 forks source link

bug: potential loss of data in sink into table #17602

Open wenym1 opened 1 month ago

wenym1 commented 1 month ago

Describe the bug

Currently, when we handle the SINK INTO TABLE statement, we run two commands atomically: a CreateStreamingJob to create a streaming graph to generate the new input, and a ReplaceTable to recreate the the downstream streaming graph that writes to the table.

When handling ReplaceTable, we drop all the actors that write the downstream table, and recreate new ones to accept the new input. However, after dropping the original actors, we don't wait for the uncommitted data to be committed. Since the data are uncommitted, it is not visible to the newly created actors. Therefore, when the new actors try to read keys that has uncommitted data, it won't read the correct one, which may cause data inconsistency. On the other hand, for ReplaceTable command that handles ALTER TABLE statement, it will treat it as a configuration, and will wait for all data to be committed before dropping and recreating the actors, and therefore won't cause bug.

Potential solution:

  1. When handling SINK INTO TABLE statement, treat it as configuration change and run pause/resume command before and after the command.
  2. Do not run ReplaceTable when handling SINK INTO TABLE. Instead, modify the inputs of downstream streaming graph on the flight. We can leverage the mutation information to ensure that the inputs configuration is changed at the boundary of barrier.

Error message/log

Found by reading code.

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

No response

wenym1 commented 1 month ago

cc @shanicky