risingwavelabs / risingwave

SQL stream processing, analytics, and management. We decouple storage and compute to offer efficient joins, instant failover, dynamic scaling, speedy bootstrapping, and concurrent query serving.
https://www.risingwave.com/slack
Apache License 2.0

Support pausing a decoupled Sink from SQL #17357

Open StrikeW opened 1 month ago

StrikeW commented 1 month ago

Is your feature request related to a problem? Please describe.

Use case: sometimes a user wants to stop data from flowing out of RW without pausing the entire cluster.

We don't currently provide a way for a user to pause just a single Sink. I think that for a Sink with sink_decouple = true, it is feasible to provide a way to pause the Sink job from SQL.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

tabVersion commented 1 month ago

so why not directly drop the sink?

StrikeW commented 1 month ago

so why not directly drop the sink?

If we sink an MV out, then when we recreate the sink we need to backfill the historical data again to ensure all data goes out to the downstream. Backfilling historical data will re-deliver data that is already in the downstream and introduce unnecessary write traffic to it.

hzxa21 commented 1 month ago

so why not directly drop the sink?

If we sink an MV out, then when we recreate the sink we need to backfill the historical data again to ensure all data goes out to the downstream. Backfilling historical data will re-deliver data that is already in the downstream and introduce unnecessary write traffic to it.

We support a snapshot option in CREATE SINK: https://docs.risingwave.com/docs/current/sql-create-sink/

When snapshot=false, the backfilling will be skipped.

hzxa21 commented 1 month ago

Dropping the sink and recreating it with snapshot=false can cause the data ingested between DROP SINK and CREATE SINK to be missing from the sink, though.

I can think of one case where this feature is useful: when the user's downstream system is overloaded and they want to stop the traffic for a while until they can fix the downstream. Not sure whether it is valid, though. If it is just a temporary downstream failure, the current retry-with-backoff mechanism with sink decouple on seems good enough.
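
To make the retry-with-backoff idea concrete, here is a minimal sketch of draining a buffered log into a flaky downstream. It is only an illustration under assumptions: `drain_with_backoff`, its parameters, and the use of `ConnectionError` are hypothetical names chosen for this example, not RisingWave's actual sink or log store code.

```python
import time
from collections import deque


def drain_with_backoff(log, deliver, base_delay=1.0, max_delay=30.0,
                       max_attempts=6, sleep=time.sleep):
    """Deliver buffered rows in order, retrying failures with exponential backoff.

    Hypothetical helper for illustration only. `log` is a deque standing in
    for a log store; `deliver` is a callable that raises ConnectionError when
    the downstream is unavailable. Returns the list of delays that were waited.
    """
    waits = []
    while log:
        row = log[0]  # peek only; the row is removed after a successful delivery
        delay = base_delay
        for _ in range(max_attempts):
            try:
                deliver(row)
                log.popleft()
                break
            except ConnectionError:
                waits.append(delay)
                sleep(delay)
                delay = min(delay * 2, max_delay)  # double the wait, up to a cap
        else:
            # Every attempt failed: the row stays buffered and the caller decides.
            raise RuntimeError("downstream still failing after retries")
    return waits
```

Because a row is removed only after it is delivered, a short outage costs nothing but delay. A long outage, or a downstream that the user wants to leave alone on purpose, is the case where an explicit pause would help.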

StrikeW commented 1 month ago

Dropping the sink and recreating it with snapshot=false can cause the data ingested between DROP SINK and CREATE SINK to be missing from the sink, though.

That's it. To ensure at-least-once delivery, backfilling is still needed.
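
The trade-off can be sketched with a toy model. This is only an illustration under assumptions: rows are append-only, list indexes stand in for arrival time, and `simulate` and `report` are hypothetical helpers, not RisingWave APIs.

```python
from collections import Counter


def simulate(strategy, events, stop_at, resume_at):
    """Return the rows the downstream receives, in order (toy model).

    `events` are upstream rows in arrival order. The sink stops at index
    `stop_at` and comes back at index `resume_at`.
    """
    downstream = list(events[:stop_at])  # delivered before the sink stopped
    if strategy == "recreate_with_snapshot":
        # Backfill replays everything that exists at re-creation time.
        downstream += events[:resume_at] + events[resume_at:]
    elif strategy == "recreate_without_snapshot":
        # No backfill: only rows arriving after re-creation are delivered.
        downstream += events[resume_at:]
    elif strategy == "pause_with_log_store":
        # Rows arriving during the pause are buffered and delivered on resume.
        downstream += events[stop_at:resume_at] + events[resume_at:]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return downstream


def report(events, downstream):
    """Return (missing rows, rows delivered more than once)."""
    counts = Counter(downstream)
    missing = [e for e in events if counts[e] == 0]
    duplicated = [e for e in events if counts[e] > 1]
    return missing, duplicated


events = list(range(10))
for s in ("recreate_with_snapshot", "recreate_without_snapshot", "pause_with_log_store"):
    print(s, report(events, simulate(s, events, stop_at=4, resume_at=7)))
```

In this model, recreating with a snapshot re-delivers rows 0 to 3, recreating without one loses rows 4 to 6, and pausing on top of a log store does neither.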

I can think of one case where this feature is useful: when the user's downstream system is overloaded and they want to stop the traffic for a while until they can fix the downstream.

Yes. We encountered a case where a user was sinking an MV with a large amount of data into a downstream PG, but there were some issues with the PG CDC source that they wanted to troubleshoot.

fuyufjh commented 1 month ago

sink_decouple, or the fact that there is a hidden log store before the sink, is somewhat counter-intuitive from the user's perspective. Thus, I would like to keep it transparent to users in normal cases. Only when the sink can't work does the hidden log store help to avoid the failure of all streaming jobs.

Applying this idea here, I think we may do this in risectl as a contingency measure. Users won't be aware of it except when troubleshooting.