Open StrikeW opened 1 month ago
so why not directly drop the sink?
If we sink an MV out, then when we recreate the sink we need to backfill historical data again to ensure all data reaches the downstream. Backfilling re-delivers data that is already in the downstream and introduces unnecessary write traffic there.
We support a `snapshot` option in `CREATE SINK`: https://docs.risingwave.com/docs/current/sql-create-sink/
When `snapshot=false`, the backfilling will be skipped.
Dropping the sink and recreating it with `snapshot=false` can cause the data ingested between `DROP SINK` and `CREATE SINK` to be missing in the sink, though.
I can think of one case where this feature is useful: the user's downstream system is overloaded and they want to stop the traffic for a while until they can fix the downstream. Not sure whether it is valid though. If it is just a temporary downstream failure, the current retry-with-backoff mechanism with sink decoupling on seems good enough.
> Dropping the sink and recreating it with `snapshot=false` can cause the data ingested between `DROP SINK` and `CREATE SINK` to be missing in the sink, though.

Exactly. To ensure at-least-once delivery, backfilling is still needed.
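The trade-off discussed above (a gap without backfill, duplicate writes with backfill) can be illustrated with a toy model. This is only a sketch: `simulate`, its parameters, and the row names are hypothetical and do not correspond to any RisingWave API; the model simply assumes that a recreated sink with backfill re-delivers everything ingested so far.

```python
def simulate(events, drop_at, recreate_at, snapshot):
    """Toy model (not RisingWave code): return rows delivered downstream.

    events      -- rows in ingestion order; the list index is the ingestion time
    drop_at     -- the first sink delivered rows [0, drop_at) before being dropped
    recreate_at -- the new sink is created after rows [0, recreate_at) were ingested
    snapshot    -- True: backfill all history first; False: forward new rows only
    """
    delivered = list(events[:drop_at])          # output of the original sink
    if snapshot:
        delivered.extend(events[:recreate_at])  # backfill re-sends old rows too
    delivered.extend(events[recreate_at:])      # rows ingested after recreation
    return delivered


events = ["r0", "r1", "r2", "r3", "r4", "r5"]

with_backfill = simulate(events, drop_at=2, recreate_at=4, snapshot=True)
without_backfill = simulate(events, drop_at=2, recreate_at=4, snapshot=False)

# Rows that never reach the downstream when backfill is skipped.
missing = [e for e in events if e not in without_backfill]    # ["r2", "r3"]
# Rows written to the downstream more than once when backfill is enabled.
duplicates = [e for e in events if with_backfill.count(e) > 1]  # ["r0", "r1"]
```

In this model, skipping the backfill loses exactly the rows ingested between the drop and the recreation, while enabling it rewrites the rows the first sink had already delivered.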
> I can think of one case where this feature is useful: the user's downstream system is overloaded and they want to stop the traffic for a while until they can fix the downstream.

Yes. We encountered a case where a user was sinking an MV with a large amount of data into a downstream PG, but there were some issues with the PG CDC source that they wanted to troubleshoot.
`sink_decouple`, or the fact that there is a hidden log store before the sink, is somewhat counter-intuitive from the user's perspective. Thus, I would like to keep it transparent to users in normal cases. Only when the sink can't work does the hidden log store help to avoid the failure of all streaming jobs.
Applying this idea here, I think we may do this in `risectl` in case of contingency. Users won't be aware of this except when troubleshooting.
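To make the role of the hidden log store concrete, here is a minimal sketch of why pausing is feasible once the sink is decoupled. It assumes a simple in-memory queue standing in for the log store; the class `DecoupledSink` and its methods are hypothetical illustrations, not RisingWave's implementation or the `risectl` interface.

```python
from collections import deque


class DecoupledSink:
    """Toy model of a sink fed through a log store (hypothetical names)."""

    def __init__(self):
        self.log_store = deque()  # buffer between the streaming job and the sink
        self.downstream = []      # rows that reached the external system
        self.paused = False

    def write(self, row):
        # The upstream job only appends to the log store, so it is never
        # blocked by the state of the downstream system.
        self.log_store.append(row)

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def flush(self):
        """Deliver buffered rows unless paused; return how many were sent."""
        if self.paused:
            return 0
        sent = 0
        while self.log_store:
            self.downstream.append(self.log_store.popleft())
            sent += 1
        return sent


sink = DecoupledSink()
sink.write("a")
sink.flush()                           # "a" is delivered
sink.pause()
sink.write("b")
sink.write("c")
sent_while_paused = sink.flush()       # nothing leaves while paused
buffered = list(sink.log_store)        # rows wait in the log store
sink.resume()
sink.flush()                           # buffered rows are delivered in order
```

In this sketch, pausing loses nothing and duplicates nothing: rows accumulate in the buffer and are delivered in order after resuming. A real log store would also have to bound or persist that backlog, which this model ignores.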
Is your feature request related to a problem? Please describe.
Use case: sometimes a user may want to stop data from flowing out of RW without pausing the entire cluster.
We don't provide a way for users to pause just a sink right now. I think that for a sink with `sink_decouple = true`, it is feasible to provide a way to pause the sink job from SQL.
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response