Open kwannoel opened 10 months ago
We solved part of the problem: on cancel, we now always remove the table fragments immediately (without waiting for a barrier), so that if a recovery is triggered, the cluster will still clean up the stream job.
IMO this will lead to a lot of friction for PoC users.
Having to trigger recovery is still high effort.
Offline discussion with @yezizp2012:
Highlighting some parts:
> The only way to solve this kind of case is to force a recovery to clean and rebuild the whole streaming graph. But in a normal environment, neither DROP nor CANCEL requests are due to the existence of a problematic job in most cases, and I don't think we should break the design of modifying the streaming graph via barriers just to be compatible with such a case. Maybe we can add a force-recovery interface for such cases?
In my experience, it is common for users to hit high barrier latency while prototyping queries; it's very easy to write a query with high join amplification. (We should perhaps also migrate the join-matched metrics to the user dashboard.)
Adding a recovery interface could be OK, but users may feel uncomfortable having to trigger a recovery every time barrier latency becomes high. Still, it seems to be the lower-effort option, so we could support it first.
I would still like to explore the feasibility of decoupling drop / cancel from the barrier.
Also had a discussion with @fuyufjh earlier.
An intuitive approach could be directly dropping the tokio tasks of the involved actors when dropping a streaming job. However, from the view of the upstream actors, the closing of connections or channels during this procedure can be confused with a network failure, in which case the exception should be propagated instead. Thus, it's important to have a mechanism that allows the downstream to gracefully notify the upstream of the dropping. We are not sure yet whether this will be easy to implement.
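To make the distinction concrete, here is a minimal sketch using plain std channels instead of RisingWave's actual actor machinery (`CloseReason` and `on_send_failure` are hypothetical names invented for illustration): an explicit side-channel notification lets the upstream tell a graceful drop apart from a broken connection.

```rust
use std::sync::mpsc;

// Why the downstream stopped consuming. All names here are illustrative,
// not RisingWave's actual API.
#[derive(Debug, PartialEq)]
enum CloseReason {
    Graceful, // the downstream actor is being dropped on purpose
}

// Upstream side: when the data channel closes, consult the side channel to
// decide whether this was a graceful drop or an unexpected disconnection.
fn on_send_failure(reason_rx: &mpsc::Receiver<CloseReason>) -> Result<(), String> {
    match reason_rx.try_recv() {
        Ok(CloseReason::Graceful) => Ok(()), // stop quietly, no error
        Err(_) => Err("channel closed without notice: treat as network failure".into()),
    }
}

fn main() {
    // Case 1: downstream announces a graceful drop before hanging up.
    let (data_tx, data_rx) = mpsc::channel::<u64>();
    let (reason_tx, reason_rx) = mpsc::channel();
    reason_tx.send(CloseReason::Graceful).unwrap();
    drop(data_rx);
    assert!(data_tx.send(1).is_err()); // receiver is gone
    assert!(on_send_failure(&reason_rx).is_ok());

    // Case 2: no notification was sent; upstream must propagate the error.
    let (reason_tx2, reason_rx2) = mpsc::channel::<CloseReason>();
    drop(reason_tx2);
    assert!(on_send_failure(&reason_rx2).is_err());
}
```

The point of the sketch is only that the close *reason* must travel out-of-band from the channel whose closure it explains; otherwise a dropped receiver is indistinguishable from a crashed peer.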
Besides decoupling from the barrier, and given that completing the last checkpoint does not matter at all for a streaming job that is about to be dropped, another approach could be reordering the Stop barrier ahead of any data messages before it in the same epoch. The idea is similar to unaligned checkpoints or barrier stealing, but it may be more challenging to achieve.
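A rough sketch of the reordering idea, assuming a simplified queue type (`StreamMessage` and `steal_stop_barrier` are made-up names, not RisingWave's actual types): when a Stop barrier is already queued, the data messages in front of it can simply be discarded, since the dropped job's final checkpoint no longer matters.

```rust
// Simplified view of an actor's input queue; a sketch, not the real types.
#[derive(Debug, PartialEq, Clone)]
enum StreamMessage {
    Data(u64),
    Barrier { is_stop: bool },
}

// If a Stop barrier is already queued, discard the data messages in front of
// it: the job is being dropped, so finishing its last checkpoint is useless.
fn steal_stop_barrier(queue: Vec<StreamMessage>) -> Vec<StreamMessage> {
    let stop_pos = queue
        .iter()
        .position(|m| matches!(m, StreamMessage::Barrier { is_stop: true }));
    match stop_pos {
        None => queue, // no Stop barrier pending: leave the queue untouched
        Some(pos) => queue
            .into_iter()
            .enumerate()
            // Keep barriers (they carry epoch bookkeeping) and everything
            // from the Stop barrier onwards; drop earlier data messages.
            .filter(|(i, m)| *i >= pos || matches!(m, StreamMessage::Barrier { .. }))
            .map(|(_, m)| m)
            .collect(),
    }
}

fn main() {
    use StreamMessage::*;
    let queue = vec![
        Data(1),
        Barrier { is_stop: false },
        Data(2),
        Barrier { is_stop: true },
    ];
    assert_eq!(
        steal_stop_barrier(queue),
        vec![Barrier { is_stop: false }, Barrier { is_stop: true }]
    );
}
```

The hard part this sketch glosses over is exactly what the comment says: in the real system the Stop barrier is still in flight across exchange channels, not sitting in one local queue, which is why this resembles unaligned checkpointing.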
> Thus, it's important to have a mechanism that allows the downstream to gracefully notify the upstream of the dropping. We are not sure yet whether this will be easy to implement.
I think we should introduce a separate notification channel per executor.
Other mechanisms, such as altering executor configurations, could also benefit from and attach to that notification channel.
Currently we can only notify a node via the message stream (essentially barrier messages), but in many cases we don't need to synchronize on a barrier.
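As a sketch of what a per-executor control channel could look like (all names hypothetical; synchronous std channels stand in for the real async actor loop), the executor drains out-of-band signals before touching the barrier-aligned data stream:

```rust
use std::sync::mpsc;

// Out-of-band signals for a single executor; variants are hypothetical.
#[derive(Debug)]
enum ControlSignal {
    Drop,                      // cancel this executor right away
    AlterConfig(&'static str), // e.g. change a runtime knob in place
}

// One iteration of an executor loop: drain pending control signals before
// touching the barrier-aligned data stream. Returns false once dropped.
fn poll_once(
    control_rx: &mpsc::Receiver<ControlSignal>,
    data_rx: &mpsc::Receiver<u64>,
    out: &mut Vec<u64>,
) -> bool {
    while let Ok(sig) = control_rx.try_recv() {
        match sig {
            // No need to wait for a Stop barrier to flow through the graph.
            ControlSignal::Drop => return false,
            ControlSignal::AlterConfig(_knob) => { /* apply immediately */ }
        }
    }
    if let Ok(v) = data_rx.try_recv() {
        out.push(v);
    }
    true
}

fn main() {
    let (ctl_tx, ctl_rx) = mpsc::channel();
    let (data_tx, data_rx) = mpsc::channel();
    let mut out = Vec::new();

    data_tx.send(1).unwrap();
    assert!(poll_once(&ctl_rx, &data_rx, &mut out)); // data flows normally

    // A config change and a drop arrive out-of-band, ahead of queued data.
    data_tx.send(2).unwrap();
    ctl_tx.send(ControlSignal::AlterConfig("rate_limit")).unwrap();
    ctl_tx.send(ControlSignal::Drop).unwrap();
    assert!(!poll_once(&ctl_rx, &data_rx, &mut out)); // stops before Data(2)

    assert_eq!(out, vec![1]);
}
```

This shows the appeal of the idea: the drop signal overtakes queued data without being serialized into the barrier stream, which is exactly what a barrier-aligned message cannot do.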
> Other mechanisms, such as altering executor configurations, could also benefit from and attach to that notification channel.
> Currently we can only notify a node via the message stream (essentially barrier messages), but in many cases we don't need to synchronize on a barrier.
Definitely +1 for this. A similar idea was previously proposed at: https://github.com/risingwavelabs/risingwave/pull/13166#pullrequestreview-1709460290
I think this could be somehow related to https://github.com/risingwavelabs/risingwave/issues/15490.
> I think we should introduce a separate notification channel per executor.

I believe we need a detailed design document for this. The concept of a "separate notification channel", previously also known as a "control channel" or "local message", has been proposed many times but was always rejected due to implementation complexity or lack of strong motivation.
> I think we should introduce a separate notification channel per executor.

> I believe we need a detailed design document for this. The concept of a "separate notification channel", previously also known as a "control channel" or "local message", has been proposed many times but was always rejected due to implementation complexity or lack of strong motivation.
Agree: https://github.com/risingwavelabs/rfcs/pull/81#issuecomment-2019522615
> I think we should introduce a separate notification channel per executor.

> I believe we need a detailed design document for this. The concept of a "separate notification channel", previously also known as a "control channel" or "local message", has been proposed many times but was always rejected due to implementation complexity or lack of strong motivation.
FYI: we used to implement a draft https://github.com/risingwavelabs/risingwave/pull/4834
Support canceling a stream job without a barrier.
Just noting this down for now; I haven't thought of any ideas on how to support it yet.