zesterer / flume

A safe and fast multi-producer, multi-consumer channel.
https://crates.io/crates/flume
Apache License 2.0
2.43k stars 85 forks source link

Document cancel safety #104

Open madadam opened 2 years ago

madadam commented 2 years ago

It's unclear whether Sender::send_async is cancel-safe. That is, does it guarantee that if the future returned from it is cancelled, the message is not sent? It would be great to document it. For inspiration, see tokio's docs: https://docs.rs/tokio/latest/tokio/sync/mpsc/struct.Sender.html#cancel-safety

Restioson commented 2 years ago

Notes toward documentation on this topic:

Send

Currently, upon Drop, the SendFut will remove the item from the queue of waiting senders, if it had not already been sent. There is no method to cancel the send future and return the item if it has not already been dropped, though. This is something that we could consider to be missing from the public API and could address here.

Receive


As a general design principle, we decided not to worry about users who leak or mem::forget futures. If I remember correctly, we considered these cases to be logic bugs in the program, and that the channel does not have to ensure messages are received or sent in these cases. Of course, flume should never cause undefined behavior regardless of any (safe) logic bugs, as it is #![deny(unsafe_code)].

I do not quite understand the cancellation safety doc in tokio, to be perfectly honest. It is worded as very specific to tokio::select!, so does this imply that another select combinator would not also preserve this behaviour?

madadam commented 2 years ago

If SendFut is polled and returns Poll::Pending, it may be received by a receiver at any point.

Bit confused about this. Does this mean that if SendFut is polled at least once and it returns Pending and then never polled again, the message might still be sent? Btw, I would not count that as cancellation. I think cancellation implies the future is dropped before it returned Ready.

As a general design principle, we decided not to worry about users who leak or mem::forget futures.

That sounds valid and reasonable to me.

I do not quite understand the cancellation safety doc in tokio, to be perfectly honest. It is worded as very specific to tokio::select!, so does this imply that another select combinator would not also preserve this behaviour?

I'm not a tokio dev, but my understanding is that they just use tokio::select as an example of cancellation not that the cancel-safety somehow depends on it.

Restioson commented 2 years ago

Does this mean that if SendFut is polled at least once and it returns Pending and then never polled again, the message might still be sent?

Yes, indeed. The item is sent into a queue of waiters. This is an optimisation for latency - the receiver doesn't need to wake the sender and then wait for the sender to wake, get polled, send the item, and then wake the receiver again. The receiver can just take the item and wake the sender. But, in some cases the sender could be dropped after item taken but before it is woken.

Restioson commented 2 years ago

By the way, could you elaborate on the usecase mentioned for cancellation safety in https://github.com/launchbadge/sqlx/issues/2054

madadam commented 2 years ago

Yes, indeed. ...

I see. So that means flume::Sender is not "cancel safe" in the same sense that tokio::mpsc::Sender is. That is, when the send_async future is cancelled (dropped), the message might still be sent. That is a bit unfortunate for my use case but I understand the performance justification.

could you elaborate on the usecase mentioned for cancellation safety in launchbadge/sqlx#2054

Sure. For the mentioned fix to work, I need to make sure that send_async does not send the message when it's cancelled. If that would not be the case (and it seems it indeed isn't), then what can happen is the BEGIN command can still be sent to the background thread before the RAII guard is created and so we could still end up with dangling transaction.

Restioson commented 2 years ago

Sure. For the mentioned fix to work, I need to make sure that send_async does not send the message when it's cancelled. If that would not be the case (and it seems it indeed isn't), then what can happen is the BEGIN command can still be sent to the background thread before the RAII guard is created and so we could still end up with dangling transaction.

Could the RAII guard be created before the send occurs? The worker thread would need to deal with the potential for spurious messages, however.

This is similar to how you usually create a guard which notifies if something happens before checking if it has actually happened, or you check twice - once before creation and once after. This is in case the notification fires after checking and before creating the listener, as then the listening guard may never be woken. I'm not sure if this makes sense, but here is an example of this pattern.

This may solve this particular issue (or maybe not!) but maybe looking at a cancellation-safe design would be good regardless, or for other use cases.

madadam commented 2 years ago

Could the RAII guard be created before the send occurs? The worker thread would need to deal with the potential for spurious messages, however.

Yes exactly. That could create the opposite problem - we could end up executing ROLLBACK without the corresponding BEGIN.

But there are other ways to solve this problem. I'm already pondering some ideas. Thanks for the input anyway!

Restioson commented 2 years ago

Yes exactly. That could create the opposite problem - we could end up executing ROLLBACK without the corresponding BEGIN.

In theory, maybe the worker can keep track of whether the corresponding BEGIN has been executed and ignore it if so, but this is getting a bit off topic now :) Good luck!

abonander commented 2 years ago

The item is sent into a queue of waiters. This is an optimisation for latency - the receiver doesn't need to wake the sender and then wait for the sender to wake, get polled, send the item, and then wake the receiver again.

How does this work with bounded channels? This sounds like the sender isn't woken until the item is received, but no one expects a channel to behave like that with nonzero capacity.

Restioson commented 2 years ago

Just to clarify, if the channel has capacity, send async will return in one poll. So, this behavior of waiting only applies to when there is no capacity in the channel.

When capacity is full, the sender goes into a queue of waiting senders, which stores its waker and the item. Then, when a receiver receives an item, there is now space in the channel, so the waiting sender is woken and the receiver puts that item on the channel. I think this was unclear in my original message, as I said "it may be received by a receiver at any point". This is better stated as it is pulled by a receiver from the pending queue into the channel's queue (an operation called pull_pending in the code).

By the way, please correct me if the code does not actually do this, but I think I'm reading it right here 🙂

no one expects a channel to behave like that with nonzero capacity.

It may seem slightly surprising at first, but I do not believe that this actually breaks any assumptions about bounded channels. The end result that the send will wait for capacity is the same.

Restioson commented 2 years ago

Oops sorry, that was an accidental close :sweat_smile:

zesterer commented 2 years ago

This is an interesting discussion and not something that has previous had much thought put into it. Given that we've pretty much "solved" (in the most primitive sense of the word) this issue on the receiving end for receiver slots, it might be viable to do something similar on the sending end, although I've not thought too much about that.

If anybody is interested in looking into this further, I'd be happy to review code / give advice, but I'm not sure I've got enough free time to work on it myself, at least for the next few months.

abonander commented 2 years ago

I think it's highly surprising that a send_async() operation can be cancelled but still go through, that could easily lead to some very subtle bugs. If nothing else, that needs to be clearly documented.

It also weakens the guarantees of having a bounded channel as, if I understand correctly, you could essentially do tx.send_async(item).now_or_never() in a loop to get unbounded behavior.

Restioson commented 2 years ago

It also weakens the guarantees of having a bounded channel as, if I understand correctly, you could essentially do tx.send_async(item).now_or_never() in a loop to get unbounded behavior.

This is not the case. When the future is dropped after now or never returns None, the waiter will be removed and the send will be canceled.

Restioson commented 2 years ago

I think it's highly surprising that a send_async() operation can be cancelled but still go through, that could easily lead to some very subtle bugs. If nothing else, that needs to be clearly documented.

I agree. It's something that should be documented or otherwise changed. As far as changing it goes, I may look into it in a successor to #84

oblique commented 1 year ago

From my understanding recv_async is cancel safe. Am I correct? Can we document this?