rust-lang / futures-rs

Zero-cost asynchronous programming in Rust
https://rust-lang.github.io/futures-rs/
Apache License 2.0
5.36k stars 620 forks source link

All multiplexing futures are potentially broken by tokios preemting-cooperative feature. #2553

Open rustonaut opened 2 years ago

rustonaut commented 2 years ago

This is not limited to FuturesUnordered.

TL;DR: Tokio futures subtle brakes not just FuturesUnordered, but also join_all, various joinX and maybe other combinators.

Relate: #2387 ,

The new feature

Tokio added some form of preempting to the cooperative multi-threading rust futures do (and confusingly called it "cooperative scheduling").

This feature allocates a budged of "tracked poll" calls to each task (i.e. future) scheduled on the tokio runtime and once a "tracked poll" call is entered while the budget is exceeded it will preempt the task (i.e. forceful yields without actually polling). Every time a "tracked poll" is entered the budget is reduced by one, but if the poll call does not make progress the budget is restored. Most (all?) task-leaf futures provided by tokio use "tracked polls", like oneshot receivers, io, task join handles etc..

The idea behind this is that a futures can make so much repeated progress that it might starve other futures. Like e.g. you awaiting next elements on a stream and only yield when the stream is not ready to produce any new elements.

This is generally a good idea, as it's easy to make this mistake. Furthermore in-place workarounds like yielding after each time a item is returned, or per-hand putting in a limiter tend to be more complex and potentially less performant (depending on the situations, the tokio feature doesn't just catch single loops, but also complex situations which idk. recursive boxed futures).

As a side note the functionality to do "tracked polls" is currently not exposed by tokio. A way to opt. out of it for a future and all nested futures of it is provided, but no way to opt-in again. Which due to the nesting aspect you likely do not want to use as people using tokio might very well expect the preempting to exist.

The problem

The feature in it's design assumes that on task (i.e. future scheduled in the tokio runtime for this discussion) only has a single logical thread of execution and as such many "progress making" "tracked poll" calls must mean some form of "forward progressing" looping is done, which is best to forcefully yield.

But since the beginning a single future in rust (and as such tasks) do not map to a single logical thread of execution, features like join, race, FuturesUnordered etc. allow forking a single future into multiple concurrent (but not parallel) logical threads of execution.

As such any future which multiplexes over multiple futures is not treated adequately (IMHO) and is at risk of leading to serve performance degradation if unlucky (it tends to not lead to starving futures as "tracked poll" calls tend to only be in futures provided by tokio which tend to be the leafs for given task, at least during the time they are awaited, like awaiting a oneshot or io, if tokio would put it into e.g. select it would be at risk of starving futures).

The reason for this is that all multiplexed futures share the same budged and as such the first polling in the multiplexing future can already consume all the budged leading to all future calls to preempt.

This means that while a future yielding using yield_now does not guarantee that it returns to the scheduler a future yielding due to preemption must returns to the scheduler, or else no progress will be made (by futures using "tracked poll" calls).

As such all multiplexing futures must follow two rules (I think) which can make them noticeable more complex:

Furthermore even iff you do all of this there is still one problem. The multiplexed futures still share a budget and as such are slightly de-prioritized compared to any future spawned and handled by the same executor thread. Most time that shouldn't be an issue but if you are very very unlucky it could somehow subtle degrade performance.

Lastly I will note that spawn could avoid that problem but is often not an option mainly due to lifetime issues. It also introduces a bunch of overhead and reduces sometimes needed control over the sub-futures. It also requires more runtime specific code.

What needs to be done

Worries

I'm quite worried that this will further divide the rust async ecosystem due to:

I'm also worried that even just inside of tokio this can lead to surprising "magical" failures when e.g. somewhere in a future stack from a library it opt-out (or stops opting out) of preemting. This experience could turn new people away from rust and is kinda exactly the kind of "surprising failure" rust is normally trying to not have. But so is the error case preempting tires to fix, so preferably a way to handle multiplexing futures with preempting independent of the runtime should exists.

rustonaut commented 2 years ago

A way to handle multiplexing futures

A simple way to handle multiplexing futures would be to have a way to snapshot and restore the budget.

When you enter the multiplexing futures poll you snapshot the budget and after every sub-futures poll you restore it (1).

I believe every multiplexed sub-future has the right to be polled at least once(2) (in above's case exactly once only) per poll of the multiplexing future in the same way a schedulers thread tends to poll all (woken) futures once per epoch (it's a bit oversimplified, but I think the argument still holds).

Given that the budget is pretty much just a thread local cell, this functionality could be moved into a small independent crate (e.g. in the rust nursery) which would make it work runtime independent.


(1): Even better would be to have the snapshot/restore mechanism but also keep track if the combined used budged across all sub-futures exceeds the remaining budget. This could in some situations allow slightly more performant patterns when only multiplexing over a few futures. Like if you receive futures from a stream and then multiplex over them and the stream is constantly ready.

(2): It's a bit over simplified as you stack a lot of futures on the same executing thread without giving them a chance to be work-stolen, so you probably still want some total upper limit. Just a much higher one. Maybe instead of snapshot and restore so having a "once per task poll high budget increase" could be an alternative implementation.

rustonaut commented 2 years ago

sorry for writing to many words :disappointed:

rustonaut commented 2 years ago

Given that join always joins all, and the fact that "checked pools" only appear in leaf position we might not need to fix join. Through I'm not completely sure.

rustonaut commented 2 years ago

I made a small mistake: I said it doesn't starve leaf futures, which is true but not as good as I though as it will still prevent the leaf-future from progressing and the future we poll (e.g. in join) might go through multiple leaf-futures before completing. If multiple multiplexed futures also interact, e.g. through channels, this could probably cause an dead lock like situation. So we probably need to fix all join futures and also probably need to fix FuturesUnordered.