tokio-rs / tokio

A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
https://tokio.rs
MIT License

Heavy contention on blocking #2528

Open jonhoo opened 4 years ago

jonhoo commented 4 years ago

All code that invokes spawn_blocking or block_in_place (which internally calls spawn_blocking) ends up taking a shared lock here: https://github.com/tokio-rs/tokio/blob/221f421464e6672e9cf25629998477946a8b8e1f/tokio/src/runtime/blocking/pool.rs#L153

This causes a lot of unnecessary contention between unrelated tasks if you frequently need to run blocking code. The problem is exacerbated as load increases: more calls to spawn_blocking make each individual call more expensive, and each call holds up an executor thread while it waits.
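The pattern is trivial to reproduce: hammer spawn_blocking from many tasks at once with near-zero work per call, so the per-call queueing cost (lock plus notify) dominates. A minimal sketch, where the task count and workload are arbitrary:

```rust
use std::time::Instant;

#[tokio::main]
async fn main() {
    let start = Instant::now();
    let handles: Vec<_> = (0..10_000)
        .map(|_| {
            tokio::task::spawn_blocking(|| {
                // A trivially cheap "blocking" op: the per-call queueing
                // overhead (shared lock + condvar notify) dominates.
                std::hint::black_box(1 + 1)
            })
        })
        .collect();
    for h in handles {
        h.await.unwrap();
    }
    println!("elapsed: {:?}", start.elapsed());
}
```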

dekellum commented 3 years ago

It looks to me like the lock is held for the minimum and correct span (i.e. not including actually running the blocking operation closure). In my own testing of an alternative thread pool (in the blocking-permit crate), using a parking_lot mutex and condvar with minimized notifications appears to give the best possible performance, at least on Linux.
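As a rough illustration of that pattern (not blocking-permit's actual code), the key idea is to skip the condvar notification whenever no worker is actually parked:

```rust
use parking_lot::{Condvar, Mutex};
use std::collections::VecDeque;

type Job = Box<dyn FnOnce() + Send>;

struct Shared {
    state: Mutex<State>,
    condvar: Condvar,
}

struct State {
    jobs: VecDeque<Job>,
    idle_workers: usize, // how many workers are parked on the condvar
}

impl Shared {
    fn submit(&self, job: Job) {
        let mut state = self.state.lock();
        state.jobs.push_back(job);
        // Only notify if someone is actually waiting; a running worker
        // will pick the job up on its next loop iteration anyway.
        if state.idle_workers > 0 {
            self.condvar.notify_one();
        }
    }

    fn worker_loop(&self) {
        loop {
            let job = {
                let mut state = self.state.lock();
                while state.jobs.is_empty() {
                    state.idle_workers += 1;
                    self.condvar.wait(&mut state);
                    state.idle_workers -= 1;
                }
                state.jobs.pop_front().unwrap()
            };
            job(); // run outside the lock, as in tokio's pool
        }
    }
}
```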

What, if anything, is suggested as an alternative?

SUPERCILEX commented 2 years ago

The solution here would be to use a lock-free queue, e.g. crossbeam_deque. @Darksonn, on that note, why was the crossbeam stuff purged from tokio? Every tokio runtime's injector uses locks, which seems like a clear downgrade from a lock-free queue.
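For reference, hand-off through a lock-free injector with crossbeam_deque looks roughly like this (a standalone sketch, not tokio's scheduler code):

```rust
use crossbeam_deque::{Injector, Steal};

fn main() {
    let injector: Injector<u32> = Injector::new();

    // Any thread may push without taking a lock.
    injector.push(42);

    // Consumers steal; Steal::Retry signals a benign race, so just retry.
    loop {
        match injector.steal() {
            Steal::Success(task) => {
                println!("got task {task}");
                break;
            }
            Steal::Empty => break,
            Steal::Retry => continue,
        }
    }
}
```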

Noah-Kennedy commented 2 years ago

I've seen this cause perf issues before in code that does a ton of file I/O.

SUPERCILEX commented 2 years ago

I've spent a bit of time thinking about what the optimal solution could look like:

- Tasks are pushed to a SegQueue.
- Active threads poll the queue, twice if thread_cap has not been reached yet. If the second poll returns a task, start a new thread (putting it in the slab) and hand it that second task.
- If the queue is empty, the thread puts itself in the idle-thread queue and parks itself with a timeout.
- New tasks can take a thread out of the idle queue and notify it (assuming it's not dead; otherwise, remove it from the slab and loop).
- On notify or timeout, a thread checks whether there's work and, if not, lets itself die, marking itself as dead.
- On shutdown, notify all the idle threads and then join everything in the slab.
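A much-simplified sketch of this scheme, assuming crossbeam_queue::SegQueue. It keeps the double-poll growth heuristic and park-with-timeout for idle threads, but omits the slab bookkeeping, dead-thread reaping, and the shutdown path; all names and constants are illustrative:

```rust
use crossbeam_queue::SegQueue;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

type Task = Box<dyn FnOnce() + Send>;

struct Pool {
    tasks: SegQueue<Task>,
    idle: SegQueue<thread::Thread>, // parked threads awaiting a wakeup
    threads: AtomicUsize,
    thread_cap: usize,
}

impl Pool {
    fn spawn_task(self: &Arc<Self>, task: Task) {
        self.tasks.push(task);
        // Prefer waking an idle thread over spawning a new one.
        if let Some(t) = self.idle.pop() {
            t.unpark();
        }
    }

    fn worker(self: Arc<Self>) {
        loop {
            match self.tasks.pop() {
                Some(task) => {
                    // Double poll: if there is *more* queued work and we are
                    // under the cap, grow the pool and hand the second task
                    // straight to the new thread.
                    if self.threads.load(Ordering::Relaxed) < self.thread_cap {
                        if let Some(second) = self.tasks.pop() {
                            self.threads.fetch_add(1, Ordering::Relaxed);
                            let pool = Arc::clone(&self);
                            thread::spawn(move || {
                                second();
                                pool.worker();
                            });
                        }
                    }
                    task();
                }
                None => {
                    // No work: register as idle and park with a timeout.
                    self.idle.push(thread::current());
                    thread::park_timeout(Duration::from_secs(10));
                    if self.tasks.is_empty() {
                        // Timed out with nothing to do: let this thread die.
                        self.threads.fetch_sub(1, Ordering::Relaxed);
                        return;
                    }
                }
            }
        }
    }
}
```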

Darksonn commented 2 years ago

We generally want to avoid adding new dependencies if we can at all avoid it.

hawkw commented 2 years ago

One nice thing about the injector queue is that, unlike run queues, tasks cannot be dropped from outside the queue while they sit in the injector. This means we may be able to use an approach like Vyukov's intrusive singly-linked MPSC queue for injector queues; this is something I've wanted to look into for a while, actually.
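For concreteness, here is a compressed sketch of Vyukov's MPSC queue in its classic non-intrusive form; the intrusive variant mentioned above would embed the next link in the task header itself, and Drop/reclamation are omitted here:

```rust
use std::cell::UnsafeCell;
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> {
    next: AtomicPtr<Node<T>>,
    value: Option<T>, // None for the stub node
}

pub struct MpscQueue<T> {
    head: AtomicPtr<Node<T>>,       // producers swap themselves in here
    tail: UnsafeCell<*mut Node<T>>, // consumer-only; see pop's contract
}

unsafe impl<T: Send> Send for MpscQueue<T> {}
unsafe impl<T: Send> Sync for MpscQueue<T> {}

impl<T> MpscQueue<T> {
    pub fn new() -> Self {
        let stub = Box::into_raw(Box::new(Node {
            next: AtomicPtr::new(ptr::null_mut()),
            value: None,
        }));
        MpscQueue {
            head: AtomicPtr::new(stub),
            tail: UnsafeCell::new(stub),
        }
    }

    /// Multi-producer push: one swap plus one store, no locks.
    pub fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node {
            next: AtomicPtr::new(ptr::null_mut()),
            value: Some(value),
        }));
        let prev = self.head.swap(node, Ordering::AcqRel);
        unsafe { (*prev).next.store(node, Ordering::Release) };
    }

    /// Safety: only one thread may call pop (single consumer).
    pub unsafe fn pop(&self) -> Option<T> {
        let tail = *self.tail.get();
        let next = (*tail).next.load(Ordering::Acquire);
        if next.is_null() {
            return None; // empty, or a producer is mid-push
        }
        *self.tail.get() = next;
        drop(Box::from_raw(tail)); // free the consumed/stub node
        (*next).value.take()
    }
}
```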

SUPERCILEX commented 2 years ago

Don't we need MPMC though?

tfreiberg-fastly commented 1 year ago

I've experienced this in benchmarks with high RPS doing very small reads from files on a ramdisk. Replacing spawn_blocking with block_in_place actually improved things in these cases. (Obviously, with larger reads block_in_place was unusable; this only confirmed that queueing is in fact the issue at 20k+ RPS.) So I'm very interested in this issue, and when it becomes a blocker I'll try to contribute.
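For clarity, the two call shapes being compared were roughly these (the ramdisk path is a placeholder; block_in_place requires the multi-threaded runtime):

```rust
use tokio::task;

async fn read_via_spawn_blocking() -> std::io::Result<Vec<u8>> {
    // Queues through the blocking pool: cheap reads pay the full
    // dispatch/lock overhead discussed in this issue.
    task::spawn_blocking(|| std::fs::read("/tmp/ramdisk/small.dat"))
        .await
        .expect("blocking task panicked")
}

async fn read_via_block_in_place() -> std::io::Result<Vec<u8>> {
    // Runs inline on the current worker thread, skipping the queue, but
    // stalls that worker for the duration of the read. Fine for tiny
    // reads; ruinous for large ones, as noted above.
    task::block_in_place(|| std::fs::read("/tmp/ramdisk/small.dat"))
}
```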

Noah-Kennedy commented 1 year ago

> I've experienced this in benchmarks with high RPS doing very small reads from files on a ramdisk. Replacing spawn_blocking with block_in_place actually improved things in these cases. (Obviously, with larger reads block_in_place was unusable; this only confirmed that queueing is in fact the issue at 20k+ RPS.) So I'm very interested in this issue, and when it becomes a blocker I'll try to contribute.

block_in_place actually does essentially a spawn_blocking call under the hood to obtain a thread for a new worker, so this is interesting.