Open · jonhoo opened this issue 4 years ago
It looks to me like the lock is held for the minimum and correct span (i.e. not including actually running the blocking-operation closure). In my own testing of an alternative thread pool (in the blocking-permit crate), using a parking_lot mutex and condvar with minimized notifications appears to give the best performance, at least on Linux; a minimal sketch of that pattern follows.

What, if anything, is suggested as an alternative?
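For concreteness, here is a minimal sketch of the mutex-plus-condvar-with-minimized-notifications pattern described above. The `Shared`/`State` names and the idle counter are my own illustration, not blocking-permit's actual internals:

```rust
use parking_lot::{Condvar, Mutex};
use std::collections::VecDeque;

// Hypothetical shared state for a blocking pool. The "minimized
// notifications" idea: only notify when a worker is known to be parked.
struct Shared<T> {
    queue: Mutex<State<T>>,
    condvar: Condvar,
}

struct State<T> {
    jobs: VecDeque<T>,
    idle: usize, // workers currently blocked in `wait`
}

impl<T> Shared<T> {
    fn push(&self, job: T) {
        let mut s = self.queue.lock();
        s.jobs.push_back(job);
        // Skip the (comparatively expensive) notify when no one is waiting.
        if s.idle > 0 {
            self.condvar.notify_one();
        }
    }

    fn pop(&self) -> T {
        let mut s = self.queue.lock();
        loop {
            if let Some(job) = s.jobs.pop_front() {
                return job;
            }
            s.idle += 1;
            self.condvar.wait(&mut s);
            s.idle -= 1;
        }
    }
}
```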
The solution here would be to use a lockless queue, e.g. crossbeam_deque (sketched below). @Darksonn On that note, why was the crossbeam stuff purged from tokio? Every tokio runtime's injector uses locks, which seems like a clear downgrade from a lockless queue.
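For reference, a lock-free injection queue built on crossbeam_deque might look roughly like this; `Task` is a placeholder type, and a real pool would park rather than exit on `Steal::Empty`:

```rust
use crossbeam_deque::{Injector, Steal};

type Task = Box<dyn FnOnce() + Send>;

fn worker_loop(injector: &Injector<Task>) {
    loop {
        match injector.steal() {
            Steal::Success(task) => task(), // got a task without taking a lock
            Steal::Empty => break,          // a real pool would park here instead
            Steal::Retry => continue,       // lost a race with another stealer; retry
        }
    }
}

fn main() {
    let injector = Injector::new();
    let task: Task = Box::new(|| println!("ran off the injector"));
    injector.push(task);
    worker_loop(&injector);
}
```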
I've seen this cause perf issues before in code that does a ton of file I/O.
I've spent a bit of time thinking about what the optimal solution could look like:

- Tasks are pushed to the `SegQueue`. Active threads poll the queue, twice if `thread_cap` has not been reached yet.
- If the second poll returns a value, start a new thread and give it that second task (putting it in the slab).
- If the queue is empty, the thread puts itself in the idle-thread queue and parks itself with a timeout.
- New tasks can take a thread out of the idle queue and notify it (assuming it's not dead; otherwise remove it from the slab and loop).
- On notify or timeout, a thread checks whether there's work and lets itself die if not, marking itself as dead.
- On shutdown, notify all the idle threads and then join everything in the slab.

A rough sketch of the core loop is below.
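Here is roughly what the push/park halves of that design could look like. The `Pool` type and field names are mine; the slab bookkeeping, dead-thread check, thread cap, and shutdown are all elided (crossbeam-queue 0.3's `SegQueue` API assumed):

```rust
use crossbeam_queue::SegQueue;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

type Task = Box<dyn FnOnce() + Send>;

struct Pool {
    tasks: SegQueue<Task>,          // pending blocking tasks
    idle: SegQueue<thread::Thread>, // handles of parked workers
}

impl Pool {
    fn spawn_task(&self, task: Task) {
        self.tasks.push(task);
        // Wake a parked worker if there is one; the full design would
        // otherwise consider starting a new thread (up to thread_cap).
        if let Some(worker) = self.idle.pop() {
            worker.unpark();
        }
    }

    fn worker_loop(self: Arc<Self>) {
        loop {
            // Drain whatever work is available.
            while let Some(task) = self.tasks.pop() {
                task();
            }
            // No work: register as idle, then park with a timeout.
            self.idle.push(thread::current());
            thread::park_timeout(Duration::from_secs(10));
            // On notify or timeout, die if there is still no work; the
            // full design would also mark this thread dead in the slab.
            match self.tasks.pop() {
                Some(task) => task(),
                None => return,
            }
        }
    }
}
```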
We generally want to avoid adding new dependencies if at all possible.
One nice thing about the injector queue is that, unlike run queues, tasks cannot be dropped from outside of the queue while they are in the injector queue. This means that we may be able to use an approach like Vyukov's intrusive singly-linked MPSC queue for injector queues; that's something I've wanted to look into for a while, actually. A sketch of the idea (in its simpler, non-intrusive form) is below.
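To show the shape of that idea, here is a minimal non-intrusive version of Vyukov's MPSC queue. The intrusive variant would embed the `next` pointer in the task header rather than allocating a `Node`, and this sketch omits a `Drop` impl (remaining nodes leak), so it is not production code:

```rust
use std::cell::UnsafeCell;
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> {
    next: AtomicPtr<Node<T>>,
    value: Option<T>, // None only for the stub node
}

pub struct MpscQueue<T> {
    head: AtomicPtr<Node<T>>,       // producers swap here
    tail: UnsafeCell<*mut Node<T>>, // the single consumer advances here
}

unsafe impl<T: Send> Send for MpscQueue<T> {}
unsafe impl<T: Send> Sync for MpscQueue<T> {}

impl<T> MpscQueue<T> {
    pub fn new() -> Self {
        let stub = Box::into_raw(Box::new(Node {
            next: AtomicPtr::new(ptr::null_mut()),
            value: None,
        }));
        MpscQueue {
            head: AtomicPtr::new(stub),
            tail: UnsafeCell::new(stub),
        }
    }

    // Safe to call from any number of producer threads.
    pub fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node {
            next: AtomicPtr::new(ptr::null_mut()),
            value: Some(value),
        }));
        let prev = self.head.swap(node, Ordering::AcqRel);
        // Link the previous node to the new one; until this store lands,
        // the consumer sees a momentarily broken chain and returns None.
        unsafe { (*prev).next.store(node, Ordering::Release) };
    }

    // Must only be called from the single consumer thread.
    pub unsafe fn pop(&self) -> Option<T> {
        unsafe {
            let tail = *self.tail.get();
            let next = (*tail).next.load(Ordering::Acquire);
            if next.is_null() {
                return None;
            }
            *self.tail.get() = next;
            let value = (*next).value.take();
            drop(Box::from_raw(tail)); // free the old stub/consumed node
            value
        }
    }
}
```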
Don't we need MPMC though?
I've experienced this in benchmarks with high RPS doing very small reads from files on a ramdisk. Replacing `spawn_blocking` with `block_in_place` actually improved things in these cases (obviously, with larger reads `block_in_place` was unusable; this only confirmed that queueing is in fact an issue at 20k+ RPS). So I'm very interested in this issue, and when it becomes a blocker I'll try to contribute.
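For context, these are the two call shapes being compared; `read_small` and the ramdisk path are stand-ins for the benchmark's workload, and `block_in_place` requires the multi-threaded runtime:

```rust
use tokio::task;

// Stand-in for the "very small reads from files on a ramdisk" workload.
fn read_small() -> std::io::Result<Vec<u8>> {
    std::fs::read("/dev/shm/data") // hypothetical path
}

async fn via_spawn_blocking() -> std::io::Result<Vec<u8>> {
    // Queues the closure onto the blocking pool (taking its shared lock).
    task::spawn_blocking(read_small).await.unwrap()
}

async fn via_block_in_place() -> std::io::Result<Vec<u8>> {
    // Runs inline on the current worker thread, so the read itself does
    // not go through the spawn_blocking queue. Panics on the
    // current-thread runtime.
    task::block_in_place(read_small)
}
```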
`block_in_place` actually does do basically a `spawn_blocking` call to obtain a thread for a new worker, so this is interesting.
Code that invokes `spawn_blocking` or `block_in_place` (which internally calls `spawn_blocking`) ends up taking a shared lock here:

https://github.com/tokio-rs/tokio/blob/221f421464e6672e9cf25629998477946a8b8e1f/tokio/src/runtime/blocking/pool.rs#L153

This causes a lot of unnecessary contention between unrelated tasks if you frequently need to run blocking code. The problem is exacerbated as load increases, because more calls to `spawn_blocking` cause each individual call to become more expensive, and each call holds up an executor thread.
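A hedged micro-repro of that contention, under the assumption that a burst of tiny `spawn_blocking` calls is enough to make the shared mutex hot:

```rust
#[tokio::main]
async fn main() {
    // Each of these calls takes the blocking pool's shared lock on the
    // spawn path, so the queueing itself dominates the tiny workload.
    let handles: Vec<_> = (0..10_000)
        .map(|_| tokio::task::spawn_blocking(|| std::hint::black_box(1 + 1)))
        .collect();
    for handle in handles {
        handle.await.unwrap();
    }
}
```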