tokio-rs / tokio

A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
https://tokio.rs
MIT License

Runtime builder panics when unable to spawn worker threads #6167

Open xiaoluzi0050 opened 12 months ago

xiaoluzi0050 commented 12 months ago

Version 1.28.2

Platform Linux, Ubuntu 18.04

Description I am using sccache to speed up compilation, but I am hitting an error from the third-party library tokio.

thread 'main' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', src/jobserver.rs:53:18
thread 'main' panicked at 'OS can't spawn worker thread: Resource temporarily unavailable (os error 11)', /home/.cargo/registry/src/github.com/tokio-1.28.2/src/runtime/scheduler/multi_thread/worker.rs:365:13
make[2]: fork: Resource temporarily unavailable

sccache version: 0.7.1. The server has 256 cores, and the build runs inside Docker with a parallelism setting of 100. The issue above occurs sporadically, and it becomes more likely as the parallelism value is raised. The server has abundant resources, including memory and maximum thread count.

Short summary of the bug: runtime::spawn_blocking error, Resource temporarily unavailable

Darksonn commented 12 months ago

Please file this bug with sccache instead. There isn't anything Tokio can do if you ask it to spawn N threads, and it isn't able to spawn N threads.

As an aside, it seems like the same failure also happened elsewhere in sccache, in jobserver.rs, so even if Tokio were able to handle this gracefully, sccache would still have failed.

haydenflinner commented 9 months ago

I haven't dug into the code here; I'm just trying to resolve this issue for sccache because it's still happening. So if an already-exposed API would solve the questions below, I would appreciate a link!

Is there an API that can be used which returns Results instead of panicking? Or warns but doesn't panic when some fail to be created?
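For reference, the "warn but don't panic" behaviour asked about here can be sketched with the standard library alone, since `std::thread::Builder::spawn` returns an `io::Result` where `std::thread::spawn` would panic. The helper name `spawn_best_effort` is hypothetical and is not a Tokio or sccache API; this is only an illustration of the idea, not how Tokio creates its workers.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Tries to start `wanted` worker threads and keeps whichever ones the OS
/// allowed. Returns the running handles plus one error per failed spawn.
/// `spawn_best_effort` is a hypothetical helper, not a Tokio or sccache API.
pub fn spawn_best_effort<F>(wanted: usize, work: F) -> (Vec<JoinHandle<()>>, Vec<io::Error>)
where
    F: Fn(usize) + Send + Clone + 'static,
{
    let mut handles = Vec::with_capacity(wanted);
    let mut errors = Vec::new();
    for id in 0..wanted {
        let work = work.clone();
        // `Builder::spawn` reports failure as `io::Result`, unlike
        // `thread::spawn`, which panics when the OS refuses the thread.
        match thread::Builder::new()
            .name(format!("worker-{id}"))
            .spawn(move || work(id))
        {
            Ok(handle) => handles.push(handle),
            Err(err) => {
                // Warn and continue with fewer workers instead of panicking.
                eprintln!("warning: could not spawn worker {id}: {err}");
                errors.push(err);
            }
        }
    }
    (handles, errors)
}
```

A caller can then decide whether a reduced worker count is acceptable, for example by failing only when `handles` comes back empty.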

Or, getting closer to the actual fix: given that the error is "Resource temporarily unavailable", that sounds like errno EAGAIN to me, which means that simply retrying is likely to be enough. If tokio doesn't want to take on the responsibility of retrying, then it at least needs to expose the machinery for users to do so themselves. In theory most syscalls can fail at any time with EAGAIN, simply because a signal interrupted the normal handling of the syscall.
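A minimal sketch of the retry idea, using only the standard library: Rust surfaces EAGAIN as `io::ErrorKind::WouldBlock` (as the panic message in the report shows), so a caller that gets an `io::Result` back can retry with a growing delay. The names `retry_on_would_block` and `spawn_with_retry`, the attempt count, and the delays are assumptions chosen for illustration. Note that for thread creation EAGAIN usually indicates that a thread or process limit was reached, so retrying only helps if that shortage is transient.

```rust
use std::io;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Calls `attempt` until it succeeds, retrying only on `WouldBlock` (EAGAIN)
/// and doubling the delay each time. Always makes at least one attempt.
/// Any other error, or the last `WouldBlock`, is returned to the caller.
pub fn retry_on_would_block<T>(
    max_attempts: u32,
    initial_delay: Duration,
    mut attempt: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut delay = initial_delay;
    let mut tries = 0;
    loop {
        tries += 1;
        match attempt() {
            Err(err) if err.kind() == io::ErrorKind::WouldBlock && tries < max_attempts => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
            }
            // Success, a non-retryable error, or attempts exhausted.
            other => return other,
        }
    }
}

/// Hypothetical usage: spawn a named thread, retrying when the OS says EAGAIN.
pub fn spawn_with_retry(name: &str) -> io::Result<JoinHandle<u32>> {
    retry_on_would_block(5, Duration::from_millis(10), || {
        thread::Builder::new().name(name.to_owned()).spawn(|| 42)
    })
}
```

Keeping the retry policy in a generic wrapper leaves the decision about attempt counts and delays with the application rather than with the library.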

Darksonn commented 9 months ago

Can you confirm that this is a problem that happens in the runtime builder specifically, and not as part of some other thread-spawning mechanism such as spawn_blocking?

We can most likely add a try_build or similar to the runtime builder that returns a Result if capturing the panic doesn't work for you.
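Until something like `try_build` exists, the "capturing the panic" option could look roughly like the sketch below, which wraps any build closure in `std::panic::catch_unwind` and turns the panic payload into an error string. The name `try_build_with` is hypothetical and is not part of Tokio. The sketch assumes the program is compiled with the default `panic = "unwind"` setting; with `panic = "abort"` nothing can be caught, and the default panic hook still prints the message to stderr.

```rust
use std::any::Any;
use std::panic::{self, UnwindSafe};

/// Runs `build` and converts a panic into `Err` carrying the panic message.
/// `try_build_with` is a hypothetical name; Tokio has no such function.
pub fn try_build_with<T>(build: impl FnOnce() -> T + UnwindSafe) -> Result<T, String> {
    panic::catch_unwind(build).map_err(|payload| panic_message(&*payload))
}

/// Extracts a readable message from a panic payload, which is a `&str` for
/// literal messages and a `String` for formatted ones.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_owned()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_owned()
    }
}
```

With Tokio as a dependency, the closure passed in would be something like `|| tokio::runtime::Builder::new_multi_thread().enable_all().build()`, giving a nested result: the outer `Err` for a captured panic, the inner `io::Result` for errors the builder already reports.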