Open jethrogb opened 6 years ago
In rr we automatically retry Linux clone() syscalls when we see EAGAIN. If we don't do that, tests fail under load; when we do, those tests pass. See https://github.com/mozilla/rr/commit/68bd393098afe7535d3561eae4eff6e3e9038096
So I think it's a good idea to automatically retry clone() on EAGAIN.
AFAIK, the only error we currently retry on widely is EINTR, because the implications of doing so are well understood. I'm not sure this is true for clone's EAGAIN.
If the process has reached its limit on how many threads it is allowed to have, it does not seem wise to just hot-spin trying to make a new one forever.
I observed this error when I was well below the thread limit, but I was creating threads very quickly.
// Each spawned thread just sleeps for 10 seconds.
fn go() {
    std::thread::sleep(std::time::Duration::from_millis(10000));
}

fn main() {
    let mut cnt = 0;
    // Spawn threads as fast as possible until the first failure.
    loop {
        match std::thread::Builder::new().spawn(go) {
            Ok(_) => cnt += 1,
            Err(e) => {
                println!("error: {:?} {:?}", e.kind(), e);
                println!("cnt {}", cnt);
                return;
            }
        }
    }
}
results in
rg@rg-2018:temp$ ./a
error: WouldBlock Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
cnt 9919
rg@rg-2018:temp$ cat /proc/sys/kernel/threads-max
126588
> If the process has reached its limit on how many threads it is allowed to have, it does not seem wise to just hot-spin trying to make a new one forever.
True. Backing off for a second would probably be fine.
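The backoff idea could look roughly like the sketch below. `spawn_with_backoff` is a hypothetical helper, not something in std, and the retry count and delays are arbitrary assumptions. Because `Builder::spawn` consumes its closure even when it fails, the helper takes a factory that builds a fresh closure for every attempt.

```rust
use std::io::{self, ErrorKind};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical helper (not part of std): retries `Builder::spawn` a bounded
/// number of times when it fails with `WouldBlock` (EAGAIN on Linux).
fn spawn_with_backoff<F>(
    make_task: impl Fn() -> F,
    max_retries: u32,
) -> io::Result<JoinHandle<()>>
where
    F: FnOnce() + Send + 'static,
{
    // Assumed starting delay; doubled after every failed attempt.
    let mut delay = Duration::from_millis(10);
    let mut attempt = 0;
    loop {
        match thread::Builder::new().spawn(make_task()) {
            Ok(handle) => return Ok(handle),
            Err(e) if e.kind() == ErrorKind::WouldBlock && attempt < max_retries => {
                attempt += 1;
                thread::sleep(delay);
                // Cap the delay at one second instead of hot-spinning.
                delay = (delay * 2).min(Duration::from_secs(1));
            }
            // Any other error, or too many retries: give up.
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    let handle = spawn_with_backoff(|| || println!("hello from thread"), 5)
        .expect("could not spawn thread");
    handle.join().unwrap();
}
```

Bounding the retries means a process that really is at its thread limit still gets an error back, only a little later.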
I've only tried under a VM, but FWIW I'm not able to reproduce thread::spawn/clone transiently failing on Linux.
On my system, the following program reliably reaches the thread limit without transient failures. Transient failures would cause "wouldblock" and "progress" to be printed repeatedly, but I have not seen that.
use std::{
    io::ErrorKind,
    mem::forget,
    thread::{sleep, Builder},
    time::Duration,
};

// Keeps each spawned thread alive forever without using CPU.
fn idle() {
    loop {
        sleep(Duration::from_secs(60));
    }
}

fn main() {
    let mut threads = 0;
    // True while the most recent spawn attempt failed with WouldBlock.
    let mut stuck = false;
    loop {
        match Builder::new().stack_size(4096).spawn(idle) {
            Ok(handle) => {
                threads += 1;
                // A success right after a failure means the failure was transient.
                if stuck {
                    println!("progress after {}", threads);
                }
                stuck = false;
                forget(handle);
            }
            Err(ref err) if err.kind() == ErrorKind::WouldBlock => {
                // Report only the first failure of each streak, then back off.
                if !stuck {
                    println!("wouldblock after {}", threads);
                }
                stuck = true;
                sleep(Duration::from_millis(100));
            }
            Err(e) => {
                panic!("{}: {:?}", threads, e.kind());
            }
        }
    }
}
It's possible that thread limits are lower than people are expecting. In particular, /proc/sys/kernel/threads-max is likely not the maximum if systemd is installed (in which case see systemctl status user-$UID.slice).
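To compare the limits mentioned above from inside a process, something like the following sketch may help. It assumes Linux and, for the last step, cgroup v2 mounted at /sys/fs/cgroup; the helper names `soft_max_processes` and `cgroup_v2_path` are made up for illustration.

```rust
use std::fs;

/// Extracts the soft "Max processes" limit (RLIMIT_NPROC) from the text of
/// /proc/self/limits. Returns the raw field, which may be "unlimited".
fn soft_max_processes(limits: &str) -> Option<&str> {
    limits
        .lines()
        .find_map(|line| line.strip_prefix("Max processes"))
        .and_then(|rest| rest.split_whitespace().next())
}

/// Finds the cgroup v2 path in the text of /proc/self/cgroup ("0::<path>").
fn cgroup_v2_path(cgroup: &str) -> Option<&str> {
    cgroup.lines().find_map(|line| line.strip_prefix("0::"))
}

fn main() {
    // System-wide ceiling, the value printed earlier in this thread.
    if let Ok(s) = fs::read_to_string("/proc/sys/kernel/threads-max") {
        println!("threads-max: {}", s.trim());
    }
    // Per-user limit on processes and threads.
    if let Ok(s) = fs::read_to_string("/proc/self/limits") {
        println!("RLIMIT_NPROC (soft): {:?}", soft_max_processes(&s));
    }
    // Assumption: cgroup v2 is mounted at /sys/fs/cgroup.
    if let Ok(s) = fs::read_to_string("/proc/self/cgroup") {
        if let Some(path) = cgroup_v2_path(&s) {
            let file = format!("/sys/fs/cgroup{}/pids.max", path.trim_end_matches('/'));
            match fs::read_to_string(&file) {
                Ok(v) => println!("{}: {}", file, v.trim()),
                Err(e) => println!("{}: unavailable ({})", file, e),
            }
        }
    }
}
```

This only reads pids.max for the process's own cgroup. A parent cgroup such as the user slice can impose a lower limit, so a value of "max" here does not rule out a cgroup limit.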
My GitHub Actions workflow often fails because libtest failed to spawn new threads.
I believe the failures are unrelated to the code I tested. I have to use cargo test -- --test-threads 1 now. :sob:
There are three separate issues mentioned here:
spawn is shown as 'WouldBlock' in Debug output, instead of EAGAIN.
I would like to have this issue only track the third problem.
Renamed issue to reflect (3).
Assigning P-low as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.
EAGAIN and EWOULDBLOCK differ only on Windows, Redox, and VxWorks, according to our libc crate. We could add an io::Error::TryAgain variant and make sure we return that, but is it worth changing that return value and potentially confounding users?
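Until something like that exists, callers who care about the distinction can look at the raw OS error code instead of the kind. The sketch below assumes EAGAIN is 11, which matches the `code: 11` in the Linux output earlier in this thread but is not true on every platform. `is_eagain` is a hypothetical helper.

```rust
use std::io::{Error, ErrorKind};

/// Assumption: EAGAIN is 11, as in the Linux output shown in this thread.
/// Other platforms use different values.
const EAGAIN_LINUX: i32 = 11;

/// Hypothetical helper: true only if the error carries the raw OS code for EAGAIN.
fn is_eagain(err: &Error) -> bool {
    err.raw_os_error() == Some(EAGAIN_LINUX)
}

fn main() {
    // An error built from the raw code, as std does after a failed syscall.
    let from_os = Error::from_raw_os_error(EAGAIN_LINUX);
    println!("kind = {:?}, is_eagain = {}", from_os.kind(), is_eagain(&from_os));

    // A WouldBlock error that did not come from the OS has no raw code.
    let synthetic = Error::new(ErrorKind::WouldBlock, "synthetic");
    println!("kind = {:?}, is_eagain = {}", synthetic.kind(), is_eagain(&synthetic));
}
```

Checking `raw_os_error()` sidesteps the question of how the kind is named, at the cost of being platform-specific.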
When trying to launch a thread and the thread limit is reached or there is not enough virtual address space available for another thread, thread::Builder::spawn returns an io::Error of kind WouldBlock.
This prints (on Linux):
WouldBlock means:
This doesn't make a lot of sense in the context of thread creation. Yes, if the create call were to block until the thread/virtual address space limit is no longer reached, this error interpretation would be correct, but I know of no threading API (Windows or Linux) with these semantics.
The source of the problem is that the POSIX errors EAGAIN and EWOULDBLOCK may be defined as the same error value, and Rust chose to always interpret that as EWOULDBLOCK. I'm not sure what course of action I'd suggest to clear up the confusion.
(NB. On Windows, AFAICT there is no way to limit the number of threads, but when running out of virtual address space, CreateThread returns ERROR_NOT_ENOUGH_MEMORY, which gets decoded as kind Other.)