implement `libc::sched_setaffinity` on linux

folkertdev commented 1 week ago

fixes https://github.com/rust-lang/miri/issues/2749

the implementation, like libc::sched_getaffinity, just always returns EINVAL, which kind of simulates a device with zero cpus. I believe the idea is that callers of this function always do it to optimize, so they are likely to gracefully recover from this function returning an error.

based on the libc crate, these functions are also available on android and freebsd (but not on macos or windows). So should the implementation of the sched_* functions just be copied to the android and freebsd shims?

RalfJung commented 6 days ago

The original motivation stated in the issue was a call site in nix-0.23.2/src/sched.rs:175. What does that code do when EINVAL is returned? Does this help Miri do anything useful in that code?

folkertdev commented 6 days ago

Allright, I dug into glommio's usage of these functions. for instance they have

#[test]
fn blocking_function_placement_independent_of_executor_placement() {
    let affinity = nix::sched::sched_getaffinity(nix::unistd::Pid::from_raw(0)).unwrap();

this test will just fail on miri because sched_getaffinity always returns EINVAL. The only benefit that you get from miri supporting this function at all is that the arguments to libc::sched_getaffinity are checked (e.g. if you pass in a NULL pointer miri would yell at you about that). The executor_pool_builder_placements test would also hit an unwrap on the result of sched_getaffinity, but it errors earlier on an unsupported syscall:

error: unsupported operation: can't execute syscall with ID 324
    --> glommio/src/sys/membarrier.rs:67:18
     |
67   |         unsafe { libc::syscall(libc::SYS_membarrier, cmd as libc::c_int, 0 as libc::c_int) }
     |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't execute syscall with ID 324

That's a separate issue though (no idea how miri should handle that).

Then sched_setaffinity. In glommio it is wrapped in this function

pub(crate) fn bind_to_cpu_set(cpus: impl IntoIterator<Item = usize>) -> Result<()> {
    let mut cpuset = nix::sched::CpuSet::new();
    for cpu in cpus {
        cpuset.set(cpu).map_err(|e| to_io_error!(e))?;
    }
    let pid = nix::unistd::Pid::from_raw(0);
    nix::sched::sched_setaffinity(pid, &cpuset).map_err(|e| Into::into(to_io_error!(e)))
}

which is tested here

#[test]
fn bind_to_cpu_set_range() {
    // libc supports cpu ids up to 1023 and will use the intersection of values
    // specified by the cpu mask and those present on the system
    // https://man7.org/linux/man-pages/man2/sched_setaffinity.2.html#NOTES
    assert!(bind_to_cpu_set(vec![0, 1, 2, 3]).is_ok());
    assert!(bind_to_cpu_set(0..1024).is_ok());
    assert!(bind_to_cpu_set(0..1025).is_err());
}

So this will just fail with miri. Based on the man page this will also fail in general (i.e. even without miri) when

The affinity bit mask mask contains no processors that are currently physically on the system and permitted to the thread according to any restrictions that may be imposed by cpuset cgroups or the "cpuset" mechanism described in cpuset(7).

but this test seems to just assume that that did not happen, and up to 1024 cpus can be configured for the current thread.

I think in general that means that the implementations of {g, s}et_affinity are not that useful for glommio at least, besided checking some safety properties on the arguments.

I'm not sure how miri could do better though. It would have to keep track of the affinity (1024 bits per thread id, with pid 0 mapping to the current thread) in some piece of global state?

RalfJung commented 6 days ago

We can have a simple-but-wrong implementation that happily says "ok" on setaffinity but just ignores this and returns the full mask on getaffinity. Or we track the per-thread mask in some per-thread state, so that getaffinity can return the right result. Either way it would not have any other effect. I don't have a strong preference either way, but maybe some apps get confused when their setaffinity is not reflected in getaffinity.

based on the libc crate, these functions are also available on android and freebsd (but not on macos or windows). So should the implementation of the sched_* functions just be copied to the android and freebsd shims?

No copy-paste please. You can do it like reallocarray -- have the impl in the generic unix file and guard it on what the concrete OS is.

folkertdev commented 6 days ago

are there examples of such per-thread state?

I don't think the simple-but-wrong approach is really more useful than the status quo.

RalfJung commented 6 days ago

I'd suggest you add a new FxHashMap<ThreadId, ...> to MiriMachine.

folkertdev commented 6 days ago

@rustbot ready

allright, the implementation now simulates the logic in that setting and then getting the mask returns the updated value. the initial value of the mask is based on -Zmiri-num-cpus.

folkertdev commented 6 days ago

apparently for some targets no CPUs are configured? specifically the s390x-unknown-linux-gnu target fails when executed on macos?!

  Error: 
     0: ui tests in tests/pass-dep for s390x-unknown-linux-gnu failed
     1: tests failed

  full stderr:
  thread 'main' panicked at tests/pass-dep/libc/libc-misc.rs:80:5:
  assertion failed: unsafe { libc::CPU_ISSET(0, &cpuset) }
  stack backtrace:
     0: std::panicking::begin_panic_handler
               at /Users/runner/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/panicking.rs:661:5
     1: core::panicking::panic_fmt
               at /Users/runner/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/panicking.rs:74:14
     2: core::panicking::panic
               at /Users/runner/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/panicking.rs:148:5
     3: test_affinity
               at tests/pass-dep/libc/libc-misc.rs:80:5
     4: main
               at tests/pass-dep/libc/libc-misc.rs:141:9
     5: <fn() as std::ops::FnOnce<()>>::call_once - shim(fn())
               at /Users/runner/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

https://github.com/rust-lang/miri/actions/runs/9627551162/job/26554759066?pr=3698

we can just skip the test on this platform but it is weird.

RalfJung commented 5 days ago

s390x is a big-endian architecture, so likely your code is simply not correct for big-endian systems.

RalfJung commented 4 days ago

Instead of repeating the same cfg over and over, please move these "use" inside the function.

RalfJung commented 4 days ago

Oh wait the entire test is only for some targets? Then please use "ignore" to skip this on other targets. No cfg required.

folkertdev commented 4 days ago

does

//@ignore-target-windows: only very limited libc on Windows
//@ignore-target-apple: `sched_setaffinity` is not supported on macOS
//@ignore-target-wasm32

let through the same targets as

    #[cfg(any(target_os = "linux", target_os = "freebsd", target_os = "android"))]

it's close but there's probably something I'm missing

RalfJung commented 4 days ago

You don't need the wasm line, we only run very few tests on wasm as it is not fully supported by Miri.

folkertdev commented 3 days ago

@rustbot ready

I just added input validation (on NULL pointers and invalid thread IDs). I believe the implementation now covers all of the behavior that we want.

rust-lang / miri

implement `libc::sched_setaffinity` on linux #3698