sched-ext / scx

sched_ext schedulers and tools
https://bit.ly/scx_slack
GNU General Public License v2.0

segmentation fault in 1.0.6 on NixOS #927

Open JohnRTitor opened 16 hours ago

JohnRTitor commented 16 hours ago

Hi, I am the maintainer of scx packages for Nixpkgs.

I am trying to update them to 1.0.6. The packages compile fine, but whenever I try to run a Rust scheduler binary such as scx_rustland, it segfaults.

Here's what I could recover from gdb.

libbpf: prog 'rustland_init': insn #2 relocated, imm 120 points to subprog 'get_nr_online_cpus' (now at 123 offset)
libbpf: prog 'rustland_init': added 19 insns from sub-prog 'usersched_timer_fn'
libbpf: prog 'rustland_init': insn #103 relocated, imm 53 points to subprog 'usersched_timer_fn' (now at 157 offset)
libbpf: map 'cpu_ctx_stor': created successfully, fd=5
libbpf: map 'task_ctx_stor': created successfully, fd=6
libbpf: map 'queued': created successfully, fd=7
libbpf: map 'dispatched': created successfully, fd=8
libbpf: map 'pid_mm_fault_map': created successfully, fd=9
libbpf: map 'usersched_timer': created successfully, fd=10
libbpf: map 'bpf_bpf.rodata': created successfully, fd=11
libbpf: map '.data.uei_dump': created successfully, fd=12
libbpf: map 'bpf_bpf.data': created successfully, fd=13
libbpf: map 'bpf_bpf.bss': created successfully, fd=14
libbpf: map 'rustland': created successfully, fd=15

Thread 1 "scx_rustland" received signal SIGSEGV, Segmentation fault.
0x0000555555a6bf79 in bpf_object.attach_skeleton ()
(gdb)
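The session above shows only the top frame. A minimal sketch of how a fuller backtrace could be collected non-interactively is given below. The helper name `gdb_backtrace_cmd` and the `./scx_rustland` path are hypothetical, chosen for illustration; the gdb options used (`--batch`, `-ex`, `--args`) and the `thread apply all bt full` command are standard gdb features. Without debug symbols in the binary, the output will still be limited.

```python
import shlex


def gdb_backtrace_cmd(binary, *args):
    """Build a gdb command line that runs `binary` in batch mode and,
    once it stops (e.g. on SIGSEGV), prints a full backtrace of all threads.

    `binary` and `args` are placeholders supplied by the caller.
    """
    return [
        "gdb", "--batch",
        "-ex", "run",                       # start the program
        "-ex", "thread apply all bt full",  # dump every thread's stack with locals
        "--args", binary, *args,            # everything after --args goes to the program
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into a shell
    # (assumed path; adjust to the actual build output).
    print(shlex.join(gdb_backtrace_cmd("./scx_rustland")))
```

The printed command can be run as-is, or the list can be passed to `subprocess.run` to capture the backtrace into a file for attaching to a report.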

On Nix, each Rust scheduler has to be built separately due to reproducibility and isolation requirements. We also build all C schedulers in one package, then combine the Rust and C schedulers into one big package.

I am not sure what is causing this. Could it be because of bpftool? We fetch the bpftool version defined in https://github.com/sched-ext/scx/blob/66223bf2350ec54d557ae9ef8b71a8c1c5d3d67e/meson.build#L187 and build it using the meson script. The C schedulers use this method and work fine.

But the Rust schedulers have no requirement on bpftool; they compile fine without it. That setup worked for 1.0.5.

For the Rust schedulers, we have elfutils, zlib, clang, libclang, and pkg-config as buildInputs. We do not add bpftool, as it isn't a requirement. I tried adding bpftool from the official repo rather than the kernel source, but the Rust schedulers still do not work.

Directions to fix this would be helpful.


Another request: please commit the Cargo.lock files to your repository and update them as needed. Due to reproducibility requirements, the Nix package manager computes a hash from the output of the Cargo.lock and compares it with a hash we provide as maintainers. If the hashes match, the build succeeds; otherwise it fails.

Packages without a Cargo.lock cannot be built at all. We currently work around this by creating a Cargo.lock ourselves and copying it into Nixpkgs, which is a maintenance hassle.
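The pinned-hash idea can be sketched as follows. This is a simplified illustration only, not Nix's actual hashing scheme: it hashes the lock file bytes directly with SHA-256, and the function names `lockfile_digest` and `verify_lockfile` are hypothetical.

```python
import hashlib


def lockfile_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the lock file contents."""
    return hashlib.sha256(data).hexdigest()


def verify_lockfile(data: bytes, pinned_hex: str) -> bool:
    """Compare the computed digest with the maintainer-pinned one.

    A mismatch means the dependency set changed and the build should fail.
    """
    return lockfile_digest(data) == pinned_hex.strip().lower()


if __name__ == "__main__":
    # Assumed file name; any change to the file changes the digest.
    with open("Cargo.lock", "rb") as f:
        print(lockfile_digest(f.read()))
```

Because the digest depends on every byte of the file, a Cargo.lock generated downstream has to be regenerated and re-pinned whenever upstream dependencies move, which is why a committed lock file reduces maintainer work.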

CC @PedroHLC

ptr1337 commented 13 hours ago

On Nix, each Rust scheduler has to be built separately due to reproducibility and isolation requirements

Why? The Arch package also compiles all schedulers and is currently properly reproducible (https://archlinux.org/packages/extra-testing/x86_64/scx-scheds/):

Reproducible Status: Good

PedroHLC commented 10 hours ago

@ptr1337 the problem is that Nix builds without internet access. The right way to integrate cargo with Meson would be to use "wrap" deps, like Mesa does, but SCX calls cargo directly, which then goes to the internet for its dependencies.

But this is not the issue here; cargo itself is already very reproducible. We just call cargo ahead of time, and when Meson needs it, we hand over what we built.

What @JohnRTitor is guessing is that the libbpf/bpftool used during the cargo build is producing a different result, without skeleton features. But we are not providing libbpf/bpftool ourselves during cargo builds; cargo does it all.

I have not investigated this yet, so I wouldn't bet on John's guess for now.