dpc opened 4 months ago
On an Ubuntu machine with nix that I had around, running Linux 5.15.0-101-generic, this works. On my two systems with new NixOS and Linux 6.9.2, it fails.
I have verified that downgrading to Linux kernel 6.8.11 makes the problem go away.
Nix builds are kind of reproducible, so that's very unexpected
Nix does not take kernel version into account in its reproducibility guarantees.
Yes, everything else is locked in place (kind of). That's why I immediately suspected the kernel might be a problem.
So to sum up: something about very recent Linux kernel versions is breaking some assumptions in Rust standard library code w.r.t. forking/execing, which leads to this internal panic. It's hard for me to tell whether it is a kernel regression, whether Rust's stdlib assumptions were incorrect, or whether I'm missing something else entirely.
I happen to be witnessing it because I'm running as recent a kernel version as NixOS can provide, trying to avoid some bcachefs bugs. With time, the problem might become more widespread.
Ah, a relatively small diff, then! Should be easy to find the offending commit. https://github.com/torvalds/linux/compare/f610c358956229b7e5180f8c1147725d989f6b0d...c8eef17
:thinking: Dozens of rebuild + reboot cycles... I'll see if I can find the time to do it. No promises. :D
So I was recompiling a TypeScript Next.js project in a Nix derivation that previously worked (Nix builds are kind of reproducible, so that's very unexpected), and it failed with a weird error:
I suspect my kernel version might be different because I recently upgraded to NixOS 24.05.
I traced this panic to https://github.com/rust-lang/rust/blob/1689a5a531f1fe404944ed8c3ac6cb85a2cff7e0/library/std/src/sys/pal/unix/process/process_unix.rs#L125
I got a strace output:
I'm not sure where to even report it, and I'm a bit too tired to dig deeper. Creating the issue just for reference.
The whole thing can be reproduced with:
I'm going to try it on some machines and see when it fails and when it works.
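For checking other machines, a small probe along these lines might be handy. This is only a sketch, not the actual reproducer from this issue: it assumes a Linux host (it reads `/proc/sys/kernel/osrelease`) and a `true` binary on `PATH`, and the names `kernel_release`, `try_spawn`, and `SpawnOutcome` are made up for illustration. It prints the kernel release and whether a plain `std::process::Command` spawn succeeds, returns an error, or panics inside std.

```rust
use std::fs;
use std::panic;
use std::process::Command;

/// Outcome of one spawn attempt (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum SpawnOutcome {
    Ok,
    Error(String),
    Panicked,
}

/// Kernel release string, the same value `uname -r` prints (Linux only).
fn kernel_release() -> String {
    fs::read_to_string("/proc/sys/kernel/osrelease")
        .map(|s| s.trim().to_string())
        .unwrap_or_else(|_| "unknown".to_string())
}

/// Spawn `program` once and wait for it, catching a panic raised inside std.
/// Catching only works when the binary is built with the default panic=unwind.
fn try_spawn(program: &str) -> SpawnOutcome {
    let result = panic::catch_unwind(|| Command::new(program).status());
    match result {
        // The child was spawned and waited on; its exit code is ignored here.
        Ok(Ok(_status)) => SpawnOutcome::Ok,
        // Spawning failed with an ordinary I/O error (e.g. binary not found).
        Ok(Err(e)) => SpawnOutcome::Error(e.to_string()),
        // std panicked somewhere in the spawn path.
        Err(_) => SpawnOutcome::Panicked,
    }
}

fn main() {
    println!("kernel: {}", kernel_release());
    println!("spawn `true`: {:?}", try_spawn("true"));
}
```

A clean result from this probe would not rule the problem out, since the original failure showed up inside a nix build rather than in a bare spawn. It mostly helps to record the kernel release next to each result.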