microsoft / snmalloc

Message passing based allocator
MIT License

CI breakage: ppc64le-linux cross-emulation environment no longer works #576

Open nwf-msr opened 1 year ago

nwf-msr commented 1 year ago

Presumably because the ubuntu-latest runners have stepped forward (as with #575), the powerpc64le cross-build and -run test is failing an apparently random subset of the tests. I am unable to reproduce these crashes on my Power machine, so I am inclined to think it's an artifact of emulation.

With a little elbow grease and prodding, I can reproduce it on WSL2. It looks like we aren't making it very far into program startup. With qemu tracing its heart out (and some judicious editing of the resulting 60MB log), we see that the signal is

--- SIGSEGV {si_signo=SIGSEGV, si_code=1, si_addr=0x0000004001be5000} ---

and the program counter is presumably near the last TB we entered, which was

exec_tb tb:[...] pc=0x4001e2f724

si_code=1 is SEGV_MAPERR ("address not mapped to object").

The fault address 0x4001be5000 is within the dynamic linker's load of libstdc++:

openat(AT_FDCWD,"/usr/powerpc64le-linux-gnu/lib/libstdc++.so.6",O_RDONLY|O_CLOEXEC) = 4
[...]
mmap(0x0000004001bd0000,131072,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,4,0x2f0000) = 0x4001bd0000

and is not in the range of any subsequent mprotect call. The PC 0x4001e2f724 is within libc:

openat(AT_FDCWD,"/usr/powerpc64le-linux-gnu/lib/libc.so.6",O_RDONLY|O_CLOEXEC) = 4
[...]
mmap(NULL,2417600,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,4,0) = 0x4001d60000

That trace certainly suggests that there should be memory at 0x4001be5000, I think.
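
To double-check the arithmetic behind that claim, here is a minimal sketch that tests both addresses against the two mmap ranges quoted above. The names `MAPPINGS` and `find_mapping` are made up for illustration, and the sketch only knows about the two mmap calls shown; anything hidden behind the `[...]` elisions (other mappings, munmap, mprotect) is not modelled.

```python
# Mappings copied from the trace excerpts above: (label, base address, length in bytes).
# Assumption: only these two mmap calls are considered; elided calls are ignored.
MAPPINGS = [
    ("libstdc++.so.6 (MAP_FIXED segment)", 0x4001BD0000, 131072),
    ("libc.so.6", 0x4001D60000, 2417600),
]


def find_mapping(addr, mappings=MAPPINGS):
    """Return the label of the first mapping whose [base, base + length) holds addr."""
    for label, base, length in mappings:
        if base <= addr < base + length:
            return label
    return None


if __name__ == "__main__":
    for what, addr in (("fault address", 0x4001BE5000),
                       ("program counter", 0x4001E2F724)):
        print(f"{what} {addr:#x}: {find_mapping(addr)}")
```

Under those assumptions, the fault address falls in the libstdc++ segment (0x4001bd0000 plus 0x20000 bytes) and the PC falls in the libc mapping, which matches the reading of the trace above.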

qemu v4.2.1, approximately what shipped in Ubuntu Focal, lets the test pass. I'll bisect qemu and report back.

nwf-msr commented 1 year ago

Bisection points at this being the fault of https://github.com/qemu/qemu/commit/4dcf078f094d436866ef793aa25c96fba85ac8d0 . The first release to contain that commit was v5.0.0, putting it after Ubuntu Focal and before Impish (and so Jammy). I don't understand why that change would trigger this behavior, but so it goes.

For history, I used this somewhat awkward command to build, since qemu has changed their build system and output layout a few times in the large span between v4.2.1 and today:

(rm -rf _build; mkdir _build; cd _build; ../configure --target-list=ppc64le-linux-user --disable-werror --disable-docs; ninja || make -j5; ln -s ppc64le-linux-user/qemu-ppc64le .)

nwf-msr commented 1 year ago

Reported to qemu at https://gitlab.com/qemu-project/qemu/-/issues/1361

mjp41 commented 1 month ago

I think this is resolved?