rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Segmentation fault installing rust on clean Ubuntu 24.04 amd64 #125430

Open jaraco opened 1 month ago

jaraco commented 1 month ago

I encountered this while rebuilding jaraco/multipy-tox on Linux AMD64.

More simply replicated using Docker Desktop on macOS 14.5 on Apple Silicon:

docker run --platform linux/amd64 ubuntu:noble bash -c "apt update; apt install -y wget; wget https://sh.rustup.rs -O - | sh -s -- -y"

The command fails with this in the output:

...
 14450K .......... .......... .......... .......... .......... 99%  106M 0s
 14500K .......... .......... .......... ..                   100% 9.59M=0.4s

2024-05-23 00:52:51 (32.7 MB/s) - '/tmp/tmp.0v1U3NptmZ/rustup-init' saved [14881096/14881096]
info: profile set to 'default'
info: default host triple is x86_64-unknown-linux-gnu
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
info: latest update on 2024-05-02, rust version 1.78.0 (9b00956e5 2024-04-29)
info: downloading component 'cargo'
info: downloading component 'clippy'
info: downloading component 'rust-docs'
info: downloading component 'rust-std'
info: downloading component 'rustc'
info: downloading component 'rustfmt'
info: installing component 'cargo'
info: installing component 'clippy'
info: installing component 'rust-docs'
Segmentation fault

If I run the same command without --platform linux/amd64 (or pass --platform linux/arm64), the command completes, so it seems something about building on Linux AMD64 under virtualization causes the install to crash.

This same routine worked fine about three weeks ago.

saethlin commented 1 month ago

Wow. Something is really messed up here. Trying to investigate in lldb and I'm already lost:

(lldb) run default stable
Process 3528 launched: '/root/.cargo/bin/rustup' (x86_64)
Process 3528 stopped
* thread #1, name = 'rustup', stop reason = signal SIGSTOP
    frame #0: 0xffffffffffffffff 
(lldb) c
Process 3528 resuming
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
info: latest update on 2024-05-02, rust version 1.78.0 (9b00956e5 2024-04-29)
info: downloading component 'cargo'
info: downloading component 'clippy'
info: downloading component 'rust-docs'
info: downloading component 'rust-std'
info: downloading component 'rustc'
info: downloading component 'rustfmt'
info: installing component 'cargo'
info: installing component 'clippy'
info: installing component 'rust-docs'
Process 3528 stopped
* thread #1, name = 'rustup', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x20)
    frame #0: 0xffffffffffffffff 

saethlin commented 1 month ago

The segfault occurs during installation of docs. You can get a working toolchain if you use the minimal profile instead of the default.
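As a sketch of that workaround, the reproduction command from the original report can be rerun with rustup's `--profile minimal` option, which installs only rustc, cargo, and rust-std and so never unpacks rust-docs. The variable and function names below are illustrative, not from the thread.

```shell
# Same install steps as the original report, plus --profile minimal so that
# rust-docs (the component whose installation segfaults) is never installed.
install_cmd='apt update; apt install -y wget; wget https://sh.rustup.rs -O - | sh -s -- -y --profile minimal'

# Hypothetical helper: runs the install in the same emulated amd64 container.
reproduce_minimal() {
    docker run --platform linux/amd64 ubuntu:noble bash -c "$install_cmd"
}

# Invoke with: reproduce_minimal
```

This only sidesteps the crash; a later `rustup component add rust-docs` in the same environment would presumably hit the same code path.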

The segfault during rust-docs installation goes away if I run rustup under strace or valgrind.

gdb crashes:

Starting program: /root/.cargo/bin/rustup default stable
warning: linux_ptrace_test_ret_to_nx: Cannot PTRACE_GETREGS: Input/output error
warning: linux_ptrace_test_ret_to_nx: PC 0x900000000 is neither near return address 0x7ffff8f1a000 nor is the return instruction 0x555555965971!
Couldn't get CS register: Input/output error.

And rr also crashes:

root@533d5273fa2a:~# rr record rustup default stable
rr: Saving execution to trace directory `/root/.local/share/rr/rustup-0'.
[FATAL ./src/Task.cc:3207:ptrace_if_stopped() errno: EIO] 
 (task 4113 (rec:4113) at time 1)
 -> Assertion `!errno' failed to hold. ptrace(PTRACE_GETREGS, 4113, addr=0, data=0x7ffffffc4700) failed with errno 5
Tail of trace dump:
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x5e)[0x555555715e2e]
rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0x161)[0x555555601b01]
rr(+0xbf416)[0x555555613416]
rr(+0xc00ec)[0x5555556140ec]
rr(_ZN2rr4Task17ptrace_if_stoppedEiNS_10remote_ptrIvEEPv+0x115)[0x5555556eb0c5]
rr(_ZN2rr4Task11did_waitpidENS_10WaitStatusE+0x58f)[0x5555556ef2cf]
rr(_ZN2rr4Task5spawnERNS_7SessionERNS_8ScopedFdEPS3_S5_PiRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorISC_SaISC_EESJ_i+0x830)[0x5555556f2d80]
rr(_ZN2rr13RecordSessionC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS6_SaIS6_EESD_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEiNS_7BindCPUES8_PKNS_9TraceUuidEbb+0x2e1)[0x555555643751]
rr(_ZN2rr13RecordSession6createERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EESB_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEhNS_7BindCPUERKS7_PKNS_9TraceUuidEbbbb+0xa91)[0x555555644921]
rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x46f)[0x5555556372ff]
rr(main+0x166)[0x5555555a21f6]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7fffff06f1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7fffff06f28b]
rr(_start+0x25)[0x5555555a3975]
=== End rr backtrace

I think this is a kernel bug. The installation of the rust-docs component uses a lot of threads banging away at the filesystem laying down a lot of small files. If I had to guess, there's a data race somewhere in the whole Docker emulation setup that is corrupting the rustup process.

I suggest you file a bug with Docker.

jaraco commented 1 month ago

Thanks for the analysis. I've reported the issue upstream, and I also learned that downgrading to Docker 4.28 works around the issue, as does disabling Rosetta or reducing the cores to 1. At least one user with an M3 Max chip is unable to replicate the issue.

jaraco commented 1 month ago

The issue is rather elusive. It's intermittent and difficult to replicate. Based on the analysis above, do you have any suggestions on a more minimal reproducer that might trigger faster and possibly more reliably?
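One possible starting point, based only on the description above of rust-docs installation as many threads laying down many small files, is a stress script with several concurrent writers. This is a rough sketch, not a confirmed reproducer: it uses background shell processes rather than threads, the function name and file layout are hypothetical, and whether it triggers the fault at all is an open question.

```shell
# Hypothetical stress test: start WORKERS concurrent writers, each creating
# FILES small files, loosely mimicking the rust-docs install workload.
stress_small_files() {
    workers=$1
    files=$2
    dir=$(mktemp -d) || return 1
    pids=""
    status=0
    w=0
    while [ "$w" -lt "$workers" ]; do
        (
            mkdir -p "$dir/w$w"
            f=0
            while [ "$f" -lt "$files" ]; do
                printf 'file %s from worker %s\n' "$f" "$w" > "$dir/w$w/$f.html"
                f=$((f + 1))
            done
        ) &
        pids="$pids $!"
        w=$((w + 1))
    done
    # Wait for each writer separately so a crashed writer is noticed.
    for p in $pids; do
        wait "$p" || { echo "writer $p exited abnormally" >&2; status=1; }
    done
    # Print the number of files written, then clean up.
    find "$dir" -type f | wc -l | tr -d ' '
    rm -rf "$dir"
    return "$status"
}
```

Inside the `--platform linux/amd64` container it could be called as, for example, `stress_small_files 16 2000`, raising both numbers until either the fault appears or the approach is ruled out.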

lmouhib commented 1 week ago

I have the same issue on Amazon Linux 2023: the build passes without the platform argument, but it fails when I pass --platform linux/amd64.

I am building on macOS with an M1 Pro.

geraldstanje commented 5 days ago

@imbolc I had a similar problem on a MacBook M1 Pro with Docker Desktop 4.28.0. I disabled Rosetta and it worked. Are there any plans to get it working without disabling Rosetta?

matthiasg commented 5 days ago

Same issue here: M1 Max + Docker 4.31.0

tracy-codes commented 3 days ago

Same issue here: M3 Max + Docker Desktop 4.30.0