rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

std::net::TcpStream::connect() and .to_socket_addrs() segfault when address is "localhost:8080", build is static and /etc/hosts is empty on arch linux. #100711

Open alkeryn opened 2 years ago

alkeryn commented 2 years ago

Both of these functions segfault when trying to resolve localhost on any port if the following conditions are met:

I tried this code:

use std::net::ToSocketAddrs;

pub fn main() {
    println!("before");
    let _ = "localhost:8080".to_socket_addrs(); // will segfault
    std::net::TcpStream::connect("localhost:8080").unwrap(); // will also segfault
    println!("hello world");
}

I expected to see this happen: the address is resolved

Instead, this happened: the program segfaults

rustc --version --verbose:

rustc 1.65.0-nightly (9c20b2a8c 2022-08-17)
binary: rustc
commit-hash: 9c20b2a8cc7588decb6de25ac6a7912dcef24d65
commit-date: 2022-08-17
host: x86_64-unknown-linux-gnu
release: 1.65.0-nightly
LLVM version: 15.0.0

uname -a: (This is the latest Arch Linux. I could not reproduce the bug on another distro, but still, it shouldn't segfault.)

Linux Alkeryn-PC 5.19.1-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 11 Aug 2022 16:06:13 +0000 x86_64 GNU/Linux

For the backtrace, RUST_BACKTRACE=1 did not work and gave the following output :

RUST_BACKTRACE=1 ./main
before
zsh: segmentation fault (core dumped)  RUST_BACKTRACE=1 ./main

So here is a backtrace made with gdb (don't mind the gef plugin being installed):

Backtrace

```
[ Legend: Modified register | Code | Heap | Stack | String ]
──────────────────────────────────────── registers ────
$rax   : 0x0
$rbx   : 0x3
$rcx   : 0x007ffff7cf838e → 0x310b77fffff0003d ("="?)
$rdx   : 0x007ffff7ff8490 → <_dl_static_dtv+16> add BYTE PTR [rax], al
$rsp   : 0x007fffffffcaf0 → "/proc/sys/net/ipv6/conf/all/disable_ipv6"
$rbp   : 0x007fffffffcc20 → 0x007fffffffcce0 → 0x0000000000000010
$rsi   : 0x007ffff7d99dd5 → 0x6225206125000200
$rdi   : 0x007ffff79c1c88 → 0x0000000000000005
$rip   : 0x007ffff79a5196 → mov r12, QWORD PTR [rax+0x8]
$r8    : 0x0
$r9    : 0x0
$r10   : 0x1000
$r11   : 0x206
$r12   : 0x0
$r13   : 0x007fffffffcaf0 → "/proc/sys/net/ipv6/conf/all/disable_ipv6"
$r14   : 0x007fffffffccf0 → 0x0000000000000000
$r15   : 0x007fffffffcca0 → 0xe1efbb33d283a048
$eflags: [ZERO carry PARITY adjust sign trap INTERRUPT direction overflow RESUME virtualx86 identification]
$cs: 0x33 $ss: 0x2b $ds: 0x00 $es: 0x00 $fs: 0x00 $gs: 0x00
──────────────────────────────────────── stack ────
0x007fffffffcaf0│+0x0000: "/proc/sys/net/ipv6/conf/all/disable_ipv6" ← $rsp, $r13
0x007fffffffcaf8│+0x0008: "s/net/ipv6/conf/all/disable_ipv6"
0x007fffffffcb00│+0x0010: "v6/conf/all/disable_ipv6"
0x007fffffffcb08│+0x0018: "all/disable_ipv6"
0x007fffffffcb10│+0x0020: "ble_ipv6"
0x007fffffffcb18│+0x0028: 0xffffffffffffff00
0x007fffffffcb20│+0x0030: 0x0000000000000000
0x007fffffffcb28│+0x0038: 0x007ffff79a4e33 → lea rdx, [rax+0xb]
──────────────────────────────────────── code:x86:64 ────
   0x7ffff79a5187 je 0x7ffff79a51d0
   0x7ffff79a5189 lea rdi, [rip+0x1caf8] # 0x7ffff79c1c88
   0x7ffff79a5190 call QWORD PTR [rip+0x1cc82] # 0x7ffff79c1e18
 → 0x7ffff79a5196 mov r12, QWORD PTR [rax+0x8]
   0x7ffff79a519d mov r13, rax
   0x7ffff79a51a0 test r12, r12
   0x7ffff79a51a3 je 0x7ffff79a51e5
   0x7ffff79a51a5 sub r12, 0x1
   0x7ffff79a51a9 mov eax, 0x3ffffe
──────────────────────────────────────── threads ────
[#0] Id 1, Name: "main", stopped 0x7ffff79a5196 in ?? (), reason: SIGSEGV
──────────────────────────────────────── trace ────
[#0] 0x7ffff79a5196 → mov r12, QWORD PTR [rax+0x8]
[#1] 0x7ffff79ad6b1 → jmp 0x7ffff79ad518
[#2] 0x7ffff79a1045 → mov rbx, QWORD PTR [rsp]
[#3] 0x7ffff79aa1a6 → _nss_myhostname_gethostbyname4_r()
[#4] 0x7ffff7f248ae → getaddrinfo()
[#5] 0x7ffff7ed5cf6 → std::sys_common::net::{impl#6}::try_from()
[#6] 0x7ffff7ece64c → core::convert::{impl#6}::try_into<(&str, u16), std::sys_common::net::LookupHost>()
[#7] 0x7ffff7ece64c → std::sys_common::net::{impl#5}::try_from()
[#8] 0x7ffff7ece64c → core::convert::{impl#6}::try_into<&str, std::sys_common::net::LookupHost>()
[#9] 0x7ffff7ece64c → std::net::addr::{impl#30}::to_socket_addrs()
────────────────────────────────────────
gef➤ bt
#0  0x00007ffff79a5196 in ?? () from /usr/lib/libnss_myhostname.so.2
#1  0x00007ffff79ad6b1 in ?? () from /usr/lib/libnss_myhostname.so.2
#2  0x00007ffff79a1045 in ?? () from /usr/lib/libnss_myhostname.so.2
#3  0x00007ffff79aa1a6 in _nss_myhostname_gethostbyname4_r () from /usr/lib/libnss_myhostname.so.2
#4  0x00007ffff7f248ae in getaddrinfo ()
#5  0x00007ffff7ed5cf6 in std::sys_common::net::{impl#6}::try_from () at library/std/src/sys_common/net.rs:205
#6  0x00007ffff7ece64c in core::convert::{impl#6}::try_into<(&str, u16), std::sys_common::net::LookupHost> () at library/core/src/convert/mod.rs:590
#7  std::sys_common::net::{impl#5}::try_from () at library/std/src/sys_common/net.rs:190
#8  core::convert::{impl#6}::try_into<&str, std::sys_common::net::LookupHost> () at library/core/src/convert/mod.rs:590
#9  std::net::addr::{impl#30}::to_socket_addrs () at library/std/src/net/addr.rs:961
#10 0x00007ffff7eb91eb in main::main ()
#11 0x00007ffff7eb9ef3 in core::ops::function::FnOnce::call_once ()
#12 0x00007ffff7eb9159 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#13 0x00007ffff7eb8fc9 in std::rt::lang_start::{{closure}} ()
#14 0x00007ffff7ecb7bf in core::ops::function::impls::{impl#2}::call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/core/src/ops/function.rs:280
#15 std::panicking::try::do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panicking.rs:492
#16 std::panicking::try + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/std/src/panicking.rs:456
#17 std::panic::catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panic.rs:137
#18 std::rt::lang_start_internal::{closure#2} () at library/std/src/rt.rs:128
#19 std::panicking::try::do_call () at library/std/src/panicking.rs:492
#20 std::panicking::try () at library/std/src/panicking.rs:456
#21 std::panic::catch_unwind () at library/std/src/panic.rs:137
#22 std::rt::lang_start_internal () at library/std/src/rt.rs:128
#23 0x00007ffff7eb8fb1 in std::rt::lang_start ()
#24 0x00007ffff7eb9273 in main ()
gef➤
```

Urgau commented 2 years ago

I'm unable to reproduce the issue. I tried stable, beta, nightly, with/without /etc/hosts, ...

Thanks to your gdb backtrace, I can also see that the crash seems to be inside /usr/lib/libnss_myhostname.so.2, which is a systemd library. This doesn't prove that the Rust code isn't responsible for the crash, but the call passes through the glibc getaddrinfo function, which should have rejected invalid inputs.

Which versions of the systemd-libs and glibc packages do you have installed? Is your system fully updated?

alkeryn commented 2 years ago

Hey, yeah, I was unable to reproduce it on my Debian-based VPS.

On the system where it does occur:

systemd-libs version is 251.4-1, glibc version is 2.36-2.

Yes, I updated it yesterday. Isn't it odd that it tries to use that library even though it is a static build?

alkeryn commented 2 years ago

@Urgau oh wait, I had put part of my original issue in a comment block, so the conditions were missing. You need to compile with `rustc -C target-feature=+crt-static main.rs`. Sorry, I missed that it was commented out.

Urgau commented 2 years ago

Okay, thanks for the info.

> Isn't it odd that it tries to use that library even though it is a static build?

Well, yes, but mostly no. A static build generally includes most or all of the libraries it would otherwise link to dynamically, but some libraries aren't linked at link time and are instead located at run time. That's the case here for name resolution: there are many different ways it could be done, and including all of them isn't possible.
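To make the run-time lookup concrete, here is a minimal sketch of how one might list the resolver modules glibc could load for host lookups. It assumes the conventional `/etc/nsswitch.conf` location and the usual `libnss_<service>.so.2` naming (which matches the `libnss_myhostname.so.2` in the backtrace); the `hosts_services` helper is hypothetical and only approximates the file syntax.

```rust
use std::fs;

/// Hypothetical helper: returns the service names on the `hosts:` line of an
/// nsswitch.conf-style string, skipping bracketed action items such as
/// `[NOTFOUND=return]`.
fn hosts_services(conf: &str) -> Vec<String> {
    for line in conf.lines() {
        // Drop trailing comments and surrounding whitespace.
        let line = line.split('#').next().unwrap_or("").trim();
        if let Some(rest) = line.strip_prefix("hosts:") {
            let mut in_action = false;
            let mut services = Vec::new();
            for tok in rest.split_whitespace() {
                // An action item may span several tokens until the closing `]`.
                if in_action || tok.starts_with('[') {
                    in_action = !tok.ends_with(']');
                    continue;
                }
                services.push(tok.to_string());
            }
            return services;
        }
    }
    Vec::new()
}

fn main() {
    // Assumption: the conventional location of the NSS configuration.
    let conf = fs::read_to_string("/etc/nsswitch.conf").unwrap_or_default();
    for service in hosts_services(&conf) {
        // Assumption: the usual shared-object naming scheme for NSS modules.
        println!("{service} -> libnss_{service}.so.2");
    }
}
```

Running this on an affected machine should show whether `myhostname` is among the services configured for host lookups.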

> @Urgau oh wait, I had put part of my original issue in a comment block, so the conditions were missing. You need to compile with `rustc -C target-feature=+crt-static main.rs`. Sorry, I missed that it was commented out.

Thanks I was about to ask.


I'm now able to reproduce the crash and I'm almost 100% sure it's a glibc bug. Unfortunately, glibc advises against static linking, so I'm not sure if reporting the crash to them will help.

I would, however, advise you to use musl, a glibc replacement that is known to work with static linking and is supported natively by the Rust compiler. Just install the target with `rustup +nightly target install x86_64-unknown-linux-musl` and build for it with `rustc -C target-feature=+crt-static --target=x86_64-unknown-linux-musl main.rs`.

alkeryn commented 2 years ago

@Urgau thanks! I do wonder why I can't reproduce it on a Debian server, but that's not so important.

I see. Still, I wouldn't have expected a Rust program to segfault without using unsafe. Even though the segfault comes from glibc, couldn't Rust handle it gracefully in one way or another?

Anyway, thanks for the tips!

Urgau commented 2 years ago

> I do wonder why I can't reproduce it on a Debian server, but that's not so important.

I also tested on a Debian-based system and couldn't reproduce the crash. The problem probably comes from the recent glibc upgrade in Arch Linux. This may be a recent regression in glibc, but as I said, glibc advises against static linking, so I don't know if they will do something about it.

> I see. Still, I wouldn't have expected a Rust program to segfault without using unsafe. Even though the segfault comes from glibc, couldn't Rust handle it gracefully in one way or another?

The segfault is not in the Rust code; it's in the systemd library, probably because glibc passed it some invalid values (speculation). There is nothing the Rust runtime can do in this situation: we don't have control over glibc, systemd, or whatever else.

SIGSEGV means invalid memory access: some piece of code tried to access a place in memory that it doesn't have permission to access. This could leave some state invalid, which could in turn corrupt other state, and so on. The only sensible thing to do in this situation is to abort.
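While the runtime cannot recover from the crash, a program that only needs the loopback address can avoid the system resolver altogether. The sketch below is not something proposed in this thread; it assumes that a literal IP address is acceptable in place of the name `localhost`, and the `loopback` helper is hypothetical. It relies on the standard library parsing a string that is already a valid socket address before attempting any lookup.

```rust
use std::net::{Ipv4Addr, SocketAddr, ToSocketAddrs};

/// Hypothetical helper: builds the IPv4 loopback address directly,
/// without asking the system resolver.
fn loopback(port: u16) -> SocketAddr {
    SocketAddr::from((Ipv4Addr::LOCALHOST, port))
}

fn main() {
    let addr = loopback(8080);

    // A string that already parses as a socket address is returned as-is,
    // so no host name lookup is needed for it.
    let parsed: Vec<SocketAddr> = "127.0.0.1:8080"
        .to_socket_addrs()
        .expect("a literal address should always parse")
        .collect();

    assert_eq!(parsed, vec![addr]);
    println!("{addr}");
}
```

The resulting `SocketAddr` can be passed to `std::net::TcpStream::connect` in place of `"localhost:8080"`.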

alkeryn commented 2 years ago

Well, thank you for all the details! :) Should we close the issue or report it to the glibc devs?

pymongo commented 2 years ago

Reproduced on Manjaro (Linux ww 5.10.136-1-MANJARO) with glibc 2.36; same backtrace.

saethlin commented 2 months ago

It's probably worth reporting this upstream.

The segfault here is a null pointer dereference on this line: https://github.com/systemd/systemd/blob/b45730389ba025489ec8d445bc91534fef515c28/src/basic/memory-util.c#L12

I suspect that the problem is that thread-locals aren't initialized. Whether that's caused by our unsupported linkage, or it's some other kind of bug in rustc or glibc/systemd is unclear. But I'm a C novice, so that's not saying much.

Noratrieb commented 2 months ago

This is exactly why you should not link glibc statically. Your glibc dlopened the systemd library, which probably depends on glibc too and thus brought in a second glibc. That is guaranteed to cause issues.

You should either stop linking glibc statically or switch to a musl target, which supports static linking (and even does so by default today). I don't think upstream glibc would treat this as a bug.

Noratrieb commented 2 months ago

I think it would make sense to print a warning when trying to link glibc statically.
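As a rough illustration of the condition such a warning would key on, a program can already check its own build configuration with `cfg!`. This is only a sketch of the idea, not how rustc would implement it; the `static_glibc_warning` helper and its message text are hypothetical.

```rust
/// Hypothetical helper: returns a warning when the target combines glibc
/// (`target_env = "gnu"`) with a statically linked C runtime.
fn static_glibc_warning(target_env: &str, crt_static: bool) -> Option<&'static str> {
    if target_env == "gnu" && crt_static {
        Some("warning: glibc is linked statically; name resolution may still load shared libraries at run time")
    } else {
        None
    }
}

fn main() {
    // `cfg!` reports how this very program was compiled.
    let env = if cfg!(target_env = "gnu") {
        "gnu"
    } else if cfg!(target_env = "musl") {
        "musl"
    } else {
        "other"
    };
    let crt_static = cfg!(target_feature = "crt-static");

    if let Some(msg) = static_glibc_warning(env, crt_static) {
        eprintln!("{msg}");
    }
}
```

Built with `-C target-feature=+crt-static` for a glibc target, this prints the warning; built for a musl target, it stays silent.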