Simple "Hello World!" throws Segmentation fault on CentOS 6.8 x86

clytras commented 7 years ago

After installing Rust, I tried the sample "Hello World!" program. It compiles fine, but when I run it, it segfaults right after printing "Hello World!".

This happens on a local server running CentOS 6.8 x86 (i686). It does not occur on CentOS 6.8 x86_64. I am not sure whether this is caused by my OS installation or by Rust itself.

[root@deve tmp]# curl -sSf https://static.rust-lang.org/rustup.sh | sh
rustup: gpg available. signatures will be verified
rustup: downloading manifest for 'stable'
rustup: downloading toolchain for 'stable'
######################################################################## 100.0%
gpg: Signature made Tue 08 Nov 2016 07:30:26 PM EET using RSA key ID 7B3B09DC
gpg: Good signature from "Rust Language (Tag and Release Signing Key) <rust-key@rust-lang.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 108F 6620 5EAE B0AA A8DD  5E1C 85AB 96E6 FA1B E5FE
     Subkey fingerprint: C134 66B7 E169 A085 1886  3216 5CB4 A934 7B3B 09DC
rustup: installing toolchain for 'stable'
rustup: extracting installer
install: uninstalling component 'rustc'
install: uninstalling component 'rust-std-i686-unknown-linux-gnu'
install: uninstalling component 'rust-docs'
install: uninstalling component 'cargo'
install: creating uninstall script at /usr/local/lib/rustlib/uninstall.sh
install: installing component 'rustc'
install: installing component 'rust-std-i686-unknown-linux-gnu'
install: installing component 'rust-docs'
install: installing component 'cargo'

    Rust is ready to roll.

[root@deve tmp]# cd ../src/rust/
[root@deve rust]# echo 'fn main() { println!("Hello World!"); }' >> hello.rs
[root@deve rust]# rustc hello.rs
[root@deve rust]# ./hello
Hello World!
Segmentation fault (core dumped)
[root@deve rust]#

[root@deve rust]# cat /etc/redhat-release
CentOS release 6.8 (Final)

[root@deve rust]# uname -a
Linux deve.mashine.xxx 2.6.32-642.4.2.el6.i686 #1 SMP Tue Aug 23 19:20:20 UTC 2016 i686 i686 i386 GNU/Linux

[root@deve rust]# rustc -g hello.rs
[root@deve rust]# gdb hello
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-90.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/src/rust/hello...done.
warning: Missing auto-load scripts referenced in section .debug_gdb_scripts
of file /root/src/rust/hello
Use `info auto-load python [REGEXP]' to list them.
(gdb) r
Starting program: /root/src/rust/hello
[Thread debugging using libthread_db enabled]
Hello World!

Program received signal SIGSEGV, Segmentation fault.
0x78eeab28 in ?? ()
(gdb) bt full
#0  0x78eeab28 in ?? ()
No symbol table info available.
#1  0xb7e25e9f in __run_exit_handlers (status=0) at exit.c:78
        atfct = <value optimized out>
        onfct = <value optimized out>
        cxafct = <value optimized out>
        f = <value optimized out>
#2  exit (status=0) at exit.c:100
No locals.
#3  0xb7e0ed2e in __libc_start_main (main=0x114030 <main>, argc=1, ubp_av=0xbffff714, init=0x156fc0 <__libc_csu_init>, fini=0x156fb0 <__libc_csu_fini>, rtld_fini=0xb7fef590 <_dl_fini>, stack_end=0xbffff70c) at libc-start.c:258
        result = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-1208434700, 0, 0, -1073744152, -548302623, -513497359}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x1, 0x113e20}, data = {prev = 0x0, cleanup = 0x0, canceltype = 1}}}
        not_first_call = <value optimized out>
warning: (Internal error: pc 0x113e50 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x113e50 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0x113e50 in read in psymtab, but not in symtab.)

#4  0x00113e51 in _start (warning: (Internal error: pc 0x113e50 in read in psymtab, but not in symtab.)

)
warning: (Internal error: pc 0x113e50 in read in psymtab, but not in symtab.)

No symbol table info available.
warning: (Internal error: pc 0x113e50 in read in psymtab, but not in symtab.)

(gdb)

alexcrichton commented 7 years ago

I was able to confirm this in a clean centos:6.8 docker image so I doubt it's something specific to your system. This segfault also went all the way back to Rust 1.0, so I don't think it's a regression anywhere along the way. I was unfortunately unable to get any good error messages out of it or pinpoint a cause, but TLS destructors seemed definitely related somehow to what's going on here.

itkovian commented 7 years ago

I am seeing this for 1.15.1 as well as for 1.16 (beta) on SL 6.8.

This is the only function I see being registered for running at exit:

(gdb) c
Continuing.

Breakpoint 3, __cxa_atexit (func=0xb7fef590 <_dl_fini>, arg=0x0, d=0x0) at cxa_atexit.c:58
58        return __internal_atexit (func, arg, d, &__exit_funcs);
(gdb) p func
$16 = (void (*)(void *)) 0xb7fef590 <_dl_fini>
(gdb) p arg
$17 = (void *) 0x0
(gdb) p d
$18 = (void *) 0x0
(gdb) p __exit_funcs
$19 = (struct exit_function_list *) 0xb7f8b160

But when I continue, the SIGSEGV seems to occur elsewhere.

(gdb) c
Continuing.
Hello, world!

Breakpoint 1, exit (status=0) at exit.c:99
99      {
(gdb) list
94      }
95
96
97      void
98      exit (int status)
99      {
100       __run_exit_handlers (status, &__exit_funcs, true);
101     }
102     libc_hidden_def (exit)
(gdb) step
100       __run_exit_handlers (status, &__exit_funcs, true);
(gdb) p __exit_funcs
$20 = (struct exit_function_list *) 0xb7f8b160
(gdb) p (struct exit_function_list) *__exit_funcs
$21 = exit_function_list = {
  next = 0x0,
  idx = 1,
  fns = {exit_function = {flavor = 4, func = {at = 0x8bf6ba11, on = Traceback (most recent call last):
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/gdb_rust_pretty_printing.py", line 100, in rust_pretty_printer_lookup_function
    type_kind = val.type.get_type_kind()
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/debugger_pretty_printers_common.py", line 122, in get_type_kind
    self.__type_kind = self.__classify_struct()
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/debugger_pretty_printers_common.py", line 143, in __classify_struct
    if (unqualified_type_name.startswith(("&[", "&mut [")) and
AttributeError: 'NoneType' object has no attribute 'startswith'
{fn = 0x8bf6ba11, arg = 0x0}, cxa = Traceback (most recent call last):
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/gdb_rust_pretty_printing.py", line 100, in rust_pretty_printer_lookup_function
    type_kind = val.type.get_type_kind()
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/debugger_pretty_printers_common.py", line 122, in get_type_kind
    self.__type_kind = self.__classify_struct()
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/debugger_pretty_printers_common.py", line 143, in __classify_struct
    if (unqualified_type_name.startswith(("&[", "&mut [")) and
AttributeError: 'NoneType' object has no attribute 'startswith'
{fn = 0x8bf6ba11, arg = 0x0, dso_handle = 0x0}}}, exit_function = {flavor = 0, func = {at = 0, on = Traceback (most recent call last):
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/gdb_rust_pretty_printing.py", line 100, in rust_pretty_printer_lookup_function
    type_kind = val.type.get_type_kind()
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/debugger_pretty_printers_common.py", line 122, in get_type_kind
    self.__type_kind = self.__classify_struct()
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/debugger_pretty_printers_common.py", line 143, in __classify_struct
    if (unqualified_type_name.startswith(("&[", "&mut [")) and
AttributeError: 'NoneType' object has no attribute 'startswith'
{fn = 0, arg = 0x0}, cxa = Traceback (most recent call last):
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/gdb_rust_pretty_printing.py", line 100, in rust_pretty_printer_lookup_function
    type_kind = val.type.get_type_kind()
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/debugger_pretty_printers_common.py", line 122, in get_type_kind
    self.__type_kind = self.__classify_struct()
  File "/usr/local/vpw/rust/rust-beta/lib/rustlib/etc/debugger_pretty_printers_common.py", line 143, in __classify_struct
    if (unqualified_type_name.startswith(("&[", "&mut [")) and
AttributeError: 'NoneType' object has no attribute 'startswith'
{fn = 0, arg = 0x0, dso_handle = 0x0}}} <repeats 31 times>}
}
(gdb) step
__run_exit_handlers (status=0) at exit.c:42
42        while (*listp != NULL)

<snip>
(gdb) list
73                  case ef_cxa:
74                    cxafct = f->func.cxa.fn;
75      #ifdef PTR_DEMANGLE
76                    PTR_DEMANGLE (cxafct);
77      #endif
78                    cxafct (f->func.cxa.arg, status);
79                    break;
80                  }
81              }
(gdb) p f->func.cxa.fn
$27 = (void (*)(void *, int)) 0x8bf6ba11
(gdb) p f->func.cxa.arg
$28 = (void *) 0x0
(gdb) step
78                    cxafct (f->func.cxa.arg, status);
(gdb)

Program received signal SIGSEGV, Segmentation fault.
0x08c5fb5d in ?? ()
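
The loop being stepped through above takes entries from the end of the list (`&cur->fns[--cur->idx]`), so exit handlers run in reverse order of registration. A rough Rust model of that behavior follows. `ExitHandlers`, `register`, and `demo_order` are made-up names for illustration, and the layout is deliberately much simpler than glibc's `exit_function_list` (no flavors, no pointer mangling).

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A handler receives the exit status, like the `cxa` flavor in the listing.
type Handler = Box<dyn FnMut(i32)>;

// Simplified stand-in for an exit handler list; not glibc's real layout.
struct ExitHandlers {
    fns: Vec<Handler>,
}

impl ExitHandlers {
    fn new() -> Self {
        ExitHandlers { fns: Vec::new() }
    }

    // Registration appends to the end of the list.
    fn register(&mut self, f: Handler) {
        self.fns.push(f);
    }

    // Running pops from the end, so the last registered handler runs first.
    fn run(&mut self, status: i32) {
        while let Some(mut f) = self.fns.pop() {
            f(status);
        }
    }
}

// Registers two handlers and returns the order in which they were called.
fn demo_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let mut handlers = ExitHandlers::new();
    for name in ["first", "second"] {
        let log = Rc::clone(&log);
        handlers.register(Box::new(move |_status: i32| log.borrow_mut().push(name)));
    }
    handlers.run(0);
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("call order: {:?}", demo_order());
}
```

In the session above only one entry is registered (`idx = 1`), so the crash happens on the very first handler call.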

itkovian commented 7 years ago

Looking a bit further, I get here:

   0xb7e25e80 <+192>:   shl    $0x4,%eax
   0xb7e25e83 <+195>:   lea    (%esi,%eax,1),%eax
   0xb7e25e86 <+198>:   mov    0xc(%eax),%edx
   0xb7e25e89 <+201>:   mov    %edi,0x4(%esp)
   0xb7e25e8d <+205>:   mov    0x10(%eax),%eax
   0xb7e25e90 <+208>:   ror    $0x9,%edx
   0xb7e25e93 <+211>:   xor    %gs:0x18,%edx
   0xb7e25e9a <+218>:   mov    %eax,(%esp)
=> 0xb7e25e9d <+221>:   call   *%edx
   0xb7e25e9f <+223>:   jmp    0xb7e25de8 <exit+40>
   0xb7e25ea4 <+228>:   nopl   0x0(%eax)
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) info registers
eax            0x0      0
ecx            0xb7f8c168       -1208434328
edx            0x2f01c865       788645989
ebx            0xb7f8aff4       -1208438796
esp            0xbffff600       0xbffff600
ebp            0xbffff638       0xbffff638
esi            0xb7f8c160       -1208434336
edi            0x0      0
eip            0xb7e25e9d       0xb7e25e9d <exit+221>
eflags         0x206    [ PF IF ]
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb) x 0x2f01c865
0x2f01c865:     Cannot access memory at address 0x2f01c865

On no run does the demangled edx value match the original value again.

itkovian commented 7 years ago

The original value gets mangled:

=> 0xb7e260f9 <+41>:    mov    0x8(%ebp),%edx
   0xb7e260fc <+44>:    xor    %gs:0x18,%edx
   0xb7e26103 <+51>:    rol    $0x9,%ed

Turning

(gdb) p func
$24 = (void (*)(void *)) 0xb7fef590 <_dl_fini>

into

(gdb) p func
$25 = (void (*)(void *)) 0x36834987

At the end, we find the same value again:

(gdb) step
46            while (cur->idx > 0)
(gdb)
78                    cxafct (f->func.cxa.arg, status);
(gdb)
46            while (cur->idx > 0)
(gdb)
50                switch (f->flavor)
(gdb)
49                  &cur->fns[--cur->idx];
(gdb)
50                switch (f->flavor)
(gdb)
74                    cxafct = f->func.cxa.fn;
(gdb) p f->func.cxa.fn
$28 = (void (*)(void *, int)) 0x36834987

Moving through the mangling:

=> 0xb7e25e80 <+192>:   shl    $0x4,%eax
   0xb7e25e83 <+195>:   lea    (%esi,%eax,1),%eax
   0xb7e25e86 <+198>:   mov    0xc(%eax),%edx
   0xb7e25e89 <+201>:   mov    %edi,0x4(%esp)
   0xb7e25e8d <+205>:   mov    0x10(%eax),%eax
   0xb7e25e90 <+208>:   ror    $0x9,%edx
   0xb7e25e93 <+211>:   xor    %gs:0x18,%edx
   0xb7e25e9a <+218>:   mov    %eax,(%esp)
   0xb7e25e9d <+221>:   call   *%edx

And then we get ...

=> 0xb7e25e9a <+218>:   mov    %eax,(%esp)
   0xb7e25e9d <+221>:   call   *%edx
   0xb7e25e9f <+223>:   jmp    0xb7e25de8 <exit+40>
   0xb7e25ea4 <+228>:   nopl   0x0(%eax)
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) info registers
eax            0x0      0
ecx            0xb7f8c168       -1208434328
edx            0xc39b41a4       -1013235292
ebx            0xb7f8aff4       -1208438796
esp            0xbffff600       0xbffff600
ebp            0xbffff638       0xbffff638
esi            0xb7f8c160       -1208434336
edi            0x0      0
eip            0xb7e25e9a       0xb7e25e9a <exit+218>
eflags         0x282    [ SF IF ]
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

So we are now about to call the function at 0xc39b41a4, which is not at all the original target from before mangling.

So something messed with the guard value?
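
The mangling seen in the disassembly (XOR with the word at `%gs:0x18`, then `rol $0x9`; undone by `ror $0x9`, then XOR) can be replayed on the values from this session. The sketch below assumes 32-bit pointers. The function names are made up, and the guard value is derived from the two printed pointers rather than shown anywhere in the thread.

```rust
// Rotation count used by the `rol $0x9` / `ror $0x9` pair in the disassembly.
const ROTATE_BITS: u32 = 9;

// Mangle: XOR with the guard, then rotate left.
fn mangle(ptr: u32, guard: u32) -> u32 {
    (ptr ^ guard).rotate_left(ROTATE_BITS)
}

// Demangle: rotate right, then XOR with the guard.
fn demangle(mangled: u32, guard: u32) -> u32 {
    mangled.rotate_right(ROTATE_BITS) ^ guard
}

fn main() {
    let dl_fini: u32 = 0xb7fef590; // `_dl_fini`, printed as $24
    let stored: u32 = 0x36834987; // mangled pointer, printed as $25

    // Guard implied by those two values (an inference, not gdb output).
    let guard = stored.rotate_right(ROTATE_BITS) ^ dl_fini;
    println!("implied guard at registration: {:#010x}", guard);

    // With the same guard, the round trip recovers the original target.
    println!("same guard:  {:#010x}", demangle(mangle(dl_fini, guard), guard));

    // With a guard of zero, demangling is just the rotation.
    println!("zero guard:  {:#010x}", demangle(stored, 0));
}
```

Demangling `0x36834987` with a guard of zero gives `0xc39b41a4`, the value in `edx` in the last register dump. That would be consistent with the guard word reading as zero by the time `exit` runs, but this is only an arithmetic observation; the thread does not show what `%gs:0x18` actually held.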

itkovian commented 7 years ago

If it's any help, a program with an empty main gives this:

# RUST_BACKTRACE=1 ./test-empty
thread '<unnamed>' panicked at 'cannot access a TLS value during or after it is destroyed', /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libcore/option.rs:715
stack backtrace:
   1:   0xfa37f5 - std::sys::imp::backtrace::tracing::imp::write::h23bcdb89e70c5bbf
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
   2:   0xfa54dd - std::panicking::default_hook::{{closure}}::he7b82439fd2d2bb6
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libstd/panicking.rs:351
   3:   0xfa5089 - std::panicking::default_hook::he1cd4269c1558f23
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libstd/panicking.rs:367
   4:   0xfa58d9 - std::panicking::rust_panic_with_hook::h006b37e36b7c8982
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libstd/panicking.rs:555
   5:   0xfa572c - std::panicking::begin_panic::h043cddfdd3933cc4
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libstd/panicking.rs:517
   6:   0xfa56a3 - std::panicking::begin_panic_fmt::h34e588bba6b8a2c2
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libstd/panicking.rs:501
   7:   0xfa5616 - rust_begin_unwind
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libstd/panicking.rs:477
   8:   0xfcd15f - core::panicking::panic_fmt::he52644573ecd78ff
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libcore/panicking.rs:69
   9:   0xfcd204 - core::option::expect_failed::h64a81ddcb7418e4e
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libcore/option.rs:715
  10:   0xfa33a0 - std::sys_common::thread_info::set::h80f3df40e0ba4361
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libcore/option.rs:293
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libstd/thread/local.rs:251
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libstd/sys_common/thread_info.rs:51
  11:   0xfa5c5e - std::rt::lang_start::h1ef940195e3c010e
                at /buildslave/rust-buildbot/slave/beta-dist-rustc-linux/build/src/libstd/rt.rs:51
  12:   0xf9efa0 - main
  13:   0x1d1d25 - __libc_start_main
  14:   0xf9ee80 - <unknown>
fatal runtime error: failed to initiate panic, error 5
Aborted
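
For context on the panic message: a thread-local key rejects access once its destructor has started. The following is a minimal sketch of that rule on a healthy system, not a reproduction of the bug. `Probe`, `SLOT`, and `access_fails_during_destruction` are hypothetical names, and it uses `LocalKey::try_with`, which reports an error where `with` would panic.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Set by the destructor when re-entering the key is rejected.
static ACCESS_FAILED_IN_DROP: AtomicBool = AtomicBool::new(false);

// Hypothetical value type; it exists only so the slot has a destructor.
struct Probe;

impl Drop for Probe {
    fn drop(&mut self) {
        // The key is being destroyed at this point, so `try_with` returns an
        // error instead of handing out a reference.
        let failed = SLOT.try_with(|_| ()).is_err();
        ACCESS_FAILED_IN_DROP.store(failed, Ordering::SeqCst);
    }
}

thread_local! {
    // Each thread lazily creates its own `Probe` on first access.
    static SLOT: Probe = Probe;
}

// Touches the thread-local on a worker thread, waits for the thread to exit,
// and reports whether the destructor saw the access being rejected.
fn access_fails_during_destruction() -> bool {
    thread::spawn(|| {
        // First access initialises the slot and registers its destructor.
        SLOT.with(|_| ());
    })
    .join()
    .expect("worker thread panicked");
    ACCESS_FAILED_IN_DROP.load(Ordering::SeqCst)
}

fn main() {
    println!(
        "access rejected during destruction: {}",
        access_fails_during_destruction()
    );
}
```

In the backtrace above the same check fires inside `std::sys_common::thread_info::set` during startup, before `main` has run, which suggests the key's state was already wrong rather than that anything had really been destroyed.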

itkovian commented 7 years ago

I do notice the issue does not occur when creating a binary for i686-unknown-linux-gnu on an x86_64-unknown-linux-gnu platform using the --target option.

alexcrichton commented 7 years ago

Thanks for the investigation @itkovian!

I wonder if there's a stray out-of-bounds write causing this problem? Or maybe it's a runtime bug we're running into?

cuviper commented 7 years ago

I just discovered this issue myself, ouch! I wasn't even doing hello, just fn main(){}.

I found that the binary compiled on RHEL6 won't run on Fedora either, so it's not just runtime libraries. And the binary compiled on Fedora can't run on RHEL6 because it has a versioned GLIBC_2.18 symbol. But that symbol seems relevant to what @itkovian was looking at:

Fedora 24 i686:

# echo 'fn main(){}' | rustc - -o true
# nm ./true | grep atexit
00034820 t atexit
         U __cxa_atexit@@GLIBC_2.1.3
         w __cxa_thread_atexit_impl@@GLIBC_2.18
00016ac0 t stats_print_atexit
# ./true
# echo $?
0

RHEL6 i686:

# echo 'fn main(){}' | rustc - -o true
# nm ./true | grep atexit
         U __cxa_atexit@@GLIBC_2.1.3
         w __cxa_thread_atexit_impl
00034f20 t atexit
000171c0 t stats_print_atexit
# ./true
thread '<unnamed>' panicked at 'cannot access a TLS value during or after it is destroyed', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/option.rs:715
note: Run with `RUST_BACKTRACE=1` for a backtrace.
fatal runtime error: failed to initiate panic, error 5
Aborted
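
The difference between the two `nm` listings is that the weak `__cxa_thread_atexit_impl` reference carries a `GLIBC_2.18` version on Fedora and none on RHEL6. A small sketch of checking that mechanically follows. It assumes the three-column `nm` format shown above; `NmSymbol`, `parse_nm_line`, and `find_symbol` are illustrative names, not part of any tool.

```rust
// One parsed line of `nm` output.
#[derive(Debug, PartialEq)]
struct NmSymbol {
    kind: char,              // symbol type letter, e.g. 'U', 'w', 't'
    name: String,            // symbol name without any version suffix
    version: Option<String>, // text after '@' or '@@', if present
}

// Parses a line such as "         w __cxa_thread_atexit_impl@@GLIBC_2.18".
fn parse_nm_line(line: &str) -> Option<NmSymbol> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    // Undefined symbols have no address column, so index from the end.
    if fields.len() < 2 {
        return None;
    }
    let kind = fields[fields.len() - 2].chars().next()?;
    let full_name = fields[fields.len() - 1];
    let (name, version) = match full_name.split_once('@') {
        Some((base, rest)) => (base, Some(rest.trim_start_matches('@').to_string())),
        None => (full_name, None),
    };
    Some(NmSymbol { kind, name: name.to_string(), version })
}

// Returns the first symbol in the listing whose unversioned name matches.
fn find_symbol(nm_output: &str, wanted: &str) -> Option<NmSymbol> {
    nm_output.lines().filter_map(parse_nm_line).find(|s| s.name == wanted)
}

fn main() {
    // Listings copied from the two transcripts above.
    let fedora = "00034820 t atexit\n         U __cxa_atexit@@GLIBC_2.1.3\n         w __cxa_thread_atexit_impl@@GLIBC_2.18\n00016ac0 t stats_print_atexit";
    let rhel6 = "         U __cxa_atexit@@GLIBC_2.1.3\n         w __cxa_thread_atexit_impl\n00034f20 t atexit\n000171c0 t stats_print_atexit";

    for (label, listing) in [("Fedora 24", fedora), ("RHEL6", rhel6)] {
        match find_symbol(listing, "__cxa_thread_atexit_impl") {
            Some(sym) => println!("{}: kind {}, version {:?}", label, sym.kind, sym.version),
            None => println!("{}: symbol not found", label),
        }
    }
}
```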

cuviper commented 7 years ago

> I do notice the issue does not occur when creating a binary for i686-unknown-linux-gnu on an x86_64-unknown-linux-gnu platform using the --target option.

Was your x86_64 host also EL6? I do still see the issue when cross-compiling this way. But the native x86_64 target is fine.

itkovian commented 7 years ago

Ah! I forgot to check that, @cuviper. It's CentOS 7.3, so the answer would be no.

cuviper commented 7 years ago

Actually, that's interesting. EL7 has glibc-2.17, so its result doesn't have that GLIBC_2.18 dependency, and in fact that i686 binary runs just fine on EL6 i686! Hmm, now I want to try EL5 too, and I'll see if I can get our other toolchain folks to help me look at this.

cuviper commented 7 years ago

> now I want to try EL5 too

This works fine!

cuviper commented 7 years ago

I believe it is this binutils bug: https://sourceware.org/bugzilla/show_bug.cgi?id=12654

alexcrichton commented 7 years ago

@cuviper wow that must have been quite the investigation! Does that mean if you re-link the binary without -pie it works?

If that's the cause here then I'm not sure what we can do about it :(

cuviper commented 7 years ago

Yeah, it was fun. Kudos to @fweimer for identifying the actual fix. Here's the RHEL6 bug: https://bugzilla.redhat.com/show_bug.cgi?id=1427285

So yeah, linking without -pie works, but I'm not sure of a clean way to get rustc to do that. I used -Csave-temps -Zprint-link-args and then removed the -pie manually. I tried -Crelocation-model=static, but that doesn't seem to override the target options here, which may be a bug in itself.

There's probably some way to get rustc to use your own fixed ld too, until the system ld can be fixed.
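
The manual step described above (take the printed link line, drop `-pie`, re-run it) amounts to filtering one flag out of an argument list. A sketch of just that filtering step follows. The link line in `main` is a made-up example, not real `-Zprint-link-args` output, and `without_pie` is a hypothetical helper.

```rust
// Returns the linker arguments with any standalone `-pie` flag removed.
fn without_pie(args: &[&str]) -> Vec<String> {
    args.iter()
        .filter(|arg| **arg != "-pie")
        .map(|arg| arg.to_string())
        .collect()
}

fn main() {
    // Hypothetical link line; the real one lists many more objects and libraries.
    let link_line = ["cc", "-m32", "hello.0.o", "-o", "hello", "-pie", "-lc"];
    let relink = without_pie(&link_line);
    // Print the command so it can be inspected before being run by hand.
    println!("{}", relink.join(" "));
}
```

This only covers a bare `-pie` argument; a flag passed in another form, such as through `-Wl,`, would need its own handling.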

alexcrichton commented 7 years ago

Ok in that case I think I'll go ahead and close. Thanks though for the investigation @cuviper!

cuviper commented 7 years ago

FYI, the fix is in binutils-2.20.51.0.2-5.47.el6_9.1 (errata), and it looks like CentOS has shipped it too.

alexcrichton commented 7 years ago

Thanks for the update @cuviper!

rust-lang/rust issue #37874