zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.49k stars 6.42k forks source link

native_sim: combining `-fstack-protector` with `CONFIG_POSIX_C_LIB_EXT=y` crashes during boot #74601

Closed mniestroj closed 2 months ago

mniestroj commented 2 months ago

Describe the bug Combining CONFIG_POSIX_C_LIB_EXT=y and -fstack-protector results in native_sim platform to crash during boot. For distributions that have -fstack-protectoras default (e.g. Arch Linux) it results in many integration tests and sample apps in Zephyr to fail (e.g.west twister` does not pass).

To Reproduce Steps to reproduce the behavior:

  1. west build -p -b native_sim zephyr/tests/posix/getentropy/ -T portability.posix.getentropy.picolibc -- -DEXTRA_CPPFLAGS=-fstack-protector
    • -fstack-protector is disabled by default on Ubuntu / https://hub.docker.com/r/zephyrprojectrtos/ci CI image
    • -fstack-protector is enabled by default on Arch Linux, so passing -DEXTRA_CPPFLAGS=-fstack-protector is not needed to reproduce this issue
  2. west build -t run
  3. Segmentation fault (core dumped) error occurs due to NULL pointer dereference
    • __stack_chk_init is called automatically by host libc, before even Zephyr init kicks in (since it is marked as __attribute__((__constructor__))
    • getentropy() is called, which is provided by Zephyr in case CONFIG_POSIX_C_LIB_EXT=y
    • device_is_ready(entropy) == false, so errno = EIO assignment is attempted
    • errno = ... is actually _kernel.cpus[0].current.errno_var = ... in that case; this results with segmentation fault since _kernel.cpus[0].current == 0x0 at this stage

Expected behavior Running west build -p -b native_sim zephyr/tests/posix/getentropy/ -T portability.posix.getentropy.picolibc -t run should pass, even on systems where -fstack-protector is the default (like on Arch Linux).

Impact Applications that use CONFIG_POSIX_C_LIB_EXT=y (e.g. using thread-safe getopt() shell extension, crc shell command, fnmatch() function) fail to work with native_sim. Also "standard" Zephyr integration tests fail on Linux distributions with -fstack-protector being in the default host compiler flags.

Logs and console output

GDB session logs ``` Breakpoint 1, __stack_chk_init () at /modules/lib/picolibc/newlib/libc/ssp/stack_protector.c:23 23 if (__stack_chk_guard != 0) (gdb) bt #0 __stack_chk_init () at /modules/lib/picolibc/newlib/libc/ssp/stack_protector.c:23 #1 0xf7c52bc7 in call_init (argc=1, argv=0xffffa044, env=) at ../csu/libc-start.c:145 #2 __libc_start_main_impl (main=0x804936d <_start+45>, argc=1, argv=0xffffa044, init=0x0, fini=0x0, rtld_fini=0xf7fcdfb0 <_dl_fini>, stack_end=0xffffa03c) at ../csu/libc-start.c:347 #3 0x08049368 in _start () (gdb) f 1 #1 0xf7c52bc7 in call_init (argc=1, argv=0xffffa044, env=) at ../csu/libc-start.c:145 Downloading source file /usr/src/debug/glibc/glibc/csu/../csu/libc-start.c 145 ((dl_init_t) addrs[j]) (argc, argv, env); (gdb) l 140 { 141 unsigned int jm 142 = l->l_info[DT_INIT_ARRAYSZ]->d_un.d_val / sizeof (ElfW(Addr)); 143 ElfW(Addr) *addrs = (void *) (init_array->d_un.d_ptr + l->l_addr); 144 for (unsigned int j = 0; j < jm; ++j) 145 ((dl_init_t) addrs[j]) (argc, argv, env); 146 } 147 } 148 149 #else /* !SHARED */ (gdb) f 1 #1 0xf7c52bc7 in call_init (argc=1, argv=0xffffa044, env=) at ../csu/libc-start.c:145 145 ((dl_init_t) addrs[j]) (argc, argv, env); (gdb) f 0 #0 __stack_chk_init () at /modules/lib/picolibc/newlib/libc/ssp/stack_protector.c:23 23 if (__stack_chk_guard != 0) (gdb) s 26 if (getentropy) { (gdb) 28 getentropy(&__stack_chk_guard, sizeof(__stack_chk_guard)); (gdb) 0x0804b1db in getentropy (buffer=0x805ad08 <__stack_chk_guard>, length=4) at /zephyr/lib/posix/options/getentropy.c:16 16 { (gdb) 24 if (length > 256) { (gdb) 29 if (entropy == NULL || !device_is_ready(entropy)) { (gdb) p length $1 = 4 (gdb) s device_is_ready (dev=) at /home/macius/golioth/b/zephyr/include/generated/zephyr/syscalls/device.h:58 58 return z_impl_device_is_ready(dev); (gdb) 0x0804c42a in z_impl_device_is_ready (dev=0x80569c4 <__device_dts_ord_15>) at /zephyr/kernel/device.c:143 143 { (gdb) 152 return dev->state->initialized && (dev->state->init_res == 0U); (gdb) p dev->state $2 = (struct device_state *) 0x805a44c <__devstate_dts_ord_15> (gdb) p dev->state->init init_res initialized (gdb) p dev->state->initialized $3 = false (gdb) s getentropy (buffer=0x805ad08 <__stack_chk_guard>, length=4) at /zephyr/lib/posix/options/getentropy.c:25 25 errno = EIO; (gdb) z_errno_wrap () at /zephyr/lib/libc/picolibc/libc-hooks.c:250 250 return z_errno(); (gdb) p _current No symbol "_current" in current context. (gdb) p _kernel $4 = {cpus = {{nested = 0, irq_stack = 0x0, current = 0x0, idle_thread = 0x0, id = 0 '\000', arch = {}}}, ready_q = {cache = 0x0, runq = {{head = 0x0, next = 0x0}, {tail = 0x0, prev = 0x0}}}} (gdb) p _kernel.cpus $5 = {{nested = 0, irq_stack = 0x0, current = 0x0, idle_thread = 0x0, id = 0 '\000', arch = {}}} (gdb) p _kernel.cpus[0] $6 = {nested = 0, irq_stack = 0x0, current = 0x0, idle_thread = 0x0, id = 0 '\000', arch = {}} (gdb) p _kernel.cpus[0].current $7 = (struct k_thread *) 0x0 (gdb) ```
`gcc -v` (Zephyr CI Docker v0.26.13) ``` Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) ```
`gcc -v` (Arch Linux) ``` Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/14.1.1/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /build/gcc/src/gcc/configure --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++,rust --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 14.1.1 20240522 (GCC) ```

Environment (please complete the following information):

Additional context Noticed that issue with downstream applications recently, after bumping to 3.7.0-rc1. git bisect showed 2bc722a97ed6 ("posix: deprecate POSIX_FNMATCH GETOPT GETENTROPY") as the first failing commit. The reason was that CONFIG_POSIX_C_LIB_EXT is now enabled for more applications than CONFIG_GETENTROPY was. As a result downstream app with getopt() use is now failing to boot with native_sim, due to getentropy() being implemented by Zephyr POSIX subsystem (which wasn't the case before, since CONFIG_GETENTROPY=n before Zephyr version bump).

mniestroj commented 2 months ago

FYI @keith-packard @cfriedt @aescolar

cfriedt commented 2 months ago

It might be possible to fix the issue in native_sim by setting the default cpu to a "dummy" cpu at build time to avoid the null pointer dereference.

Would that solve the issue for you @mniestroj ?

mniestroj commented 2 months ago

If you mean:

modified   kernel/init.c
@@ -44,7 +44,9 @@ BUILD_ASSERT(CONFIG_MP_NUM_CPUS == CONFIG_MP_MAX_NUM_CPUS,

 /* the only struct z_kernel instance */
 __pinned_bss
-struct z_kernel _kernel;
+struct z_kernel _kernel = {
+   .cpus[0].current = &_thread_dummy,
+};

 #ifdef CONFIG_PM
 __pinned_bss atomic_t _cpus_active;

then yes, it helps to fix/workaround the crash.

aescolar commented 2 months ago

@mniestroj I pressume that if you pass -DEXTRA_CPPFLAGS=-fno-stack-protector in your Arch build the issue goes away, correct? (I don't have Arch here to test)

mniestroj commented 2 months ago

@mniestroj I pressume that if you pass -DEXTRA_CPPFLAGS=-fno-stack-protector in your Arch build the issue goes away, correct? (I don't have Arch here to test)

@aescolar Yes, that is right.

aescolar commented 2 months ago

@aescolar Yes, that is right.

Thanks. The issue would seem to be that we are getting the picolibC/embedded stack protector code being used early in the boot process. Which I very much thing we should not. A quick "fix" may be to just say that we do not support enabling the stack-protector feature when building with picolibc for the posix arch, which I doubt many will miss. That could mean an if in the posix arch cmake file to set -fno-stack-protector if the compiler supports it. Anyhow, I'm looking into this. Thanks for the very detailed report :)