nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

SIGSEGV when compiled with -O2 or -O3 on gcc 13.1.1 #47865

Open sgallagher opened 1 year ago

sgallagher commented 1 year ago

Version

20.1.0

Platform

Linux nodejs 6.2.14-300.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon May 1 00:55:28 UTC 2023 x86_64 GNU/Linux

Subsystem

v8

What steps will reproduce the bug?

From GDB:

Thread 3 "node" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff2ffe6c0 (LWP 1565850)]
0x00007ffff57db195 in v8::internal::compiler::SpecialRPONumberer::ComputeAndInsertSpecialRPO(v8::internal::compiler::BasicBlock*, v8::internal::compiler::BasicBlock*) () from /lib64/libnode.so.115

[*] I did some extensive narrowing down and was able to determine that -O1 -fipa-icf -fstrict-aliasing is what triggers it. Building with -O2 -fno-ipa-icf -fno-strict-aliasing avoids the segfault.

How often does it reproduce? Is there a required condition?

Reproduces every time.

[*] See above.

What is the expected behavior? Why is that the expected behavior?

It shouldn't crash.

What do you see instead?

$ node-20
Welcome to Node.js v20.1.0.
Type ".help" for more information.
> req   [1]    571761 segmentation fault (core dumped)  node-20

Additional information

I'm not certain whether the issue is due to a code-generation bug in GCC or a problem in Node.js.

sgallagher commented 1 year ago

Correction, I have narrowed it down to -fno-ipa-icf being sufficient at -O2 and -O3 to avoid the segfault. -fno-strict-aliasing was a mistake (flawed test).
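The narrowing described above can be partly automated. Below is a minimal, hypothetical Python sketch; it is not part of Node.js or GCC. `crashes(flags)` is an assumed caller-supplied predicate meaning "rebuild Node.js with these flags, run it long enough, and report whether it segfaulted". That is the expensive step and is deliberately left out. `first_trigger` bisects an ordered list of flags added on top of a passing baseline (for example `-O1`). `single_flag_fixes` checks which `-fno-*` flag, added alone, makes a crashing baseline pass. For context, GCC enables both `-fipa-icf` and `-fstrict-aliasing` at `-O2` and above but not at `-O1`, which is consistent with the crash appearing at `-O2`/`-O3` and with `-O1 -fipa-icf` reproducing it.

```python
from typing import Callable, List, Sequence

# Hypothetical predicate: rebuild with `flags`, run node, return True on SIGSEGV.
CrashTest = Callable[[List[str]], bool]


def first_trigger(base: List[str], extra: Sequence[str], crashes: CrashTest) -> str:
    """Bisect for the shortest prefix of `extra` that, appended to `base`, crashes.

    Assumes monotonicity: once a prefix crashes, every longer prefix crashes too.
    If several flags are needed together, this returns the one that completes
    the combination (the last of them in `extra` order).
    """
    if crashes(list(base)):
        raise ValueError("baseline already crashes")
    if not crashes(list(base) + list(extra)):
        raise ValueError("baseline plus all extra flags does not crash")
    lo, hi = 0, len(extra)  # invariant: extra[:lo] passes, extra[:hi] crashes
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(list(base) + list(extra[:mid])):
            hi = mid
        else:
            lo = mid
    return extra[hi - 1]


def single_flag_fixes(base: List[str], candidates: Sequence[str],
                      crashes: CrashTest) -> List[str]:
    """Return each candidate flag that, appended alone to a crashing `base`, avoids the crash."""
    if not crashes(list(base)):
        raise ValueError("baseline does not reproduce the crash")
    return [flag for flag in candidates if not crashes(list(base) + [flag])]
```

With a real `crashes`, the reporter's final result corresponds to `single_flag_fixes(["-O2"], ["-fno-strict-aliasing", "-fno-ipa-icf"], crashes)` returning `["-fno-ipa-icf"]`. Each call costs a full rebuild, so bisection only pays off when the candidate list is long.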

bnoordhuis commented 1 year ago

Thanks for the report, Stephen.

It looks like a V8 issue and V8 isn't -fstrict-aliasing clean. I can't really think of a reason why -fipa-icf (identical code folding) would trigger a crash though. Could this be a gcc bug?

If you want, you can send a pull request that adds -fno-ipa-icf to cflags in configure.py. The logic would look something like this:

if gcc_version_ge((13,1,1)):
  o['cflags'] += ['-fno-ipa-icf']
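For trying that check outside configure.py first, here is a standalone sketch. `parse_gcc_version`, `gcc_version` and `workaround_cflags` are hypothetical names, not existing configure.py helpers; configure.py has its own `gcc_version_ge`. The version comes from `gcc -dumpfullversion`, which GCC 7 and later print as `major.minor.patch`. Like the snippet above, it adds the flag for every GCC at or above 13.1.1; skipping clang is an assumption of this sketch.

```python
import re
import subprocess
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]

# GCC version from this report; later versions are treated as affected too.
FIRST_AFFECTED: Version = (13, 1, 1)


def parse_gcc_version(text: str) -> Optional[Version]:
    """Parse `gcc -dumpfullversion` output such as '13.1.1' into (13, 1, 1)."""
    m = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        return None
    major, minor, patch = (int(g) for g in m.groups())
    return (major, minor, patch)


def gcc_version(cc: str = "gcc") -> Optional[Version]:
    """Ask the compiler for its full version; requires `cc` to be on PATH."""
    out = subprocess.run([cc, "-dumpfullversion"], capture_output=True,
                         text=True, check=True).stdout
    return parse_gcc_version(out)


def workaround_cflags(version: Optional[Version], is_clang: bool = False) -> List[str]:
    """Extra cflags: -fno-ipa-icf for GCC >= 13.1.1, nothing otherwise."""
    if is_clang or version is None:
        return []
    return ["-fno-ipa-icf"] if version >= FIRST_AFFECTED else []
```

For example, `workaround_cflags(gcc_version("g++"))` returns `["-fno-ipa-icf"]` on the Fedora 38 toolchain from this report and an empty list on GCC 12.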
kvakil commented 1 year ago

I can't reproduce on Debian with gcc-13.1.0:

$ gcc-13 --version
gcc-13 (Debian 13.1.0-3) 13.1.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ CC=gcc-13 CXX=g++-13 ./configure --ninja --shared
$ make node
$ ldd out/Release/node
        linux-vdso.so.1 (0x00007ffcca13b000)
        libnode.so.115 => /node/out/Release/lib/libnode.so.115 (0x00007fd1d7a00000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd1d781f000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd1d75c9000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd1d74ea000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd1dd850000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fd1dd882000)
$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Debian GLIBC 2.36-9) stable release version 2.36.
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 12.2.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.
$ CC=gcc-13 CXX=g++-13 make jstest
# mostly passes

Also can't reproduce on Arch with gcc 13.1.1:

$ gcc --version
gcc (GCC) 13.1.1 20230429
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ./configure --ninja --shared
$ make node
$ ldd out/Release/node
        linux-vdso.so.1 (0x00007ffc9a5a6000)
        libnode.so.115 => /node/out/Release/lib/libnode.so.115 (0x00007fef7ce00000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fef7cb85000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007fef82d84000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fef7cb60000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007fef7c976000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fef82e81000)
$ /usr/lib/libc.so.6
GNU C Library (GNU libc) stable release version 2.37.
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 13.1.1 20230429.
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 4.4.0
For bug reporting instructions, please see:
<https://bugs.archlinux.org/>.
$ uname -a
Linux 2a18fc7dfbd4 5.19.0-1024-aws #25~22.04.1-Ubuntu SMP Tue Apr 18 23:41:58 UTC 2023 x86_64 GNU/Linux
$ make jstest
# mostly passes
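Both transcripts run `ldd` to confirm that `out/Release/node` resolves `libnode.so.115` from the fresh build tree. The original crash was inside the system copy, `/lib64/libnode.so.115`. A small Python sketch for making that check scriptable; `parse_ldd` and `loads_from` are hypothetical helpers, and the parsing assumes the usual glibc `ldd` line format shown above.

```python
import re
from typing import Dict

# Matches ldd lines like "name => /path (0xADDR)" or "name => not found".
_ARROW = re.compile(r"^\s*(\S+)\s+=>\s+(.*?)\s*(?:\(0x[0-9a-fA-F]+\))?\s*$")


def parse_ldd(output: str) -> Dict[str, str]:
    """Map each library name in `ldd` output to the path it resolved to."""
    resolved: Dict[str, str] = {}
    for line in output.splitlines():
        m = _ARROW.match(line)
        if m:  # lines without "=>" (vdso, the dynamic loader) are skipped
            resolved[m.group(1)] = m.group(2)
    return resolved


def loads_from(resolved: Dict[str, str], soname: str, prefix: str) -> bool:
    """True if `soname` resolved to a path under `prefix`, e.g. the build tree."""
    return resolved.get(soname, "").startswith(prefix)
```

Typical use: feed it `subprocess.run(["ldd", "out/Release/node"], capture_output=True, text=True).stdout` and assert `loads_from(..., "libnode.so.115", "/node/out/Release/")` before running any tests.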
sgallagher commented 1 year ago

Whatever is failing here appears to happen after several seconds in a non-main thread. My suspicion is that the tests aren't active long enough for that other thread to hit the bug. I can reproduce this after several seconds of running in the Node.js REPL (running node without arguments).
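The GDB output in the original report shows the crash on a non-main thread ("Thread 3") inside V8's TurboFan scheduler (`SpecialRPONumberer`). So a non-interactive reproduction probably needs to keep node busy long enough for background optimizing compiles to run. A rough Python sketch follows, assuming the hot loop is enough to get V8 to optimize something on a worker thread. It may well not compile the same function that crashed in the REPL. `HOT_LOOP_JS`, `node_stress_cmd`, `classify` and `run_once` are hypothetical names.

```python
import signal
import subprocess
from typing import List, Optional

# Hypothetical stress script: keeps V8 busy so hot code gets tiered up by the
# optimizing compiler, which V8 normally runs on background worker threads.
HOT_LOOP_JS = r"""
const deadline = Date.now() + SECONDS * 1000;
function add(a, b) { return (a + b) | 0; }
let acc = 0;
while (Date.now() < deadline) {
  for (let i = 0; i < 100000; i++) acc = add(acc, i);
}
console.log('survived', acc);
"""


def node_stress_cmd(node: str = "node", seconds: int = 30) -> List[str]:
    """Command line that runs the hot loop under `node` for `seconds` seconds."""
    return [node, "-e", HOT_LOOP_JS.replace("SECONDS", str(int(seconds)))]


def classify(returncode: Optional[int]) -> str:
    """Describe a POSIX exit status; a negative code means 'killed by signal N'."""
    if returncode is None:
        return "timeout"
    if returncode < 0:
        try:
            return signal.Signals(-returncode).name  # e.g. "SIGSEGV"
        except ValueError:
            return f"signal {-returncode}"
    return "ok" if returncode == 0 else f"exit {returncode}"


def run_once(cmd: List[str], timeout: float) -> str:
    """Run `cmd` once and classify how it ended."""
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return classify(None)
    return classify(proc.returncode)
```

For example, `run_once(node_stress_cmd("./out/Release/node", seconds=60), timeout=120)` should return `"SIGSEGV"` on an affected build, provided the loop actually triggers the faulty compile. Leaving the REPL open, as described above, remains the known reproduction.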

targos commented 1 year ago

I tried to run ./node compiled on Fedora 38 with default configure options and it didn't fail.

sgallagher commented 1 year ago

I just rebuilt the Fedora RPM of 20.2.0 with -fipa-icf and reproduced the error again:

(gdb) bt full
#0  0x00007f32dfc9f545 in v8::internal::compiler::BasicBlock::SuccessorAt () at ../../deps/v8/src/compiler/schedule.h:85
No locals.
#1  v8::internal::compiler::SpecialRPONumberer::ComputeLoopInfo () at ../../deps/v8/src/compiler/scheduler.cc:990
No locals.
#2  v8::internal::compiler::SpecialRPONumberer::ComputeAndInsertSpecialRPO (this=0x7f32d001a218, entry=0x7f32d0035618, end=0x7f3200000000) at ../../deps/v8/src/compiler/scheduler.cc:832
        loop = 0x7f32d001aa60
        order = <optimized out>
        stack_depth = <optimized out>
        num_loops = <optimized out>
        current_loop = 0x0
        current_header = 0x1
        loop_depth = -805196912
#3  0x00007f32dfcb4ef3 in v8::internal::compiler::Scheduler::ComputeSchedule (zone=0x55d624bef270, graph=0x55d624c1a688, flags=..., tick_counter=0x7f32d001ab90, profile_data=0x7f32d0037280) at ../../deps/v8/src/compiler/scheduler.cc:65
        schedule_zone = <optimized out>
        node_count_hint = <optimized out>
        schedule = <optimized out>
        scheduler = {zone_ = 0x55d624bef270, graph_ = 0x55d624c1a688, schedule_ = 0x7f32d00355a0, flags_ = {mask_ = 2}, scheduled_nodes_ = {zone_ = 0x55d624bef270, data_ = 0x7f32d0019e40, end_ = 0x7f32d001a1c0, capacity_ = 0x7f32d001a218}, schedule_root_nodes_ = {zone_ = 0x55d624bef270, data_ = 0x0, end_ = 0x0, capacity_ = 0x0}, schedule_queue_ = {<No data fields>}, node_data_ = {zone_ = 0x55d624bef270, 
            data_ = 0x7f32d0014ab8, end_ = 0x7f32d0016e68, capacity_ = 0x7f32d00171f8}, control_flow_builder_ = 0x7f32d0018408, special_rpo_ = 0x7f32d001a218, equivalence_ = 0x7f32d00171f8, tick_counter_ = 0x55d624bec048, profile_data_ = 0x0, common_dominator_cache_ = {<No data fields>}}
#4  0x00007f32dfc8d712 in v8::internal::compiler::ComputeSchedulePhase::Run () at ../../deps/v8/src/compiler/pipeline.cc:2342
No locals.
#5  v8::internal::compiler::PipelineImpl::Run<v8::internal::compiler::ComputeSchedulePhase> () at ../../deps/v8/src/compiler/pipeline.cc:1367
No locals.
#6  v8::internal::compiler::PipelineImpl::ComputeScheduledGraph (this=0x55d624bec070) at ../../deps/v8/src/compiler/pipeline.cc:3783
        data = 0x55d624bec070
#7  0x00007f32dfc9405c in v8::internal::compiler::PipelineImpl::OptimizeGraph (this=0x55d624bec2f0, linkage=0x55d624c19568) at ../../deps/v8/src/compiler/pipeline.cc:3026
        data = <optimized out>
#8  0x00007f32dfc9550f in v8::internal::compiler::PipelineCompilationJob::ExecuteJobImpl (this=0x55d624bebee0, stats=0x55d624bebfa8, local_isolate=0x55d624bec2f0) at ../../deps/v8/src/compiler/pipeline.cc:1299
        scope = <optimized out>
        local_isolate_scope = <optimized out>
#9  0x00007f32e0b9ffef in v8::internal::OptimizedCompilationJob::ExecuteJob () at ../../deps/v8/src/codegen/compiler.cc:496
No locals.
#10 0x00007f32e0bd3aa5 in v8::internal::OptimizingCompileDispatcher::CompileNext () at ../../deps/v8/src/compiler-dispatcher/optimizing-compile-dispatcher.cc:105
No locals.
#11 0x00007f32e0bd3e44 in v8::internal::OptimizingCompileDispatcher::CompileTask::RunInternal () at ../../deps/v8/src/compiler-dispatcher/optimizing-compile-dispatcher.cc:67
No locals.
#12 0x00007f32df5e7b4e in PlatformWorkerThread (data=0x55d624a6f200) at ../../src/node_platform.cc:43
        task = <optimized out>
        pending_worker_tasks = 0x55d624a6ea00
#13 0x00007f32de952907 in start_thread (arg=<optimized out>) at pthread_create.c:444
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139856459408912, 1118916026530479130, 139856408622784, -312, 2, 140733863298256, 1118916026543062042, 1118926824340536346}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#14 0x00007f32de9d8870 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I'm attaching the full build logs as well: build-20.2.0-2.fc38.log.gz

kvakil commented 1 year ago

> Whatever is failing here appears to be happening after several seconds in a non-main thread. My suspicion is that the tests aren't active long enough for whatever that other thread is to hit the bug. I can reproduce this after several seconds of running in the node shell environment (running node without arguments).

Yeah, I tried that and still couldn't reproduce.

I tried Fedora in Docker and still couldn't reproduce:

$ cd /node
$ dnf install -y gcc-c++ make python2 git
$ git rev-parse HEAD
ad2c05b671aff71259afbf23de32b6f177c3ba61
$ cat /etc/fedora-release
Fedora release 38 (Thirty Eight)
$ gcc --version
gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ /lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.37.
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 13.0.1 20230401 (Red Hat 13.0.1-0).
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<https://www.gnu.org/software/libc/bugs.html>.
$ ./configure && make -j`nproc`
$ ./node
# works fine, waited a couple of minutes, tried executing things, etc.
thesamesam commented 1 year ago

@sgallagher The full configure arguments, including any CFLAGS etc. exported in the environment, are likely needed.

sgallagher commented 1 year ago

> @sgallagher The full configure arguments including any CFLAGS etc exported in the environment are likely needed.

They are all included in the logs I attached.

thesamesam commented 1 year ago

Sorry, I missed that!

ambasta commented 1 year ago

I can definitely reproduce the issue.

Relevant flags

CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -pipe -fno-stack-protector -O3 -fuse-ld=mold"
CHOST="x86_64-pc-linux-gnu"
CXXFLAGS="-march=native -pipe -fno-stack-protector -O3 -fuse-ld=mold"
LDFLAGS="-fuse-ld=mold -Wl,-O1 -Wl,--as-needed"

Configure flags

GYP_DEFINES="linux_use_gold_flags=0 linux_use_bundled_binutils=0 linux_use_bundled_gold=0" ./configure.py --dest-cpu=x64 --ninja --shared-brotli --shared-cares --shared-libuv --shared-nghttp2 --shared-zlib --enable-lto --with-intl=system-icu --without-inspector --without-npm --without-node-snapshot --shared-openssl --openssl-use-def-ca-store
tannal commented 2 months ago

I changed the linker from mold to lld and the problem is gone.
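To retry ambasta's configuration with lld instead of mold, as in the last comment, the same flags can be rewritten programmatically. This is a minimal Python sketch. The flag strings and configure arguments are copied from ambasta's comment; `with_linker`, `configure_invocation` and `shell_line` are hypothetical helpers. `shell_line` prints the equivalent one-line command. Note the whitespace between the `GYP_DEFINES=` assignment and `./configure.py`: without it, the shell folds the script path into the variable's value.

```python
import os
import re
import shlex
from typing import Dict, List, Tuple

# Copied from ambasta's comment.
BUILD_FLAGS: Dict[str, str] = {
    "CFLAGS": "-march=native -pipe -fno-stack-protector -O3 -fuse-ld=mold",
    "CXXFLAGS": "-march=native -pipe -fno-stack-protector -O3 -fuse-ld=mold",
    "LDFLAGS": "-fuse-ld=mold -Wl,-O1 -Wl,--as-needed",
}
GYP_DEFINES = "linux_use_gold_flags=0 linux_use_bundled_binutils=0 linux_use_bundled_gold=0"
CONFIGURE_ARGS: List[str] = [
    "--dest-cpu=x64", "--ninja", "--shared-brotli", "--shared-cares", "--shared-libuv",
    "--shared-nghttp2", "--shared-zlib", "--enable-lto", "--with-intl=system-icu",
    "--without-inspector", "--without-npm", "--without-node-snapshot",
    "--shared-openssl", "--openssl-use-def-ca-store",
]


def with_linker(flags: Dict[str, str], linker: str) -> Dict[str, str]:
    """Replace every -fuse-ld=<name> in the flag strings with -fuse-ld=<linker>."""
    return {name: re.sub(r"-fuse-ld=\S+", f"-fuse-ld={linker}", value)
            for name, value in flags.items()}


def configure_invocation(linker: str = "mold") -> Tuple[Dict[str, str], List[str]]:
    """Environment and argv for ./configure.py using the chosen linker."""
    env = dict(os.environ)
    env.update(with_linker(BUILD_FLAGS, linker))
    env["GYP_DEFINES"] = GYP_DEFINES
    return env, ["./configure.py", *CONFIGURE_ARGS]


def shell_line(linker: str = "mold") -> str:
    """Equivalent shell command with every environment assignment shown inline."""
    flags = with_linker(BUILD_FLAGS, linker)
    assignments = [f"{k}={shlex.quote(v)}" for k, v in flags.items()]
    assignments.append(f"GYP_DEFINES={shlex.quote(GYP_DEFINES)}")
    return " ".join(assignments + [shlex.join(["./configure.py", *CONFIGURE_ARGS])])
```

From the Node.js source root, `env, argv = configure_invocation("lld")` followed by `subprocess.run(argv, env=env, check=True)` would configure the lld variant. Comparing the resulting build against the mold one isolates the linker as the only difference.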