Open sgallagher opened 1 year ago
Correction: I have narrowed it down to -fno-ipa-icf being sufficient at -O2 and -O3 to avoid the segfault. -fno-strict-aliasing was a mistake (flawed test).
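GCC turns on -fipa-icf by default at -O2, -O3 and -Os, so it can help to check whether a given flag string still leaves identical code folding enabled before starting a long rebuild. The following is only a sketch: the helper name `ipa_icf_enabled` is hypothetical (not part of Node's build), and it assumes GCC's usual behaviour that an explicit -f flag overrides the -O default regardless of order.

```python
import shlex

# Optimization levels at which GCC enables -fipa-icf by default.
ICF_DEFAULT_LEVELS = {"-O2", "-O3", "-Os", "-Ofast"}


def ipa_icf_enabled(flags: str) -> bool:
    """Return True if identical code folding stays on for this flag string.

    Hypothetical helper: an explicit -fipa-icf / -fno-ipa-icf wins (last one
    seen); otherwise the last -O level decides.
    """
    explicit = None
    level = None
    for flag in shlex.split(flags):
        if flag == "-fipa-icf":
            explicit = True
        elif flag == "-fno-ipa-icf":
            explicit = False
        elif flag.startswith("-O"):
            level = flag
    if explicit is not None:
        return explicit
    return level in ICF_DEFAULT_LEVELS


if __name__ == "__main__":
    for candidate in ("-O2", "-O3 -fno-ipa-icf", "-O1 -fipa-icf -fstrict-aliasing"):
        print(candidate, "->", ipa_icf_enabled(candidate))
```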
Thanks for the report, Stephen.
It looks like a V8 issue and V8 isn't -fstrict-aliasing clean. I can't really think of a reason why -fipa-icf (identical code folding) would trigger a crash though. Could this be a gcc bug?
If you want, you can send a pull request that adds -fno-ipa-icf to cflags in configure.py. The logic would look something like this:
if gcc_version_ge((13,1,1)):
  o['cflags'] += ['-fno-ipa-icf']
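The snippet above depends on configure.py's own `gcc_version_ge` helper and its `o` dictionary. To try the logic in isolation, here is a self-contained sketch. The `gcc_version_ge` stand-in below takes the detected version as an explicit argument, which is an assumption made for illustration, and `add_icf_workaround` is a hypothetical name.

```python
def gcc_version_ge(detected, required):
    """Stand-in for illustration: compare (major, minor, patch) tuples."""
    return tuple(detected) >= tuple(required)


def add_icf_workaround(o, detected_gcc):
    """Append -fno-ipa-icf to o['cflags'] for GCC >= 13.1.1 (hypothetical helper)."""
    cflags = o.setdefault('cflags', [])
    # Avoid adding the flag twice if configure runs this more than once.
    if gcc_version_ge(detected_gcc, (13, 1, 1)) and '-fno-ipa-icf' not in cflags:
        cflags.append('-fno-ipa-icf')
    return o


if __name__ == "__main__":
    print(add_icf_workaround({'cflags': ['-O2']}, (13, 1, 1)))
```

Note that a gate at 13.1.1 would leave a 13.1.0 toolchain untouched, so the threshold may need adjusting once the affected versions are better understood.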
I can't reproduce on Debian with gcc-13.1.0:
$ gcc-13 --version
gcc-13 (Debian 13.1.0-3) 13.1.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ CC=gcc-13 CXX=g++-13 ./configure --ninja --shared
$ make node
$ ldd out/Release/node
linux-vdso.so.1 (0x00007ffcca13b000)
libnode.so.115 => /node/out/Release/lib/libnode.so.115 (0x00007fd1d7a00000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd1d781f000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd1d75c9000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd1d74ea000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd1dd850000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd1dd882000)
$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Debian GLIBC 2.36-9) stable release version 2.36.
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 12.2.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.
$ CC=gcc-13 CXX=g++-13 make jstest
# mostly passes
Also can't reproduce with Arch and gcc-13.1.1:
$ gcc --version
gcc (GCC) 13.1.1 20230429
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ./configure --ninja --shared
$ make node
$ ldd out/Release/node
linux-vdso.so.1 (0x00007ffc9a5a6000)
libnode.so.115 => /node/out/Release/lib/libnode.so.115 (0x00007fef7ce00000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fef7cb85000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007fef82d84000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fef7cb60000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fef7c976000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fef82e81000)
$ /usr/lib/libc.so.6
GNU C Library (GNU libc) stable release version 2.37.
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 13.1.1 20230429.
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 4.4.0
For bug reporting instructions, please see:
<https://bugs.archlinux.org/>.
$ uname -a
Linux 2a18fc7dfbd4 5.19.0-1024-aws #25~22.04.1-Ubuntu SMP Tue Apr 18 23:41:58 UTC 2023 x86_64 GNU/Linux
$ make jstest
# mostly passes
Whatever is failing here appears to be happening after several seconds in a non-main thread. My suspicion is that the tests aren't active long enough for whatever that other thread is to hit the bug. I can reproduce this after several seconds of running in the node shell environment (running node without arguments).
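Because the crash only shows up after the process has been alive for a while, a small harness that starts the binary, leaves it idle, and checks for death by SIGSEGV can make the comparison between builds repeatable. This is a sketch under assumptions: `crashes_with_segv` is a hypothetical helper, the 30 second default is arbitrary, it is POSIX-only, and whether a non-terminal session (for example `node -i`) triggers the same background compile job as an interactive shell has not been confirmed in this thread.

```python
import signal
import subprocess
import sys


def crashes_with_segv(cmd, wait_seconds=30.0):
    """Start cmd, leave it idle, and report whether it died from SIGSEGV.

    Returns False if the process is still alive after wait_seconds (it is
    then killed) or if it exited for any other reason.
    """
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,       # keep stdin open so the process stays up
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=wait_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return False
    finally:
        proc.stdin.close()
    # On POSIX a negative return code means "terminated by that signal".
    return proc.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # Path to the binary under test is taken from the command line.
    binary = sys.argv[1] if len(sys.argv) > 1 else "node"
    print("segfault observed:", crashes_with_segv([binary, "-i"]))
```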
I tried to run ./node compiled on Fedora 38 with default configure options and it didn't fail.
I just rebuilt the Fedora RPM of 20.2.0 with -fipa-icf and reproduced the error again:
(gdb) bt full
#0 0x00007f32dfc9f545 in v8::internal::compiler::BasicBlock::SuccessorAt () at ../../deps/v8/src/compiler/schedule.h:85
No locals.
#1 v8::internal::compiler::SpecialRPONumberer::ComputeLoopInfo () at ../../deps/v8/src/compiler/scheduler.cc:990
No locals.
#2 v8::internal::compiler::SpecialRPONumberer::ComputeAndInsertSpecialRPO (this=0x7f32d001a218, entry=0x7f32d0035618, end=0x7f3200000000) at ../../deps/v8/src/compiler/scheduler.cc:832
loop = 0x7f32d001aa60
order = <optimized out>
stack_depth = <optimized out>
num_loops = <optimized out>
current_loop = 0x0
current_header = 0x1
loop_depth = -805196912
#3 0x00007f32dfcb4ef3 in v8::internal::compiler::Scheduler::ComputeSchedule (zone=0x55d624bef270, graph=0x55d624c1a688, flags=..., tick_counter=0x7f32d001ab90, profile_data=0x7f32d0037280) at ../../deps/v8/src/compiler/scheduler.cc:65
schedule_zone = <optimized out>
node_count_hint = <optimized out>
schedule = <optimized out>
scheduler = {zone_ = 0x55d624bef270, graph_ = 0x55d624c1a688, schedule_ = 0x7f32d00355a0, flags_ = {mask_ = 2}, scheduled_nodes_ = {zone_ = 0x55d624bef270, data_ = 0x7f32d0019e40, end_ = 0x7f32d001a1c0, capacity_ = 0x7f32d001a218}, schedule_root_nodes_ = {zone_ = 0x55d624bef270, data_ = 0x0, end_ = 0x0, capacity_ = 0x0}, schedule_queue_ = {<No data fields>}, node_data_ = {zone_ = 0x55d624bef270,
data_ = 0x7f32d0014ab8, end_ = 0x7f32d0016e68, capacity_ = 0x7f32d00171f8}, control_flow_builder_ = 0x7f32d0018408, special_rpo_ = 0x7f32d001a218, equivalence_ = 0x7f32d00171f8, tick_counter_ = 0x55d624bec048, profile_data_ = 0x0, common_dominator_cache_ = {<No data fields>}}
#4 0x00007f32dfc8d712 in v8::internal::compiler::ComputeSchedulePhase::Run () at ../../deps/v8/src/compiler/pipeline.cc:2342
No locals.
#5 v8::internal::compiler::PipelineImpl::Run<v8::internal::compiler::ComputeSchedulePhase> () at ../../deps/v8/src/compiler/pipeline.cc:1367
No locals.
#6 v8::internal::compiler::PipelineImpl::ComputeScheduledGraph (this=0x55d624bec070) at ../../deps/v8/src/compiler/pipeline.cc:3783
data = 0x55d624bec070
#7 0x00007f32dfc9405c in v8::internal::compiler::PipelineImpl::OptimizeGraph (this=0x55d624bec2f0, linkage=0x55d624c19568) at ../../deps/v8/src/compiler/pipeline.cc:3026
data = <optimized out>
#8 0x00007f32dfc9550f in v8::internal::compiler::PipelineCompilationJob::ExecuteJobImpl (this=0x55d624bebee0, stats=0x55d624bebfa8, local_isolate=0x55d624bec2f0) at ../../deps/v8/src/compiler/pipeline.cc:1299
scope = <optimized out>
local_isolate_scope = <optimized out>
#9 0x00007f32e0b9ffef in v8::internal::OptimizedCompilationJob::ExecuteJob () at ../../deps/v8/src/codegen/compiler.cc:496
No locals.
#10 0x00007f32e0bd3aa5 in v8::internal::OptimizingCompileDispatcher::CompileNext () at ../../deps/v8/src/compiler-dispatcher/optimizing-compile-dispatcher.cc:105
No locals.
#11 0x00007f32e0bd3e44 in v8::internal::OptimizingCompileDispatcher::CompileTask::RunInternal () at ../../deps/v8/src/compiler-dispatcher/optimizing-compile-dispatcher.cc:67
No locals.
#12 0x00007f32df5e7b4e in PlatformWorkerThread (data=0x55d624a6f200) at ../../src/node_platform.cc:43
task = <optimized out>
pending_worker_tasks = 0x55d624a6ea00
#13 0x00007f32de952907 in start_thread (arg=<optimized out>) at pthread_create.c:444
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139856459408912, 1118916026530479130, 139856408622784, -312, 2, 140733863298256, 1118916026543062042, 1118926824340536346}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#14 0x00007f32de9d8870 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
I'm attaching the full build logs as well. build-20.2.0-2.fc38.log.gz
Whatever is failing here appears to be happening after several seconds in a non-main thread. My suspicion is that the tests aren't active long enough for whatever that other thread is to hit the bug. I can reproduce this after several seconds of running in the node shell environment (running node without arguments).
yeah, I tried that & still couldn't reproduce.
I tried Fedora on Docker and still couldn't reproduce:
$ cd /node
$ dnf install -y gcc-c++ make python2 git
$ git rev-parse HEAD
ad2c05b671aff71259afbf23de32b6f177c3ba61
$ cat /etc/fedora-release
Fedora release 38 (Thirty Eight)
$ gcc --version
gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ /lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.37.
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 13.0.1 20230401 (Red Hat 13.0.1-0).
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<https://www.gnu.org/software/libc/bugs.html>.
$ ./configure && make -j`nproc`
$ ./node
# works fine, waited a couple of minutes, tried executing things, etc.
@sgallagher The full configure arguments including any CFLAGS etc exported in the environment are likely needed.
Which are all included in the logs I attached.
Sorry, I missed that!
I can definitely reproduce the issue.
Relevant flags
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -pipe -fno-stack-protector -O3 -fuse-ld=mold"
CHOST="x86_64-pc-linux-gnu"
CXXFLAGS="-march=native -pipe -fno-stack-protector -O3 -fuse-ld=mold"
LDFLAGS="-fuse-ld=mold -Wl,-O1 -Wl,--as-needed"
Configure flags
GYP_DEFINES="linux_use_gold_flags=0 linux_use_bundled_binutils=0 linux_use_bundled_gold=0" ./configure.py --dest-cpu=x64 --ninja --shared-brotli --shared-cares --shared-libuv --shared-nghttp2 --shared-zlib --enable-lto --with-intl=system-icu --without-inspector --without-npm --without-node-snapshot --shared-openssl --openssl-use-def-ca-store
I changed the linker from mold to lld and the problem is gone.
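For an A/B comparison between linkers it is convenient to rewrite only the -fuse-ld= part of the exported flags and leave everything else alone. A minimal sketch, assuming the flags are plain shell-style strings like the ones listed above; `with_linker` is a hypothetical helper name:

```python
import shlex


def with_linker(flags: str, linker: str) -> str:
    """Return flags with any -fuse-ld=... dropped and -fuse-ld=<linker> appended."""
    kept = [f for f in shlex.split(flags) if not f.startswith("-fuse-ld=")]
    kept.append(f"-fuse-ld={linker}")
    return shlex.join(kept)


if __name__ == "__main__":
    cflags = "-march=native -pipe -fno-stack-protector -O3 -fuse-ld=mold"
    print(with_linker(cflags, "lld"))
```

The same call can be applied to CFLAGS, CXXFLAGS and LDFLAGS so that all three agree on the linker before re-running configure.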
Version
20.1.0
Platform
Linux nodejs 6.2.14-300.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon May 1 00:55:28 UTC 2023 x86_64 GNU/Linux
Subsystem
v8
What steps will reproduce the bug?
--shared for Fedora, hence why the backtrace below mentions libnode.so.115.
node
From GDB:
[*] I did some extensive narrowing down and was able to determine that -O1 -fipa-icf -fstrict-aliasing is what triggers it. Building with -O2 -fno-ipa-icf -fno-strict-aliasing avoids the segfault.
How often does it reproduce? Is there a required condition?
Reproduces every time.
[*] See above.
What is the expected behavior? Why is that the expected behavior?
It shouldn't crash.
What do you see instead?
Additional information
I'm not certain if the issue is due to a code-generation bug in GCC or a problem with Node.js.