nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

npm --version segfaults on v7.7.3 linux ppc64 (BE) only ON RHEL 7 #11882

Closed mattcolegate closed 7 years ago

mattcolegate commented 7 years ago

Running node --version works on this platform and version, but running npm --version causes a segmentation fault and core dump.

cc/ @gibfahn who helped with initial diagnosis

gibfahn commented 7 years ago

To add some more info, the version downloaded was https://nodejs.org/dist/latest-v7.x/node-v7.7.3-linux-ppc64.tar.gz.

Running node -e 'console.log("hi")' also segfaults.

bnoordhuis commented 7 years ago

> Running node -e 'console.log("hi")' also segfaults.

Can you obtain a stack trace?

sxa commented 7 years ago

Ouch - this is from a RHEL 7.0 box (EDIT: also occurs on 7.1)

#0  0x00003fffb7aaa4e0 in .__memset_power7 () from /lib64/power8/libc.so.6
#1  0x0000000010f69094 in ._ZN2v88internal18RegExpResultsCache5ClearEPNS0_10FixedArrayE ()
#2  0x0000000010cace24 in ._ZN2v88internal4Heap19MarkCompactPrologueEv ()
#3  0x0000000010cbf964 in ._ZN2v88internal4Heap11MarkCompactEv ()
#4  0x0000000010cca010 in ._ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE ()
#5  0x0000000010cca4f8 in ._ZN2v88internal4Heap14CollectGarbageENS0_16GarbageCollectorENS0_23GarbageCollectionReasonEPKcNS_15GCCallbackFlagsE ()
#6  0x0000000010ccce30 in ._ZN2v88internal4Heap12ReserveSpaceEPNS0_4ListINS1_5ChunkENS0_25FreeStoreAllocationPolicyEEEPNS2_IPhS4_EE ()
#7  0x000000001104a598 in ._ZN2v88internal12Deserializer11DeserializeEPNS0_7IsolateE ()
#8  0x0000000010dcea9c in ._ZN2v88internal7Isolate4InitEPNS0_12DeserializerE ()
#9  0x00000000110560b4 in ._ZN2v88internal8Snapshot10InitializeEPNS0_7IsolateE ()
#10 0x000000001075cca8 in ._ZN2v87Isolate3NewERKNS0_12CreateParamsE ()
#11 0x00000000111f6f04 in ._ZN4node5StartEP9uv_loop_siPKPKciS5_ ()
#12 0x00000000111f650c in ._ZN4node5StartEiPPc ()
#13 0x000000001052b720 in .main ()
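
The frames above are Itanium C++ ABI mangled names, so they are easier to follow once decoded. A tool such as c++filt does this properly; the snippet below is only a minimal sketch that recovers the qualified function name for the simple nested form `_ZN<len><name>...E` seen in this trace. The helper name `qualifiedName` is hypothetical, and parameter types, templates and substitutions are deliberately ignored.

```javascript
// Minimal sketch: recover the qualified function name from a mangled symbol
// of the simple nested form _ZN<len><name>...E. Anything after the closing
// 'E' (the parameter list) is ignored.
function qualifiedName(symbol) {
  // The symbols in the trace carry a leading dot; drop it.
  const s = symbol.replace(/^\./, '');
  if (!s.startsWith('_ZN')) return null;
  const parts = [];
  let i = 3;
  while (i < s.length && s[i] !== 'E') {
    const m = /^\d+/.exec(s.slice(i));
    if (!m) return null; // a construct this sketch does not understand
    const len = Number(m[0]);
    i += m[0].length;
    parts.push(s.slice(i, i + len));
    i += len;
  }
  return parts.join('::');
}

// A few frames copied from the backtrace.
const frames = [
  '._ZN2v88internal18RegExpResultsCache5ClearEPNS0_10FixedArrayE',
  '._ZN2v88internal4Heap19MarkCompactPrologueEv',
  '._ZN2v88internal7Isolate4InitEPNS0_12DeserializerE',
  '._ZN4node5StartEiPPc',
];
for (const f of frames) console.log(qualifiedName(f));
```

Read top to bottom, the decoded names show the crash happening during isolate start-up: a garbage collection triggered while the snapshot is being deserialized ends up in RegExpResultsCache::Clear.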
bnoordhuis commented 7 years ago

@sxa555 Can you post the output of disassemble and info registers?

sxa commented 7 years ago

FYI I've built 7.7.3 from commit 9c68a69 locally and it doesn't fail in the same way. This is with this version of gcc (have the build boxes been upgraded recently to 4.9? I haven't used that yet)

gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16)

sxa commented 7 years ago

@bnoordhuis (For reference this output was just running "node" without any parameters)

(gdb) bt
#0  0x00003fffb7aaa4e0 in .__memset_power7 () from /lib64/power8/libc.so.6
#1  0x0000000010f69094 in ._ZN2v88internal18RegExpResultsCache5ClearEPNS0_10FixedArrayE ()
#2  0x0000000010cace24 in ._ZN2v88internal4Heap19MarkCompactPrologueEv ()
#3  0x0000000010cbf964 in ._ZN2v88internal4Heap11MarkCompactEv ()
#4  0x0000000010cca010 in ._ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE ()
#5  0x0000000010cca4f8 in ._ZN2v88internal4Heap14CollectGarbageENS0_16GarbageCollectorENS0_23GarbageCollectionReasonEPKcNS_15GCCallbackFlagsE ()
#6  0x0000000010ccce30 in ._ZN2v88internal4Heap12ReserveSpaceEPNS0_4ListINS1_5ChunkENS0_25FreeStoreAllocationPolicyEEEPNS2_IPhS4_EE ()
#7  0x000000001104a598 in ._ZN2v88internal12Deserializer11DeserializeEPNS0_7IsolateE ()
#8  0x0000000010dcea9c in ._ZN2v88internal7Isolate4InitEPNS0_12DeserializerE ()
#9  0x00000000110560b4 in ._ZN2v88internal8Snapshot10InitializeEPNS0_7IsolateE ()
#10 0x000000001075cca8 in ._ZN2v87Isolate3NewERKNS0_12CreateParamsE ()
#11 0x00000000111f6f04 in ._ZN4node5StartEP9uv_loop_siPKPKciS5_ ()
#12 0x00000000111f650c in ._ZN4node5StartEiPPc ()
#13 0x000000001052b720 in .main ()
(gdb) info registers
r0             0x1  1
r1             0x3fffffffda00   70368744167936
r2             0x3fffb7be4410   70367531910160
r3             0xf  15
r4             0x0  0
r5             0x7ff    2047
r6             0x0  0
r7             0x30 48
r8             0x40 64
r9             0x400    1024
r10            0xf  15
r11            0x7  7
r12            0x800    2048
r13            0x3fffb7ffe190   70367536210320
r14            0x3fffffffe400   70368744170496
r15            0xffffffffba2e8ba3   18446744072538196899
r16            0x11fa4ab0   301615792
r17            0x7a940  502080
r18            0x7a940  502080
r19            0x0  0
r20            0x0  0
r21            0x1  1
r22            0x11f63a80   301349504
r23            0x2  2
r24            0x11f51e0e   301276686
r25            0x0  0
r26            0x0  0
r27            0x56f50  356176
r28            0x0  0
r29            0x11f63aa0   301349536
r30            0x11e1ce00   300011008
r31            0x3fffffffda00   70368744167936
pc             0x3fffb7aaa4e0   0x3fffb7aaa4e0 <.__memset_power7+64>
msr            0x800000010000d032   9223372041149796402
cr             0x44044841   1141131329
lr             0x10f69094   0x10f69094 <._ZN2v88internal18RegExpResultsCache5ClearEPNS0_10FixedArrayE+36>
ctr            0x3fffb7aaa4a0   70367530624160
xer            0x0  0
orig_r3        0xc00000000000908c   -4611686018427350900
trap           0x300    768
(gdb) disassemble
Dump of assembler code for function .__memset_power7:
   0x00003fffb7aaa4a0 <+0>: cmpldi  cr7,r5,31
   0x00003fffb7aaa4a4 <+4>: cmpldi  cr6,r5,8
   0x00003fffb7aaa4a8 <+8>: mr      r10,r3
   0x00003fffb7aaa4ac <+12>:    rlwimi  r4,r4,8,16,23
   0x00003fffb7aaa4b0 <+16>:    rlwimi  r4,r4,16,0,15
   0x00003fffb7aaa4b4 <+20>:    ble     cr6,0x3fffb7aaa830 <.__memset_power7+912>
   0x00003fffb7aaa4b8 <+24>:    neg     r0,r3
   0x00003fffb7aaa4bc <+28>:    ble     cr7,0x3fffb7aaa7a0 <.__memset_power7+768>
   0x00003fffb7aaa4c0 <+32>:    andi.   r11,r10,7
   0x00003fffb7aaa4c4 <+36>:    rldimi  r4,r4,32,0
   0x00003fffb7aaa4c8 <+40>:    mr      r12,r5
   0x00003fffb7aaa4cc <+44>:    beq     0x3fffb7aaa500 <.__memset_power7+96>
   0x00003fffb7aaa4d0 <+48>:    clrldi  r0,r0,61
   0x00003fffb7aaa4d4 <+52>:    mtocrf  1,r0
   0x00003fffb7aaa4d8 <+56>:    subf    r5,r0,r5
   0x00003fffb7aaa4dc <+60>:    bns     cr7,0x3fffb7aaa4e8 <.__memset_power7+72>
=> 0x00003fffb7aaa4e0 <+64>:    stb     r4,0(r10)
   0x00003fffb7aaa4e4 <+68>:    addi    r10,r10,1
   0x00003fffb7aaa4e8 <+72>:    bne     cr7,0x3fffb7aaa4f4 <.__memset_power7+84>
   0x00003fffb7aaa4ec <+76>:    sth     r4,0(r10)
   0x00003fffb7aaa4f0 <+80>:    addi    r10,r10,2
   0x00003fffb7aaa4f4 <+84>:    ble     cr7,0x3fffb7aaa500 <.__memset_power7+96>
   0x00003fffb7aaa4f8 <+88>:    stw     r4,0(r10)
   0x00003fffb7aaa4fc <+92>:    addi    r10,r10,4
   0x00003fffb7aaa500 <+96>:    cmpldi  cr5,r5,255
   0x00003fffb7aaa504 <+100>:   li      r0,32
   0x00003fffb7aaa508 <+104>:   dcbtst  0,r10
   0x00003fffb7aaa50c <+108>:   cmpldi  cr6,r4,0
   0x00003fffb7aaa510 <+112>:   rldicl  r9,r5,61,3
   0x00003fffb7aaa514 <+116>:   crand   4*cr6+so,4*cr6+eq,4*cr5+gt
   0x00003fffb7aaa518 <+120>:   mtocrf  1,r9
   0x00003fffb7aaa51c <+124>:   bso     cr6,0x3fffb7aaa5e0 <.__memset_power7+320>
   0x00003fffb7aaa520 <+128>:   rldicl  r8,r5,59,5
   0x00003fffb7aaa524 <+132>:   clrldi  r11,r5,61
   0x00003fffb7aaa528 <+136>:   cmpldi  cr6,r11,0
   0x00003fffb7aaa52c <+140>:   cmpldi  cr1,r9,4
   0x00003fffb7aaa530 <+144>:   mtctr   r8
   0x00003fffb7aaa534 <+148>:   bne     cr7,0x3fffb7aaa560 <.__memset_power7+192>
   0x00003fffb7aaa538 <+152>:   std     r4,0(r10)
   0x00003fffb7aaa53c <+156>:   std     r4,8(r10)
   0x00003fffb7aaa540 <+160>:   addi    r10,r10,16
   0x00003fffb7aaa544 <+164>:   bns     cr7,0x3fffb7aaa570 <.__memset_power7+208>
   0x00003fffb7aaa548 <+168>:   std     r4,0(r10)
   0x00003fffb7aaa54c <+172>:   addi    r10,r10,8
   0x00003fffb7aaa550 <+176>:   mr      r12,r10
   0x00003fffb7aaa554 <+180>:   blt     cr1,0x3fffb7aaa5b0 <.__memset_power7+272>
   0x00003fffb7aaa558 <+184>:   b       0x3fffb7aaa570 <.__memset_power7+208>
   0x00003fffb7aaa55c <+188>:   ori     r2,r2,0
   0x00003fffb7aaa560 <+192>:   bns     cr7,0x3fffb7aaa570 <.__memset_power7+208>
   0x00003fffb7aaa564 <+196>:   std     r4,0(r10)
   0x00003fffb7aaa568 <+200>:   addi    r10,r10,8
   0x00003fffb7aaa56c <+204>:   ori     r2,r2,0
   0x00003fffb7aaa570 <+208>:   addi    r12,r10,32
   0x00003fffb7aaa574 <+212>:   std     r4,0(r10)
   0x00003fffb7aaa578 <+216>:   std     r4,8(r10)
   0x00003fffb7aaa57c <+220>:   std     r4,16(r10)
   0x00003fffb7aaa580 <+224>:   std     r4,24(r10)
   0x00003fffb7aaa584 <+228>:   bdz     0x3fffb7aaa5b0 <.__memset_power7+272>
bnoordhuis commented 7 years ago

Looks like an almost-nullptr bug. It tries to store a byte at the address in r10, which is 0xf:

0x00003fffb7aaa4e0 <+64>: stb r4,0(r10) # r10 == 0xf

r3 (first function argument) is moved into r10 a few lines up so it would seem Heap::MarkCompactPrologue() is calling RegExpResultsCache::Clear() with a bad FixedArray pointer. That's about all I can glean from it though, the root cause is probably elsewhere.
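
The register dump can be cross-checked against the disassembly with a little arithmetic. The sketch below recomputes the alignment prologue of memset (`neg`, `clrldi ...,61`, `subf`) from the posted values. The last step is an assumption that does not come from this thread: it supposes V8's usual 64-bit layout, with a heap-object tag of 1 and a 16-byte FixedArray header, to guess what tagged pointer would yield a data address of 0xf. The function name `analyze` is hypothetical.

```javascript
// Values copied from the `info registers` output in this thread.
const regs = { r0: 0x1n, r3: 0xfn, r5: 0x7ffn, r12: 0x800n };

function analyze({ r0, r3, r5, r12 }) {
  // neg r0,r3 ; clrldi r0,r0,61  =>  bytes needed to reach 8-byte alignment
  const alignBytes = (-r3) & 7n;
  // mr r12,r5 ; subf r5,r0,r5    =>  r12 keeps the original length
  const originalLength = r5 + r0;
  // ASSUMPTION (not from the thread): tag = 1, FixedArray header = 16 bytes,
  // so data start = taggedPointer - tag + header.
  const assumedTag = 1n;
  const assumedHeader = 16n;
  const impliedTaggedPointer = r3 + assumedTag - assumedHeader;
  return {
    alignBytes,
    alignMatchesR0: alignBytes === r0,
    originalLength,
    lengthMatchesR12: originalLength === r12,
    pointerSizedSlots: originalLength / 8n,
    impliedTaggedPointer,
  };
}

console.log(analyze(regs));
```

Under those assumptions the implied FixedArray pointer is exactly 0, which would fit the "almost-nullptr" reading: the cache pointer itself may have been null, and 0xf is just the offset of its first element. The thread does not confirm this, so treat it as a hypothesis.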

mhdawson commented 7 years ago

The compiler version on the test/build boxes is:

root@test-osuosl-ubuntu14-ppc64-be-3:~# gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
mhdawson commented 7 years ago

The binaries seem to run ok on the machines on which they were built:

iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$ uname -a
Linux test-osuosl-ubuntu14-ppc64-be-3 4.2.0-27-powerpc64-smp #32~14.04.1-Ubuntu SMP Fri Jan 22 15:47:25 UTC 2016 ppc64 ppc64 ppc64 GNU/Linux
mhdawson commented 7 years ago
iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$ node -e 'console.log("hi")';
hi
iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$ node --version
v7.7.3-nightly20170309c62798034a
iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$ node -e 'console.log("hi")';
hi
iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$
mhdawson commented 7 years ago

Output just before the crash, with LD_DEBUG=all:

     23377:     symbol=munmap;  lookup in file=/lib64/power8/libc.so.6 [0]
     23377:     binding file ./node [0] to /lib64/power8/libc.so.6 [0]: normal symbol `munmap' [GLIBC_2.3]
     23377:     symbol=mprotect;  lookup in file=./node [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/libdl.so.2 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/power8/librt.so.1 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/libstdc++.so.6 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/power8/libm.so.6 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/libgcc_s.so.1 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/power8/libpthread.so.0 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/power8/libc.so.6 [0]
     23377:     binding file ./node [0] to /lib64/power8/libc.so.6 [0]: normal symbol `mprotect' [GLIBC_2.3]
Segmentation fault (core dumped)
mhdawson commented 7 years ago

On RHEL machine:

-sh-4.2$ /lib64/power8/libc.so.6
GNU C Library (GNU libc) stable release version 2.17, by Roland McGrath et al.
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.8.3 20140911 (Red Hat 4.8.3-7).
Compiled on a Linux 3.10.0 system on 2015-01-19.
Available extensions:
        The C stubs add-on version 2.1.2.
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
        RT using linux kernel aio
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

On community machine:

iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$ /lib64/libc.so.6
GNU C Library (Ubuntu EGLIBC 2.19-0ubuntu6.9) stable release version 2.19, by Roland McGrath et al.
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.8.4.
Compiled on a Linux 3.13.11 system on 2016-05-26.
Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/eglibc/+bugs>.
mhdawson commented 7 years ago

I wonder if it's the glibc version. I believe we have run the community binaries on our RHEL 7 machines in the past, but possibly node is now using something that is not compatible across glibc versions 2.17 and 2.19.
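
One cheap way to probe that idea is to look at which glibc symbol versions the binary actually binds to in the LD_DEBUG output posted earlier. The following is a rough sketch, not a definitive check: it scans "binding" lines for `[GLIBC_x.y]` tags and reports the highest version seen. The function names are hypothetical, and the sample text is just the two binding lines quoted in this thread. A full run would need the complete LD_DEBUG log.

```javascript
// Compare two versions given as arrays of numbers, e.g. [2, 17] vs [2, 3].
function compareVersions(a, b) {
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const d = (a[i] || 0) - (b[i] || 0);
    if (d !== 0) return d;
  }
  return 0;
}

// Return the highest GLIBC_x.y symbol version mentioned in LD_DEBUG text,
// or null if none is found.
function highestGlibcVersion(ldDebugText) {
  const versions = [];
  const re = /\[GLIBC_(\d+(?:\.\d+)*)\]/g;
  let m;
  while ((m = re.exec(ldDebugText)) !== null) {
    versions.push(m[1].split('.').map(Number));
  }
  versions.sort(compareVersions);
  return versions.length ? versions[versions.length - 1].join('.') : null;
}

// Sample: the two binding lines from the LD_DEBUG=all output above.
const sample = [
  "binding file ./node [0] to /lib64/power8/libc.so.6 [0]: normal symbol `munmap' [GLIBC_2.3]",
  "binding file ./node [0] to /lib64/power8/libc.so.6 [0]: normal symbol `mprotect' [GLIBC_2.3]",
].join('\n');

const highest = highestGlibcVersion(sample);
const hostGlibc = [2, 17]; // RHEL 7, per the libc banner above
console.log('highest bound symbol version:', highest);
console.log('fits host glibc:',
  compareVersions(highest.split('.').map(Number), hostGlibc) <= 0);
```

Note that a binary needing a symbol version the host glibc lacks is normally rejected by the dynamic loader at startup with a "version not found" error rather than crashing later, so this check can only help rule that particular explanation in or out.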

mhdawson commented 7 years ago

@sxa555 do you have an environment where you can install a newer glibc on RHEL 7 and see if that makes a difference?

mhdawson commented 7 years ago

Updated title to indicate the crash is only on RHEL 7, as the binaries seem to run OK on Ubuntu 14 BE.

richardlau commented 7 years ago

We're able to run v7.5.0 binaries, but not v7.6.0 and later:

-bash-4.2$ node-v7.5.0-linux-ppc64/bin/node
> .exit
-bash-4.2$ node-v7.6.0-linux-ppc64/bin/node
Segmentation fault (core dumped)
-bash-4.2$
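
When narrowing down which release first fails, the manual check above can be scripted. This is a minimal sketch, assuming the release tarballs are unpacked next to the script under the directory names used in this thread; `probe` is a hypothetical helper, and it runs `-e 0` rather than starting the REPL so that a working binary exits on its own.

```javascript
const { spawnSync } = require('child_process');

// Run `<binary> -e 0` and report how the process ended.
function probe(binary) {
  const r = spawnSync(binary, ['-e', '0'], { stdio: 'ignore' });
  if (r.error) return { binary, result: `could not run: ${r.error.code}` };
  if (r.signal) return { binary, result: `killed by ${r.signal}` };
  return { binary, result: r.status === 0 ? 'ok' : `exit ${r.status}` };
}

// Paths as used in the thread; adjust to wherever the tarballs were unpacked.
const candidates = [
  'node-v7.5.0-linux-ppc64/bin/node',
  'node-v7.6.0-linux-ppc64/bin/node',
];
for (const c of candidates) console.log(probe(c));
```

On an affected machine the failing builds would be expected to show up as killed by SIGSEGV, which makes it straightforward to extend the list to nightlies and find the first bad one.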
bnoordhuis commented 7 years ago

The V8 5.4 -> 5.5 upgrade in 61870b4 seems like the most obvious culprit. Can you check if that commit fails and the preceding commit works?

richardlau commented 7 years ago

For the 8.0.0 nightlies (from https://nodejs.org/download/nightly/):

-bash-4.2$ node-v8.0.0-nightly20170126a67a04d765-linux-ppc64/bin/node
> .exit
-bash-4.2$ node-v8.0.0-nightly20170127b19334e566-linux-ppc64/bin/node
Segmentation fault (core dumped)
-bash-4.2$
richardlau commented 7 years ago

I guess this is also pointing to the V8 5.4->5.5 update:

-bash-4.2$ git log a67a04d765..b19334e566 --oneline
b19334e test: expand test coverage of fs.js
bee83e0 test: expand test coverage of events.js
e71c278 url: stop exporting originFor()
ad6e778 benchmark: add benchmark for object properties
084acc8 test: check noAssert option in buf.write*()
24ef1e6 string_decoder: align UTF-8 handling with V8
007386e repl: remove workaround for function redefinition
c2c6ae5 test: move test-vm-function-redefinition to parallel
b37f55a deps: limit regress/regress-crbug-514081 v8 test
91ab09f src: update NODE_MODULE_VERSION to 52
2739185 deps: update V8 to 5.5.372.40
-bash-4.2$
richardlau commented 7 years ago

> The V8 5.4 -> 5.5 upgrade in 61870b4 seems like the most obvious culprit. Can you check if that commit fails and the preceding commit works?

node works if we compile and run locally -- The failures are with the binaries from nodejs.org running locally.

gibfahn commented 7 years ago

The gcc version for the test-osuosl-ubuntu14-ppc64_be_1 machine (should be the same as the release machines):

test-osuosl-ubuntu14-ppc64-be-3:~$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
sxa commented 7 years ago

RHEL 7.2 suffers the same symptoms. It is supplied with gcc/g++ 4.8.5-4, which is even later than the Ubuntu 14.04 one. That suggests it's either a compiler bug specific to Ubuntu's gcc version (I'm thinking unlikely but not impossible) or, more likely, the new V8 is triggering something that uses functionality from a glibc later than 2.17 (RHEL 7 has 2.17, Ubuntu 14.04 has 2.19).

Going the other way round, node7 binaries built on RHEL 7 appear to run OK on Ubuntu 14.04 (possibly because they're built against an earlier glibc), so I wonder if a CentOS build machine (same as x64?) might be a better choice than Ubuntu 14.04. For the record, we have built with Ubuntu 14.04.1 on PPC-LE (note: the BE community machines are 14.04.5) and that appears to run OK on RHEL 7.

(I've also tried building my own glibc 2.19 on RHEL7 but that didn't execute properly with anything on the system)

sxa commented 7 years ago

I've got my own clean Ubuntu 14.04 now that I can experiment with, and it replicates the (lack of) problem on that platform. That at least confirms it's not anything magic on the CI machines that's making it work ;-)

sxa commented 7 years ago

Have tried building my own gcc/g++ (version 4.8.5) and that still causes a crash in the same place when run on RHEL7, so whatever we're seeing isn't an issue specific to Ubuntu's compiler.

Updating glibc on the RHEL7 box is "non-trivial" so I can't really recommend such a course of action (needs the dynamic loader and other stuff updated)

sxa commented 7 years ago

It's the Clear function at the end of https://github.com/nodejs/node/blob/v7.x/deps/v8/src/regexp/jsregexp.cc that's causing the memset call, which is invoked from line 1472 of https://github.com/nodejs/node/blob/v7.x/deps/v8/src/heap/heap.cc.

sxa commented 7 years ago

For reference: I also tried using the headers from glibc 2.17 (the RHEL version) on the Ubuntu 14.04 build system, but that still seemed to cause a crash in the same place.

gibfahn commented 7 years ago

@bnoordhuis quick ping in case you have any more suggestions, otherwise it looks like there's not much we can do here.

bnoordhuis commented 7 years ago

No other suggestions. I'll go ahead and close it out.

mhdawson commented 7 years ago

An interesting link that tracks abi compatibility between versions: https://abi-laboratory.pro/tracker/timeline/glibc/

incompatible changes between 2.17 and 2.18 are related to

2.19 shows as 100% compatible with 2.17

It's not obvious from the description of the crash that it is related to either of those two things,