nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

Segmentation fault in Node 20.9.0 in `SSL_select_next_proto` #50626

Closed: gigobyte closed this issue 11 months ago

gigobyte commented 11 months ago

Version

20.9.0

Platform

Linux Ubuntu-2004-focal-64-minimal-hwe 5.15.0-71-generic #78~20.04.1-Ubuntu SMP Wed Apr 19 11:26:48 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

It's hard to give concrete steps to reproduce this bug. I have an Express.js server that exits with SIGSEGV roughly once a day. Sorry that I can't be of more help.

How often does it reproduce? Is there a required condition?

Issue https://github.com/nodejs/node/issues/47207 seems to be related, but it's reported as fixed in 20.8.0. I can reproduce this problem on all versions of Node 20 and on the latest Node 18. The problem is not present on 18.10.0.

What is the expected behavior? Why is that the expected behavior?

No response

What do you see instead?

/root/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x37a5)[0x7fb11c57c7a5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fb11e88d420]
node /root/build/server.js(SSL_select_next_proto+0x4c)[0x1a40fdc]
node /root/build/server.js[0xe69d28]
node /root/build/server.js(tls_handle_alpn+0x53)[0x1a851d3]
node /root/build/server.js(tls_post_process_client_hello+0x54a)[0x1a8594a]
node /root/build/server.js[0x1a72b45]
node /root/build/server.js(ssl3_read_bytes+0x2e8)[0x1a60f28]
node /root/build/server.js(ssl3_read+0x65)[0x1a301e5]
node /root/build/server.js(SSL_read+0x8b)[0x1a3f12b]
node /root/build/server.js(_ZN4node6crypto7TLSWrap8ClearOutEv+0x78)[0xe6fc98]
node /root/build/server.js(_ZN4node6crypto7TLSWrap12OnStreamReadElRK8uv_buf_t+0xa0)[0xe71790]
node /root/build/server.js(_ZN4node15LibuvStreamWrap8OnUvReadElPK8uv_buf_t+0x95)[0xdb65b5]
node /root/build/server.js[0xdb69ea]
node /root/build/server.js[0x188d2bd]
node /root/build/server.js[0x188d650]
node /root/build/server.js[0x189502b]
node /root/build/server.js(uv_run+0x187)[0x1881387]
node /root/build/server.js(_ZN4node21SpinEventLoopInternalEPNS_11EnvironmentE+0x156)[0xbb3bd6]
node /root/build/server.js[0xce9fc5]
node /root/build/server.js(_ZN4node16NodeMainInstance3RunEv+0xcd)[0xcea98d]
node /root/build/server.js(_ZN4node5StartEiPPc+0x587)[0xc55687]

Additional information

No response

bnoordhuis commented 11 months ago

Can you get a disassembly from around the offending instruction? The other issue has info on how to obtain that.

gigobyte commented 11 months ago

@bnoordhuis Are you referring to this comment? To be honest, I'm not familiar with gdb or with debugging anything other than JavaScript. Do I need to run node with a specific flag to get debugging info that gdb can use?

bnoordhuis commented 11 months ago

Yes, that comment. And no, you don't need to turn on any flags; just make sure core dumps are enabled on your system.

gigobyte commented 11 months ago

I'm not seeing any core dumps for the segmentation faults even though the server crashed. Is this because I'm using segfault-handler? I tested with a manually triggered crash by killing a sleep process:

/var/lib/systemd/coredump # ls
core.sleep.0.03021e68d1f342ad8fbd1719a6816469.591667.1699690843000000000000.lz4

So everything is set up correctly.
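The same check can be pointed at Node itself. The following is a minimal sketch, assuming Linux: it reads `/proc/sys/kernel/core_pattern`, asks a child `sh` for the inherited `ulimit -c`, and starts a throwaway Node process that calls `process.abort()`, which raises SIGABRT (like SIGSEGV, a signal whose default action is to dump core). The helper names are hypothetical. Whether a core file actually appears still depends on the resource limit and on where `core_pattern` points, which on the system above appears to be systemd-coredump.

```javascript
// Sketch: check that a crashing Node process is allowed to dump core.
// Assumes Linux; reading /proc and running `sh` will not work everywhere.
const { spawnSync } = require('node:child_process');
const { readFileSync } = require('node:fs');

// Where the kernel sends core dumps (a path template or a |pipe to a helper).
function corePattern() {
  try {
    return readFileSync('/proc/sys/kernel/core_pattern', 'utf8').trim();
  } catch {
    return null; // /proc not available
  }
}

// Soft core-size limit as seen by a child shell (inherited from this
// process); "0" means the kernel is asked not to write core files.
function coreLimit() {
  const r = spawnSync('sh', ['-c', 'ulimit -c'], { encoding: 'utf8' });
  return r.status === 0 ? r.stdout.trim() : null;
}

// Start a throwaway Node process that aborts and return the name of the
// signal that terminated it (expected: 'SIGABRT').
function crashChild() {
  const r = spawnSync(process.execPath, ['-e', 'process.abort()'], { stdio: 'ignore' });
  return r.signal;
}

console.log('core_pattern:', corePattern());
console.log('ulimit -c:', coreLimit());
console.log('child terminated by:', crashChild());
```

If `ulimit -c` reports 0, that is a likely reason for missing cores; with systemd-coredump, `coredumpctl list` should show an entry for the aborted node process once dumps are allowed.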

bnoordhuis commented 11 months ago

Yes, it's probably because of segfault-handler. That module does some really questionable things in its signal handler, by the way. Not something I'd recommend running in production.
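Why a crash handler can suppress the dump: if a program catches a signal whose default action is to dump core and then ends the process some other way (an exit call, or a different signal), the kernel never performs that default action. The sketch below demonstrates this as an analogy, not segfault-handler's code. It uses SIGQUIT because Node's documentation warns that JavaScript listeners for a genuine SIGSEGV are not safe to run, and `runChild` is a hypothetical helper. That segfault-handler ends the process this way is an assumption consistent with the missing cores, not something confirmed in this thread.

```javascript
// Analogy only: SIGQUIT stands in for SIGSEGV. Both default to
// "terminate and dump core", but JS can safely handle only SIGQUIT.
const { spawnSync } = require('node:child_process');

// Run a child that sends itself SIGQUIT, optionally with a handler installed.
function runChild(withHandler) {
  const script = [
    // Hypothetical stand-in for a crash reporter: catch the signal, exit normally.
    withHandler ? "process.on('SIGQUIT', () => process.exit(3));" : '',
    "process.kill(process.pid, 'SIGQUIT');",
    'setTimeout(() => {}, 5000);', // keep the event loop alive for the handler
  ].join('\n');
  const r = spawnSync(process.execPath, ['-e', script], { stdio: 'ignore' });
  return { status: r.status, signal: r.signal };
}

// Default action: killed by the signal, so the kernel may write a core file.
console.log(runChild(false)); // { status: null, signal: 'SIGQUIT' }
// Handler installed: the process ends with an exit code and no core dump.
console.log(runChild(true)); // { status: 3, signal: null }
```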

gigobyte commented 11 months ago

Let me know if this is enough information; I can also upload the .lz4 file if needed.

(gdb) disassemble
Dump of assembler code for function _ZN4node6crypto12_GLOBAL__N_118SelectALPNCallbackEP6ssl_stPPKhPhS5_jPv:
   0x0000000000e69cd0 <+0>:     push   %rbp
   0x0000000000e69cd1 <+1>:     mov    %rsp,%rbp
   0x0000000000e69cd4 <+4>:     push   %r15
   0x0000000000e69cd6 <+6>:     mov    %r8d,%r15d
   0x0000000000e69cd9 <+9>:     push   %r14
   0x0000000000e69cdb <+11>:    mov    %rdx,%r14
   0x0000000000e69cde <+14>:    push   %r13
   0x0000000000e69ce0 <+16>:    mov    %rsi,%r13
   0x0000000000e69ce3 <+19>:    push   %r12
   0x0000000000e69ce5 <+21>:    mov    %r9,%r12
   0x0000000000e69ce8 <+24>:    push   %rbx
   0x0000000000e69ce9 <+25>:    mov    %rcx,%rbx
   0x0000000000e69cec <+28>:    sub    $0x48,%rsp
   0x0000000000e69cf0 <+32>:    cmpb   $0x0,0x1c8(%r9)
   0x0000000000e69cf8 <+40>:    jne    0xe69d48 <_ZN4node6crypto12_GLOBAL__N_118SelectALPNCallbackEP6ssl_stPPKhPhS5_jPv+120>
   0x0000000000e69cfa <+42>:    mov    0x1b8(%r9),%rcx
   0x0000000000e69d01 <+49>:    mov    0x1b0(%r9),%rdx
   0x0000000000e69d08 <+56>:    cmp    %rdx,%rcx
   0x0000000000e69d0b <+59>:    je     0xe69ef0 <_ZN4node6crypto12_GLOBAL__N_118SelectALPNCallbackEP6ssl_stPPKhPhS5_jPv+544>
   0x0000000000e69d11 <+65>:    sub    %rdx,%rcx
   0x0000000000e69d14 <+68>:    mov    %r8d,%r9d
   0x0000000000e69d17 <+71>:    mov    %r14,%rsi
   0x0000000000e69d1a <+74>:    mov    %rbx,%r8
   0x0000000000e69d1d <+77>:    mov    %r13,%rdi
   0x0000000000e69d20 <+80>:    xor    %r12d,%r12d
   0x0000000000e69d23 <+83>:    callq  0x1a40f90 <SSL_select_next_proto>
   0x0000000000e69d28 <+88>:    cmp    $0x1,%eax
   0x0000000000e69d2b <+91>:    setne  %r12b
   0x0000000000e69d2f <+95>:    add    %r12d,%r12d
   0x0000000000e69d32 <+98>:    add    $0x48,%rsp
   0x0000000000e69d36 <+102>:   mov    %r12d,%eax
   0x0000000000e69d39 <+105>:   pop    %rbx
   0x0000000000e69d3a <+106>:   pop    %r12
   0x0000000000e69d3c <+108>:   pop    %r13
   0x0000000000e69d3e <+110>:   pop    %r14
   0x0000000000e69d40 <+112>:   pop    %r15
   0x0000000000e69d42 <+114>:   pop    %rbp
   0x0000000000e69d43 <+115>:   retq   
(gdb) info registers
rax            0x0                 0
rbx            0x7c1ead0           130149072
rcx            0x7c1ead0           130149072
rdx            0x7ffcbcf94f17      140723478941463
rsi            0x7ffcbcf94f18      140723478941464
rdi            0x7ffcbcf94eb0      140723478941360
rbp            0x7ffcbcf94f00      0x7ffcbcf94f00
rsp            0x7ffcbcf94e90      0x7ffcbcf94e90
r8             0xc                 12
r9             0x8007c10           134249488
r10            0x0                 0
r11            0x7242090           119808144
r12            0x8007c10           134249488
r13            0x7ffcbcf94f18      140723478941464
r14            0x7ffcbcf94f17      140723478941463
r15            0xc                 12
rip            0xe69d50            0xe69d50 <node::crypto::(anonymous namespace)::SelectALPNCallback(ssl_st*, unsigned char const**, unsigned char*, unsigned char const*, unsigned int, void*)+128>
eflags         0x10202             [ IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
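Reading the dump above: `SelectALPNCallback` appears to load a begin/end pointer pair from its `arg` object (`0x1b0(%r9)` and `0x1b8(%r9)`), pass that buffer as the server's protocol list and the client's offer (`in`, `inlen`) as the client list to `SSL_select_next_proto`, and then return 0 when the result is 1 and 2 otherwise (`cmp $0x1,%eax; setne; add`). The following is a minimal JavaScript model of the selection rule as OpenSSL documents it, with OpenSSL's constant values written out by hand. The helper names are hypothetical; it is not Node's or OpenSSL's code, it does not model the two early branches (the flag at `0x1c8(%r9)` and the empty-list check), and it does not reproduce the crash.

```javascript
// Model of the documented SSL_select_next_proto() rule plus the return-value
// mapping seen in the disassembly. Not Node's or OpenSSL's actual code.
const OPENSSL_NPN_NEGOTIATED = 1;
const OPENSSL_NPN_NO_OVERLAP = 2;
const SSL_TLSEXT_ERR_OK = 0;
const SSL_TLSEXT_ERR_ALERT_FATAL = 2;

// ALPN wire format: each protocol name is prefixed by a one-byte length.
function encodeAlpn(names) {
  const parts = [];
  for (const name of names) {
    const bytes = Buffer.from(name, 'latin1');
    if (bytes.length === 0 || bytes.length > 255) throw new RangeError(`bad ALPN name: ${name}`);
    parts.push(Buffer.from([bytes.length]), bytes);
  }
  return Buffer.concat(parts);
}

function decodeAlpn(buf) {
  const names = [];
  for (let i = 0; i < buf.length; ) {
    const len = buf[i];
    if (len === 0 || i + 1 + len > buf.length) throw new RangeError('malformed ALPN list');
    names.push(buf.toString('latin1', i + 1, i + 1 + len));
    i += 1 + len;
  }
  return names;
}

// The first server protocol that the client also offers wins; with no
// overlap, the documented fallback is the client's first protocol.
function selectNextProto(server, client) {
  const clientNames = decodeAlpn(client);
  for (const name of decodeAlpn(server)) {
    if (clientNames.includes(name)) return { status: OPENSSL_NPN_NEGOTIATED, proto: name };
  }
  return { status: OPENSSL_NPN_NO_OVERLAP, proto: clientNames[0] ?? null };
}

// cmp $0x1,%eax; setne %r12b; add %r12d,%r12d  =>  (status == 1) ? 0 : 2
function alpnCallbackResult(status) {
  return status === OPENSSL_NPN_NEGOTIATED ? SSL_TLSEXT_ERR_OK : SSL_TLSEXT_ERR_ALERT_FATAL;
}

const server = encodeAlpn(['h2', 'http/1.1']); // e.g. a server's ALPNProtocols
const client = encodeAlpn(['http/1.1', 'h2']);
const r = selectNextProto(server, client);
console.log(r.proto, alpnCallbackResult(r.status)); // h2 0
```

The server's order decides the winner here, which is why `h2` is chosen even though the client lists `http/1.1` first.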

bnoordhuis commented 11 months ago

That's the bug from #47207. Somewhat to my surprise, commit 1643adf771dafce8034a00faacf98a2e57d5eebc doesn't actually seem to have been merged into v20.x; instead, it's scheduled for v20.10.0, which is in the process of being released (see #50682).

It's kind of odd that people reported it as fixed, but maybe that's because it shows up so infrequently. 🤷

Anyway, I'll go ahead and close this as a duplicate. The problem should go away when you upgrade.
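To catch servers still running an affected runtime, a startup check is one option. The following sketch is hypothetical: it assumes, per the comment above, that the fix ships in v20.10.0, and it deliberately checks only the 20.x line because the thread does not say which 18.x release will carry it. The function names are illustrative.

```javascript
// Hypothetical startup guard: warn on a v20 release older than v20.10.0,
// the release the ALPN fix was scheduled for according to this thread.
function parseVersion(v) {
  return v.replace(/^v/, '').split('.').map(Number);
}

// True if version triple a sorts strictly before version triple b.
function isBefore(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i];
  }
  return false;
}

function needsAlpnFix(version) {
  const v = parseVersion(version);
  return v[0] === 20 && isBefore(v, [20, 10, 0]);
}

if (needsAlpnFix(process.version)) {
  console.warn(`Node ${process.version}: the ALPN crash from #47207 may not be fixed; consider upgrading to v20.10.0 or later.`);
}
```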