Closed gigobyte closed 11 months ago
Can you get a disassembly from around the offending instruction? The other issue has info on how to obtain that.
@bnoordhuis Are you referring to this comment? To be honest, I'm not familiar with gdb or with debugging anything other than JavaScript code. Do I need to run node with a specific flag to get debugging info that gdb can use?
Yes, that comment. And no, you don't need to turn on any flags; just make sure core dumps are enabled on your system.
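For readers following along, a quick check of whether core dumps are enabled on a Linux system might look like the sketch below. It assumes a POSIX-like shell. The variable names are illustrative, and the remark about systemd-coredump describes a typical setup, not something confirmed in this thread.

```shell
# Soft limit on core file size for this shell; 0 means core dumps are disabled.
core_limit="$(ulimit -c)"
echo "core size limit: $core_limit"

# Raise the limit for processes started from this shell.
# This can fail if the hard limit is lower, so report instead of aborting.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core size limit"

# Where the kernel sends core dumps. A leading '|' means they are piped to a
# handler program (on many systemd distributions, systemd-coredump).
if [ -r /proc/sys/kernel/core_pattern ]; then
  core_pattern="$(cat /proc/sys/kernel/core_pattern)"
  echo "core_pattern: $core_pattern"
fi
```

The limit only applies to processes started from the same shell afterwards, so node has to be launched from it (or the limit set in the service configuration) for the change to matter.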
I'm not seeing any segmentation faults even though it crashed. Is this because I'm using segfault-handler? I tested with a manually triggered one by killing sleep:
/var/lib/systemd/coredump # ls
core.sleep.0.03021e68d1f342ad8fbd1719a6816469.591667.1699690843000000000000.lz4
So everything is set up correctly.
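One way to get from a compressed core file like the one above to a disassembly and register dump is sketched below. The function name dump_crash_site and its arguments are hypothetical and not from this thread. The sketch assumes the lz4 command-line tool and gdb are installed.

```shell
# Hypothetical helper (not from the thread): decompress an lz4-compressed core
# dump and print the disassembly and registers at the crash site.
# Assumes the lz4 CLI and gdb are installed.
dump_crash_site() {
  binary="$1"
  compressed_core="$2"
  if [ -z "$binary" ] || [ -z "$compressed_core" ]; then
    echo "usage: dump_crash_site <binary> <core.lz4>" >&2
    return 2
  fi

  core="$(mktemp)" || return 1

  # -d decompresses, -f overwrites the empty temp file, -q silences progress output.
  if ! lz4 -d -f -q "$compressed_core" "$core"; then
    rm -f "$core"
    return 1
  fi

  # -batch runs the -ex commands and exits instead of opening a prompt.
  # 'disassemble' with no argument dumps the function around the crashing pc.
  gdb -batch -ex 'disassemble' -ex 'info registers' "$binary" "$core"
  status=$?

  rm -f "$core"
  return "$status"
}
```

It would be called with the path of the node binary that crashed and the path of the compressed core file, for example `dump_crash_site "$(command -v node)" /var/lib/systemd/coredump/<core file>.lz4`, where the file name is a placeholder. The binary has to be the same build that produced the core, otherwise the addresses will not line up.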
Yes, it's probably because of segfault-handler. That module does some really questionable things in its signal handler, by the way. Not something I'd recommend running in production.
Let me know if this is enough information. I can also upload the .lz4 file if needed.
(gdb) disassemble
Dump of assembler code for function _ZN4node6crypto12_GLOBAL__N_118SelectALPNCallbackEP6ssl_stPPKhPhS5_jPv:
0x0000000000e69cd0 <+0>: push %rbp
0x0000000000e69cd1 <+1>: mov %rsp,%rbp
0x0000000000e69cd4 <+4>: push %r15
0x0000000000e69cd6 <+6>: mov %r8d,%r15d
0x0000000000e69cd9 <+9>: push %r14
0x0000000000e69cdb <+11>: mov %rdx,%r14
0x0000000000e69cde <+14>: push %r13
0x0000000000e69ce0 <+16>: mov %rsi,%r13
0x0000000000e69ce3 <+19>: push %r12
0x0000000000e69ce5 <+21>: mov %r9,%r12
0x0000000000e69ce8 <+24>: push %rbx
0x0000000000e69ce9 <+25>: mov %rcx,%rbx
0x0000000000e69cec <+28>: sub $0x48,%rsp
0x0000000000e69cf0 <+32>: cmpb $0x0,0x1c8(%r9)
0x0000000000e69cf8 <+40>: jne 0xe69d48 <_ZN4node6crypto12_GLOBAL__N_118SelectALPNCallbackEP6ssl_stPPKhPhS5_jPv+120>
0x0000000000e69cfa <+42>: mov 0x1b8(%r9),%rcx
0x0000000000e69d01 <+49>: mov 0x1b0(%r9),%rdx
0x0000000000e69d08 <+56>: cmp %rdx,%rcx
0x0000000000e69d0b <+59>: je 0xe69ef0 <_ZN4node6crypto12_GLOBAL__N_118SelectALPNCallbackEP6ssl_stPPKhPhS5_jPv+544>
0x0000000000e69d11 <+65>: sub %rdx,%rcx
0x0000000000e69d14 <+68>: mov %r8d,%r9d
0x0000000000e69d17 <+71>: mov %r14,%rsi
0x0000000000e69d1a <+74>: mov %rbx,%r8
0x0000000000e69d1d <+77>: mov %r13,%rdi
0x0000000000e69d20 <+80>: xor %r12d,%r12d
0x0000000000e69d23 <+83>: callq 0x1a40f90 <SSL_select_next_proto>
0x0000000000e69d28 <+88>: cmp $0x1,%eax
0x0000000000e69d2b <+91>: setne %r12b
0x0000000000e69d2f <+95>: add %r12d,%r12d
0x0000000000e69d32 <+98>: add $0x48,%rsp
0x0000000000e69d36 <+102>: mov %r12d,%eax
0x0000000000e69d39 <+105>: pop %rbx
0x0000000000e69d3a <+106>: pop %r12
0x0000000000e69d3c <+108>: pop %r13
0x0000000000e69d3e <+110>: pop %r14
0x0000000000e69d40 <+112>: pop %r15
0x0000000000e69d42 <+114>: pop %rbp
0x0000000000e69d43 <+115>: retq
(gdb) info registers
rax 0x0 0
rbx 0x7c1ead0 130149072
rcx 0x7c1ead0 130149072
rdx 0x7ffcbcf94f17 140723478941463
rsi 0x7ffcbcf94f18 140723478941464
rdi 0x7ffcbcf94eb0 140723478941360
rbp 0x7ffcbcf94f00 0x7ffcbcf94f00
rsp 0x7ffcbcf94e90 0x7ffcbcf94e90
r8 0xc 12
r9 0x8007c10 134249488
r10 0x0 0
r11 0x7242090 119808144
r12 0x8007c10 134249488
r13 0x7ffcbcf94f18 140723478941464
r14 0x7ffcbcf94f17 140723478941463
r15 0xc 12
rip 0xe69d50 0xe69d50 <node::crypto::(anonymous namespace)::SelectALPNCallback(ssl_st*, unsigned char const**, unsigned char*, unsigned char const*, unsigned int, void*)+128>
eflags 0x10202 [ IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
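As a cross-check of the dump above, the +128 offset that gdb reports for rip can be recomputed from the function's start address. The sketch below uses only addresses that appear in the gdb output; the variable names are illustrative.

```shell
func_start=0xe69cd0   # address at <+0> in the disassembly
rip=0xe69d50          # rip from 'info registers'
jne_target=0xe69d48   # target of the jne at <+40>

# Shell arithmetic accepts hexadecimal constants and yields decimal results.
crash_offset=$(( $rip - $func_start ))
branch_offset=$(( $jne_target - $func_start ))

echo "crash at +$crash_offset, jne target at +$branch_offset"
# prints "crash at +128, jne target at +120"
```

The faulting instruction at +128 therefore lies past the end of the listing shown, which stops at +115. It sits just after the target of the jne at +40, which suggests the crash happens on the path taken when the byte at 0x1c8(%r9) is non-zero.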
That's the bug from #47207. Somewhat to my surprise commit 1643adf771dafce8034a00faacf98a2e57d5eebc doesn't actually seem to have been merged into v20.x; instead it's scheduled for v20.10.0 which is in the process of being released, see #50682.
It's kind of odd that people reported it as being fixed but maybe that's because it shows up so infrequently :shrug:
Anyway, I'll go ahead and close this as a duplicate. The problem should go away when you upgrade.
Version
20.9.0
Platform
Linux Ubuntu-2004-focal-64-minimal-hwe 5.15.0-71-generic #78~20.04.1-Ubuntu SMP Wed Apr 19 11:26:48 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
No response
What steps will reproduce the bug?
It's hard to give concrete steps to reproduce this bug. I have an express.js server that exits with SIGSEGV about once a day. Sorry that I can't be of more help.
How often does it reproduce? Is there a required condition?
This issue https://github.com/nodejs/node/issues/47207 seems to be related but it's reported as fixed in 20.8.0. I can reproduce this problem on all versions of Node 20 and the latest Node 18. The problem is not present on 18.10.0.
What is the expected behavior? Why is that the expected behavior?
No response
What do you see instead?
Additional information
No response