nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
104.27k stars 28.06k forks source link

Maglev on x64 causes segmentation fault while running TypeScript #52797

Open remcohaszing opened 2 weeks ago

remcohaszing commented 2 weeks ago

Version

v22.0.0

Platform

Linux vali 6.8.0-76060800daily20240311-generic #202403110203~1714077665~22.04~4c8e9a0 SMP PREEMPT_DYNAMIC Thu A x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

On Linux using Node.js 22:

git clone git@github.com:remcohaszing/typescript-bug-58369.git
cd typescript-bug-58369
npm ci
tsc

See also this failed GitHub action: https://github.com/remcohaszing/typescript-bug-58369/actions/runs/8899456400/job/24438867767

How often does it reproduce? Is there a required condition?

For this reproduction it’s reproduced consistently on Linux on both my machine and GitHub actions.

While troubleshooting by trimming down the content of node_modules/@types/mdast/index.d.ts, I got into a state where it seemed to happen randomly. The major factor is the 👉 emoji in a comment.

The error did not occur on macOS in the GitHub action, but it did happen consistently for @wooorm on their macbook.

The problem was not reproducible on Windows.

What is the expected behavior? Why is that the expected behavior?

No segmentation fault

What do you see instead?

Segmentation fault (core dumped)

Additional information

This was originally reported to TypeScript: https://github.com/microsoft/TypeScript/issues/58369. This issue contains more information.

This has coincidentally already been fixed for the upcoming TypeScript 5.5. Still, a segfault should not occur.

We were unable to make a smaller reproduction.

RedYetiDev commented 2 weeks ago

Hi! Could you possibly provide some example code to reproduce this? Preferably code that has been compiled into plain JS.

remcohaszing commented 2 weeks ago

I would absolutely love to. Unfortunately the bug is reproduced by running tsc. Neither I nor the TypeScript team have been able to make a small reproduction.

It was caused by https://github.com/microsoft/TypeScript/pull/53081 and fixed by https://github.com/microsoft/TypeScript/pull/58339 (unreleased yet).

RedYetiDev commented 2 weeks ago

AFAICT this doesn't seem like an issue with Node.js itself, but rather a compiler (such as tsc), but I'm no expert, and I'd love to get a second opinion.

remcohaszing commented 2 weeks ago

Sorry, I now see I forgot to provide a critical piece of information. This is a regression in Node.js 22.0.0. It wasn’t a problem before.

RedYetiDev commented 2 weeks ago

Ahh, okay, thank you.

mcollina commented 2 weeks ago

@targos I have a feel we rushed the V8 upgrades.

cc @nodejs/v8 @RafaelGSS

targos commented 2 weeks ago

It's probably related to V8, but I'm not sure waiting would have changed anything? We released v22.0.0 with the version of V8 that's in current Chrome.

targos commented 2 weeks ago

Seems specific to Linux or x64 as I cannot reproduce on ARM64 macOS.

targos commented 2 weeks ago

We also don't know which version of V8 introduced the bug (assuming it's in V8).

targos commented 2 weeks ago

So, it's specific to x64. I can reproduce with node-v22.0.0-darwin-x64 on Rosetta.

targos commented 2 weeks ago

I'm going to compile a debug build on one of the Hetzner machines to get a meaningful stack trace.

wooorm commented 2 weeks ago

I’m on macOS and repro it consistently btw

targos commented 2 weeks ago

@woorm ARM or Intel?

wooorm commented 2 weeks ago

It's probably related to V8,

The code that started/stopped crashing in TS had do to with indexing into strings. The string that TS was looking into starts/stops crashing when there is/isn’t an emoji.

One TS maintainer potentially saw the crash appear/disappear when adding a console.log somewhere. So it may be related to some optimization routine

wooorm commented 2 weeks ago

I’m on an 2.6 GHz 6-Core Intel Core i7, Sonoma 14.4.1 (23E224). I have a bunch of the GNU utils tho. So perhaps there could be something that doesn’t happen on mac normally, but my machine looks more like Linux.

RedYetiDev commented 2 weeks ago

Ignore the repro-exists tag, I didn't mean to add it, and it won't effect anything.

jakebailey commented 2 weeks ago

Just to give a side by side using https://github.com/remcohaszing/typescript-bug-58369:

$ grep -A8 'function scanJSDocCommentTextToken' ./node_modules/typescript/lib/tsc.js
  function scanJSDocCommentTextToken(inBackticks) {
    fullStartPos = tokenStart = pos;
    tokenFlags = 0 /* None */;
    if (pos >= end) {
      return token = 1 /* EndOfFileToken */;
    }
    for (let ch = text.charCodeAt(pos); pos < end && (!isLineBreak(ch) && ch !== 96 /* backtick */); ch = codePointAt(text, ++pos)) {
      if (!inBackticks) {
        if (ch === 123 /* openBrace */) {
$ node ./node_modules/typescript/lib/tsc.js
[1]    1090384 segmentation fault  node ./node_modules/typescript/lib/tsc.js

Now, add debugger to the loop (console.log works too but is loud):

$ grep -A8 'function scanJSDocCommentTextToken' ./node_modules/typescript/lib/tsc.js
  function scanJSDocCommentTextToken(inBackticks) {
    fullStartPos = tokenStart = pos;
    tokenFlags = 0 /* None */;
    if (pos >= end) {
      return token = 1 /* EndOfFileToken */;
    }
    for (let ch = text.charCodeAt(pos); pos < end && (!isLineBreak(ch) && ch !== 96 /* backtick */); ch = codePointAt(text, ++pos)) {
      debugger; // ADDED
      if (!inBackticks) {
$ node ./node_modules/typescript/lib/tsc.js

I have not been able to extract out a test which just calls the parser via the public API, nor by extracting this code and giving it the same inputs.

RedYetiDev commented 2 weeks ago

Weird, thanks for the information!

targos commented 2 weeks ago
(gdb) run node_modules/typescript/lib/tsc.js
Starting program: /home/iojs/tmp-targos/node/out/Debug/node node_modules/typescript/lib/tsc.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7a4f640 (LWP 18826)]
[New Thread 0x7ffff724e640 (LWP 18827)]
[New Thread 0x7ffff6a4d640 (LWP 18828)]
[New Thread 0x7ffff624c640 (LWP 18829)]
[New Thread 0x7ffff5a4b640 (LWP 18830)]
[New Thread 0x7ffff51a9640 (LWP 18831)]

Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x00005554d878345d in ?? ()
(gdb) bt
#0  0x00005554d878345d in ?? ()
#1  0x00001967bd580c69 in ?? ()
#2  0x000000000000200e in ?? ()
#3  0x0000200e00000000 in ?? ()
#4  0x0000000000000000 in ?? ()

Perfect 🙃

jakebailey commented 2 weeks ago

Just to note it, you can also add noop(); instead of debugger; if you want a debugger-statement free crasher.

targos commented 2 weeks ago

This is due to the Maglev compiler. I confirm that ./configure --v8-disable-maglev fixes it.

Maglev was enabled in https://github.com/nodejs/node/pull/51360

/cc @kvakil

targos commented 2 weeks ago

/cc @victorgomes

targos commented 1 week ago

I'm rebuilding with ASan and GCC stack protection flags to see if it helps to pinpoint the issue

targos commented 1 week ago

GCC failed to build so I switched to clang. Here's a bit more information:

``` (lldb) run node_modules/typescript/lib/tsc.js Process 107011 launched: '/home/iojs/tmp-targos/node/out/Debug/node' (x86_64) Process 107011 stopped * thread #1, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0xd848) frame #0: 0x00005554dd0c345d -> 0x5554dd0c345d: movl 0xb(%rbx), %r14d 0x5554dd0c3461: incl %r12d 0x5554dd0c3464: cmpl %r14d, %r12d 0x5554dd0c3467: jge 0x5554dd0c3497 (lldb) bt * thread #1, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0xd848) * frame #0: 0x00005554dd0c345d frame #1: 0x00005554dd07990f frame #2: 0x00005554dd078120 frame #3: 0x00005554dd08f431 frame #4: 0x00005554dd08ede4 frame #5: 0x00005554dd08ff1a frame #6: 0x00005554dd090015 frame #7: 0x00005554dd090236 frame #8: 0x00005554dd0905bf frame #9: 0x00005554dd08c36a frame #10: 0x00005554dd08538e frame #11: 0x00005554fcdd0d5e frame #12: 0x00005554dd08fa47 frame #13: 0x00005554dd0860d4 frame #14: 0x00005554dd087c7b frame #15: 0x00005554dd0670dd frame #16: 0x00005554dd08569f frame #17: 0x00005554fcdd0d5e frame #18: 0x00005554fcdd0d5e frame #19: 0x00005554fcdd0d5e frame #20: 0x00005554fcdd0d5e frame #21: 0x00005554fcdd0d5e frame #22: 0x00005554fcdd0d5e frame #23: 0x00005554fcdd0d5e frame #24: 0x00005554fcdd0d5e frame #25: 0x00005554fcdd0d5e frame #26: 0x00005554fcdd0d5e frame #27: 0x00005554fcdd0d5e frame #28: 0x00005554fcdd0d5e frame #29: 0x00005554fcdd0d5e frame #30: 0x00005554fcdd0d5e frame #31: 0x00005554dd04b3d3 frame #32: 0x00005554fcdd0d5e frame #33: 0x00005554fcdd0d5e frame #34: 0x00005554fcdd0d5e frame #35: 0x00005554fcdd0d5e frame #36: 0x00005554fcdd0d5e frame #37: 0x00005554fcdd0d5e frame #38: 0x00005554fcdd0d5e frame #39: 0x00005554fcdd0d5e frame #40: 0x00005554fcdd0d5e frame #41: 0x00005554fcdd0d5e frame #42: 0x00005554fcdd0d5e frame #43: 0x00005554fcdce05c frame #44: 0x00005554fcdcdd83 frame #45: 0x0000555559b6a5dc node`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [inlined] v8::internal::GeneratedCode::Call(this=, args=, args=, args=, args=, args=, args=) at simulator.h:178:12 frame #46: 0x0000555559b6a5d9 node`v8::internal::(anonymous namespace)::Invoke(isolate=, params=)::InvokeParams const&) at execution.cc:418:22 frame #47: 0x0000555559b68766 node`v8::internal::Execution::Call(isolate=, callable=, receiver=, argc=, argv=) at execution.cc:504:10 frame #48: 0x00005555593dc7d7 node`v8::Function::Call(this=, context=, recv=, argc=, argv=) at api.cc:5485:7 frame #49: 0x000055555857e144 node`node::builtins::BuiltinLoader::CompileAndCall(this=0x000061f0000027a0, context=Local @ 0x00007fffffffc360, id="internal/main/run_main_module", argc=4, argv=0x0000603000060d30, optional_realm=0x0000618000000080) at node_builtins.cc:490:14 frame #50: 0x000055555857db08 node`node::builtins::BuiltinLoader::CompileAndCall(this=0x000061f0000027a0, context=Local @ 0x00007fffffffc620, id="internal/main/run_main_module", realm=0x0000618000000080) at node_builtins.cc:474:10 frame #51: 0x0000555558a1d04a node`node::Realm::ExecuteBootstrapper(this=0x0000618000000080, id="internal/main/run_main_module") at node_realm.cc:161:32 frame #52: 0x00005555584af033 node`node::StartExecution(env=0x000061f000001c80, main_script_id="internal/main/run_main_module") at node.cc:237:35 frame #53: 0x00005555584aeb92 node`node::StartExecution(env=0x000061f000001c80, cb=node::StartExecutionCallback @ 0x00007fffffffd900)>) at node.cc:365:12 frame #54: 0x0000555558083dd6 node`node::LoadEnvironment(env=0x000061f000001c80, cb=node::StartExecutionCallback @ 0x00007fffffffdb80, preload=node::EmbedderPreloadCallback @ 0x00007fffffffdbc0)>, std::function, v8::Local)>) at environment.cc:551:10 frame #55: 0x000055555886bd56 node`node::NodeMainInstance::Run(this=0x00007fffffffe170, exit_code=0x00007fffffffde80, env=0x000061f000001c80) at node_main_instance.cc:120:7 frame #56: 0x000055555886adc0 node`node::NodeMainInstance::Run(this=0x00007fffffffe170) at node_main_instance.cc:100:3 frame #57: 0x00005555584b60a9 node`node::StartInternal(argc=2, argv=0x000060b000002da0) at node.cc:1445:24 frame #58: 0x00005555584b5766 node`node::Start(argc=2, argv=0x00007fffffffe608) at node.cc:1452:27 frame #59: 0x000055555d969352 node`main(argc=2, argv=0x00007fffffffe608) at node_main.cc:97:10 frame #60: 0x00007ffff7a75d90 libc.so.6`__libc_start_call_main(main=(node`main at node_main.cc:96), argc=2, argv=0x00007fffffffe608) at libc_start_call_main.h:58:16 frame #61: 0x00007ffff7a75e40 libc.so.6`__libc_start_main_impl(main=(node`main at node_main.cc:96), argc=2, argv=0x00007fffffffe608, init=0x00007ffff7ffd040, fini=, rtld_fini=, stack_end=0x00007fffffffe5f8) at libc-start.c:392:3 frame #62: 0x0000555557ed2a35 node`_start + 37 (lldb) register read General Purpose Registers: rax = 0x00007ea6937000d9 rbx = 0x000000000000d83d rcx = 0x000000000000000a rdx = 0x000000000000000d rdi = 0x00007ee68b891811 rsi = 0x0000000000002028 rbp = 0x00007fffffff94f0 rsp = 0x00007fffffff94a8 r8 = 0x0000000000000060 r9 = 0x000000000000007b r10 = 0x0000000000000000 r11 = 0x0000000000000040 r12 = 0x000000000000200f r13 = 0x0000631000015080 r14 = 0x000000000000d800 r15 = 0x0000200e00000000 rip = 0x00005554dd0c345d rflags = 0x0000000000010246 cs = 0x0000000000000033 fs = 0x0000000000000000 gs = 0x0000000000000000 ss = 0x000000000000002b ds = 0x0000000000000000 es = 0x0000000000000000 ```

I don't know what else to do at this point. Happy to run more commands if you have any idea.

targos commented 1 week ago

--no-maglev-inlining also fixes it.

targos commented 1 week ago

I also tried the repro with:

They all segfault so I don't think it's a V8 regression, but really a Maglev inlining bug.

targos commented 1 week ago

I submitted a V8 bug report: https://issues.chromium.org/issues/338535750

targos commented 1 week ago

It looks very similar to https://issues.chromium.org/issues/42204637

tannal commented 1 week ago

Well, there is a v8 option called --print-opt-source which can print source code of optimized and inlined functions. The problem occurs when v8 optimizes the isLineBreak js function in tsc.js. I changed the function to this and the problem is gone.

// tsc.js
function isLineBreak(ch) {
  return ch === 10 /* lineFeed */ || ch === 13 /* carriageReturn */ ;
}
gdb --args ~/tannalwork/projects/node/node_g --print-opt-source ./node_modules/typescript/lib/tsc.js

--- FUNCTION SOURCE (/home/tannal/tannalwork/projects/node/out/typescript-bug-58369/node_modules/.pnpm/typescript@5.4.5/node_modules/typescript/lib/tsc.js:isLineBreak) id{16,-1} start{744101} ---
(ch) {
  return ch === 10 /* lineFeed */ || ch === 13 /* carriageReturn */ || ch === 8232 /* lineSeparator */ || ch === 8233 /* paragraphSeparator */;
}
--- END ---

Thread 1 "node_g" received signal SIGSEGV, Segmentation fault.
0x00005554dbb1115d in ?? ()
victorgomes commented 1 week ago

@targos Could you try to run the test with maglev and with pointer compression enabled?

targos commented 1 week ago

@victorgomes It also crashes with https://unofficial-builds.nodejs.org/download/release/v22.1.0/node-v22.1.0-linux-x64-pointer-compression.tar.xz

rhysd commented 4 days ago

FWIW, I faced this issue on macOS 13 (Intel Mac) as well in my project. My Node.js version is v22.1.0 (installed via Homebrew). Here is a reproduction (sorry I don't have enough time to minimize it):

  1. Clone this branch: https://github.com/rhysd/remark-emoji/tree/node-issue-52797
  2. In the repository, run npm install and then npm run build
> remark-emoji@4.0.1 build
> tsc -p .

[1]    19446 segmentation fault  npm run build
RedYetiDev commented 4 days ago

@remcohaszing your issue also involved emojis, right?

remcohaszing commented 4 days ago

The original issue is TypeScript processing @types/mdast, which is used by remark plugins such as remark-emoji. So this is the exact same issue.