Closed remcohaszing closed 5 months ago
Also repro’d at https://github.com/syntax-tree/mdast-util-to-string/actions/runs/8894134799, https://github.com/syntax-tree/unist-util-visit-parents/actions/runs/8893972759, and https://github.com/syntax-tree/unist-util-visit/actions/runs/8894107813/job/24421655968, which are simpler.
The segment fault goes away by removing declaration: true
in tsconfig.
Particularly mdast-util-to-string
is a simple repo that I recommend looking at.
What these repos have in common is that they use (edit: .d.ts
files.mdast-util-to-string
doesn’t)
it doesn’t happen in https://github.com/syntax-tree/nlcst-to-string, which is similar, but revolved around a different ecosystem.
I am now guessing that there something in the types in node_modules
of these packages. Likely around micromark/mdast (aka markdown).
I narrowed it down by trimming mdast-util-to-string
to a repo with the following files:
// package.json
{
"dependencies": {
"@types/mdast": "^4.0.0",
"typescript": "^5.0.0"
}
}
// tsconfig.json
{
"compilerOptions": {
// This option doesn’t seem to matter
// "noEmit": true
}
}
// index.ts
import 'mdast'
This repro also causes the crash when running tsc
, not just tsc --build
.
I narrowed it down even further by removing parts from @types/mdast
. I trimmed it down to this:
/**
* How phrasing content is aligned
* ({@link https://drafts.csswg.org/css-text/ | [CSSTEXT]}).
*
* * `'left'`: See the
* {@link https://drafts.csswg.org/css-text/#valdef-text-align-left | left}
* value of the `text-align` CSS property
* * `'right'`: See the
* {@link https://drafts.csswg.org/css-text/#valdef-text-align-right | right}
* value of the `text-align` CSS property
* * `'center'`: See the
* {@link https://drafts.csswg.org/css-text/#valdef-text-align-center | center}
* value of the `text-align` CSS property
* * `null`: phrasing content is aligned as defined by the host environment
*
* Used in GFM tables.
*/
export type AlignType = "center" | "left" | "right" | null;
/**
* Internal relation from one node to another.
*
* Whether the value of `identifier` is expected to be a unique identifier or
* not depends on the type of node including the Association.
* An example of this is that they should be unique on {@link Definition},
* whereas multiple {@link LinkReference}s can be non-unique to be associated
* with one definition.
*/
export interface Association {
identifier: string;
/**
* Relation of association, in parsed form.
*
* `label` is a `string` value: it works just like `title` on {@link Link}
* or a `lang` on {@link Code}: character escapes and character references
* are parsed.
*
* It can match another node.
*/
label?: string | null | undefined;
}
export interface Reference {
}
/**
* Registry of all mdast nodes that can occur as children of {@link Root}.
*
* > 👉 **Note**: {@link Root} does not need to be an entire document.
* > it can also be a fragment.
*
* This interface can be augmented to register custom node types:
*
*
* For a union of all {@link Root} children, see {@link RootContent}.
*/
export interface RootContentMap {
}
This still causes a crash, but at this point, even removing random chunks from comments resolves the issue.
random ideas:
[
and ]
in the link text ([CSSTEXT]
)?~Yep, only removing the emoji on L312 from node_modules/@types/mdast/index.d.ts
solves this for me in mdast-util-to-string
.
The emoji is U+1f449
, as in consisting of the surrogates 0xd83d
and 0xdc49
I tried removing the emoji. That did make a difference, but so did keeping the emoji and removing another random piece of comment instead.
Which other part?
Any piece really. I.e. the line below the emoji and the non-empty line below that.
Now tsc
is crashing randomly for me. I seem to have reached some treshold around this amount of content that’s allowed in the mdast types. The emoji doesn’t cause the problem, but it counts as a big factor towards reaching this treshold.
Removing the line below the emoji does not change anything for me in mdast-util-to-string
. Nor does the non empty line. Nor both together. For me the emoji is consistent
The TypeScript compiler is JavaScript code, which should never be able to cause a segfault (as that would likely be a security issue in e.g. browsers). This sound more like a Node and/or V8 bug to me.
The crash does not happen in TS 5.1.0-dev.20230307. But does occur with TS 5.1.0-dev.20230308.
Here’s the diff, which is too big to put in markdown. And I’m not allowed to use .diff
as an extension here.
Interesting from what I’m reading: looks like an introduction of scanJSDocCommentTextToken
, and new code relating to positional info?
The rest of the diff looks like the renumbering of the enums. And renaming functions from getStartPos
-> getTokenFullStart
; setTextPos
-> resetTokenState
.
some more interesting code at the end, around removeTrailingWhitespace
and pushComment
in I believe the tokenizer.
Idea: could Debug.assert
cause a crash on Node 22?
My point stands. tsc is pure JS code; if any version of the compiler can cause a segmentation fault that crashes the Node.js process, that's a potential security issue and should be brought up to the Node.js team.
Using Node 22 and the linked repo, I can't reproduce this segfault.
If you have this narrowed down to two specific versions, https://www.npmjs.com/package/every-ts would let you go further without knowing how to build TS itself (not that checking out and running TS is that hard). That diff is just too large to guess at, but an actual bisect would reveal things.
Since @wooorm and I have slightly different results, I suspect the hardware plays a role and you have a very beefy development machine.
I use Linux by the way. That may be relevant.
Based on a new project with only the files as described in https://github.com/microsoft/TypeScript/issues/58369#issuecomment-2085192574:
remco@vali:~/Downloads/diff $ npm i typescript @types/mdast
added 3 packages in 747ms
remco@vali:~/Downloads/diff $ tsc
[1] 95375 segmentation fault (core dumped) tsc
Exit code: 139
remco@vali:~/Downloads/diff $ npm uninstall typescript
removed 1 package in 223ms
remco@vali:~/Downloads/diff $ npm i every-ts
added 28 packages in 2s
remco@vali:~/Downloads/diff $ every-ts bisect start
Cloning TypeScript; this may take a bit...
Cloning into '/home/remco/Downloads/diff/node_modules/every-ts/.data/TypeScript'...
remote: Enumerating objects: 284974, done.
remote: Counting objects: 100% (313/313), done.
remote: Compressing objects: 100% (213/213), done.
remote: Total 284974 (delta 165), reused 202 (delta 80), pack-reused 284661
Receiving objects: 100% (284974/284974), 1.00 GiB | 11.74 MiB/s, done.
Resolving deltas: 100% (170765/170765), done.
remote: Enumerating objects: 66225, done.
remote: Counting objects: 100% (40065/40065), done.
remote: Compressing objects: 100% (30058/30058), done.
remote: Total 66225 (delta 10944), reused 10037 (delta 10007), pack-reused 26160
Receiving objects: 100% (66225/66225), 29.73 MiB | 7.86 MiB/s, done.
Resolving deltas: 100% (16170/16170), done.
Updating files: 100% (71112/71112), done.
status: waiting for both good and bad commits
Building TypeScript...
Downloading fnm...
fnm installed
TypeScript built successfully!
remco@vali:~/Downloads/diff $ every-ts bisect new 5.1.0-dev.20230308
Resolved 5.1.0-dev.20230308 to 746a6feb2e7ba6987b6c72db538dd498b35cd461
status: waiting for good commit(s), bad commit known
remco@vali:~/Downloads/diff $ every-ts bisect old 5.1.0-dev.20230307
Resolved 5.1.0-dev.20230307 to df1ddb79642212c7ef9f2f6ec03ac95e610d4280
Bisecting: 10 revisions left to test after this (roughly 4 steps)
remote: Enumerating objects: 42710, done.
remote: Counting objects: 100% (5147/5147), done.
remote: Compressing objects: 100% (2943/2943), done.
remote: Total 42710 (delta 2637), reused 2204 (delta 2204), pack-reused 37563
Receiving objects: 100% (42710/42710), 24.43 MiB | 6.27 MiB/s, done.
Resolving deltas: 100% (10862/10862), done.
[a6be79d535475c0062f3028afa51cabdb83d02ff] Remove old test262 and dt runner infra (#53125)
Building TypeScript...
TypeScript built successfully!
remco@vali:~/Downloads/diff $ every-ts bisect run tsc
Building TypeScript...
TypeScript built successfully!
git bisect old
Bisecting: 5 revisions left to test after this (roughly 3 steps)
remote: Enumerating objects: 865, done.
remote: Counting objects: 100% (812/812), done.
remote: Compressing objects: 100% (709/709), done.
remote: Total 865 (delta 105), reused 103 (delta 103), pack-reused 53
Receiving objects: 100% (865/865), 2.30 MiB | 4.25 MiB/s, done.
Resolving deltas: 100% (110/110), done.
[88adf8014b617fb8492587941ffcf4f75986e9cc] Make special intersections order-independent (#52782)
Building TypeScript...
TypeScript built successfully!
git bisect old
Bisecting: 2 revisions left to test after this (roughly 2 steps)
remote: Enumerating objects: 66, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 66 (delta 27), reused 27 (delta 27), pack-reused 10
Receiving objects: 100% (66/66), 916.98 KiB | 3.60 MiB/s, done.
Resolving deltas: 100% (31/31), done.
[137c461bd096d0c0a948fda299e56316798df8eb] Scan bigger/fewer jsdoc tokens (#53081)
Building TypeScript...
TypeScript built successfully!
Internal Error: Command was killed with SIGSEGV (Segmentation fault): tsc
at makeError (file:///home/remco/Downloads/diff/node_modules/execa/lib/error.js:60:11)
at handlePromise (file:///home/remco/Downloads/diff/node_modules/execa/index.js:124:26)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async BisectRun.executeOrThrow (file:///home/remco/Downloads/diff/node_modules/every-ts/dist/git.js:98:28)
at async BisectRun.execute (file:///home/remco/Downloads/diff/node_modules/every-ts/dist/common.js:50:20)
at async BisectRun.validateAndExecute (/home/remco/Downloads/diff/node_modules/clipanion/lib/advanced/Command.js:73:26)
at async Cli.run (/home/remco/Downloads/diff/node_modules/clipanion/lib/advanced/Cli.js:229:24)
at async Cli.runExit (/home/remco/Downloads/diff/node_modules/clipanion/lib/advanced/Cli.js:238:28)
Exit code: 1
So if I understand correctly the culprit is #53081.
Thanks for narrowing that down. Realistically, nothing in that PR should have been able to segfault node (or anything in JS at all, period), so if this is new in Node 22, it's definitely a Node bug if anything.
I wasn't sure how to test with your repro, though; if you can make a branch or new repo with that content, that'd be helpful for experimentation and reporting upstream.
Fair. I pushed the reproduction repo to https://github.com/remcohaszing/typescript-bug-58369.
Not reproing for me on Windows x64 Node 22.0.0 tsc 5.5.0-dev.20240430, potato laptop
After setting up everything again to make a nice reproduction, I see the behaviour @wooorm described: the 👉
emoji causes the crash, it’s not random anymore.
I have also setup a test matrix in GitHub actions which shows that this errors occurs in Linux. It can’t be reproduced on Windows, because npm shipped with Node.js 22 on Windows appears to be broken.
This is actually "fixed" on TS main by #58339; that PR prevents out of bounds accesses in more places, among other tweaks, so it sounds like Node 22 has a string handling bug.
I just confirmed using typescript@next
fixed it for me.
Do we have a standalone (not all of tsc) repro we can hand off to Node/V8?
I have nothing yet. At 3358157469d805df1deec01d26269c5d9d6b23ae, the segfault happens in scanJSDocCommentTextToken
, but if you add console log inside of the for loop, it stops segfaulting.
Sounds like some kind of JIT bug in V8; those are no fun. The console.log
probably causes a deopt that sidesteps it.
Even extracting the code out, I can't reproduce this with the same inputs to that function. At this point, I don't think there's any mimization that can be done and the only next steps are to bisect Node to see when it broke between v20 and v22 (and then when it's inevitably just a "pull in v8" change, bisect that too).
In any case, I don't think this is a problem that TS can handle; if the bug is "fixed" by adding an innocuous console log, that's a runtime problem.
This issue has been marked as "External" and has seen no recent activity. It has been automatically closed for house-keeping purposes.
🔎 Search Terms
"segmentation fault"
🕗 Version & Regression Information
⏯ Playground Link
https://github.com/remarkjs/remark/tree/main/packages/remark-parse
💻 Code
The code is not actually relevant. It still happens if you remove the content of
packages/remark-parse/lib/index.js
. The existence of the file is important.🙁 Actual behavior
Using Node.js 22, running
tsc --build
from theremark-parse
workspace, yields:🙂 Expected behavior
It generates type definitions
Additional information about the issue
It works in Node.js 20, but not Node.js 22.
If the project is built with Node.js 20, then an incremental build with Node.js 22 won’t crash TypeScript.
Neither
tsc
nortsc --showConfig
nottsc --build --clean
causes a crash, onlytsc --build
.See https://github.com/remarkjs/remark/pull/1291#issuecomment-2084920007 for some more info where this was first discovered.