microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0
100.33k stars 12.4k forks source link

perf: `Node.getChildren` speed regression in ts 5.5 when used at very large scale #59101

Closed walkerdb closed 2 months ago

walkerdb commented 2 months ago

🔎 Search Terms

typescript 5.5 performance, slowdown, typescript-eslint

🕗 Version & Regression Information

⏯ Playground Link

No response

💻 Code

I've put together a repro in a forked version of the sentry monorepo here: https://github.com/walkerdb/sentry/pull/1. It's using the sentry repo only because it's a large public repo that uses typescript-eslint.

🙁 Actual behavior

When running typescript-eslint's type-aware lint rules in a large monorepo with typescript 5.5, we observe lint times between 1.3x to 3x worse than when running the exact same rules and the same config under typescript 5.4. The slowdown seems to be entirely coming from the monomorphized objects change in the ts 5.5 release.

🙂 Expected behavior

There should be no performance impact when moving from typescript 5.4 to 5.5.

Additional information about the issue

We've submitted a bug report to typescript-eslint but we're also reporting here since the slowdown can be pinpointed to a specific typescript change.

Also attaching some before/after update perf traces via 0x that show the extra time is coming largely from typescript internals. The highlighted boxes show the specific part of the process that has ballooned in runtime. I'm happy to share the actual traces with maintainers if they'd be useful in debugging.

ts 5.4 ts 5.5
Screenshot 2024-07-01 at 5 48 38 PM Screenshot 2024-07-01 at 6 04 48 PM
DanielRosenwasser commented 2 months ago

I didn't get that far on this today, but I believe this is from more time spent in the GC. If you try upping your --max-old-space-size to something way higher, let me know if that alleviates things.

As an aside, SO much time in your build is being spent on shims running within eslint-plugin-react.

Time dominated by calls in `findVariableByName`

While findVariableByName really should be optimized to not create an intermediate array with variablesInScope, is it possible that you are running on an undesirable input file?

jakebailey commented 2 months ago

This is the opposite of expected! Which walk function is that? I'm not familiar with that "walk" function. Traces would be helpful, but https://github.com/walkerdb/sentry/pull/1 is probably enough to give us a lead.

You may also consider running things via https://www.npmjs.com/package/pprof-it (what Daniel just used above) as it in my experience does a better job attributing time and memory allocations to specific lines of code.

jakebailey commented 2 months ago

Hm, my trace doesn't seem to show that particular code as to blame:

image

@walkerdb What tool did you use to find your result?

jakebailey commented 2 months ago

I timed this, and in 5.5, it's 100s, but 5.4 is 91s; not 2-3x slower from what I can tell...

jakebailey commented 2 months ago

A second run of 5.4 was 100s too. Not sure what's going on here. What version of Node are you using?

jakebailey commented 2 months ago

I was using Node 22; switching down to Node 18 shows a worse regression:

// 5.4.5
total time:  160.75s
user time:   212.16s
system time: 4.96s
CPU percent: 135%
max memory:  3946 MB

// 5.5.3
total time:  217.70s
user time:   292.33s
system time: 6.36s
CPU percent: 137%
max memory:  3995 MB

Not 2-3x but very much farther apart than Node 22.

Unfortunately I'm now going to be gone until next week, so this will just have to eat me up inside until then...

walkerdb commented 2 months ago

Hi @jakebailey! Sorry, I should have clarified a few things:

  1. the traces above were generated using 0x, but on our private monorepo, which has a 2.5x slowdown when running our typed lints under ts 5.5 instead of 5.4. I used sentry as a public repro but its slowdown was only ~1.3x on my machine.
  2. in our private monorepo we run lint with essentially unlimited max-old-space and also with higher max-semi-space since we run into gc issues with young gen / scavenge as well at our size.
  3. we are indeed running node 18! Likely will update to 22 by end of year but that's a major undertaking for us.

to @DanielRosenwasser I'm not a part of sentry, I was just using that repo as an example repro since ours is private.

walkerdb commented 2 months ago

the types in Notion's private monorepo can be quite complex, and I suspect are close to a pathological case for some typescript perf behavior, which may help explain the 2.5x slowdown we were observing. I'll see if I can hack running the same checks under node 22 to see if the timings change there much.

jakebailey commented 2 months ago

Ran Node 18, and the results are very different from https://github.com/microsoft/TypeScript/issues/59101#issuecomment-2201759831:

image

That is an incredible amount of GC time, along with the badness Daniel noted above. Given the difference between Node 18 and Node 22 on this being in that shim code, I'm not 100% certain that this repro is representative of the repo you're testing internally, unfortunately...

Would you be able to run pprof-it on your own codebase? Screenshots of flame graphs and such should not expose any details, but you can also run it with PPROF_SANITIZE=on to sanitize the output for sharing (as that will remove any paths from the profile).

walkerdb commented 2 months ago

@jakebailey I'll give pprof-it a try, ty!

walkerdb commented 2 months ago

here's my pprof result for typescript 5.5.2 on node 18.18. I'll also upload one for typescript 5.4 in a sec pprof-time-typescript-55.pb.gz pprof-heap-typescript-55.pb.gz

walkerdb commented 2 months ago

pprof results for the exact same run but using typescript 5.4.5 on node 18.18 . This is significantly faster, and matches our expected run time from prior to the 5.5 update pprof-time-typescript-54.pb.gz pprof-heap-typescript-54.pb.gz

walkerdb commented 2 months ago

last run, with typescript 5.5.2 and node 22.3.0. Running under a newer node version doesn't seem to actually be much faster in this case? pprof-time-typescript-55-node-22.pb.gz pprof-heap-typescript-55-node-22.pb.gz

jakebailey commented 2 months ago

In all of those profiles, nearly half the time is spent just getting/setting the node children via the WeakMap. Yikes. I'm not sure why this isn't observable outside of your internal code unless WeakMaps get noticably slower when they get massive.

May have to tack that prop back on again, but the prop not being on all nodes directly was one of the things we had to do to not blow up perf too much.

jakebailey commented 2 months ago

At least I assume it's the weak map itself; haven't quite figured out how to remap the paths in the profile to a real version. (I'm also on a plane, that doesn't help)

walkerdb commented 2 months ago

thanks so much @jakebailey and @DanielRosenwasser for digging in! Happy to help however I can, feel free to shoot me a message at walker@makenotion.com if there's any other info you could use / commands to run / code to share / etc that I can help with.

In the meantime though we can wait through the holiday week, hope you have a great vacation!

walkerdb commented 2 months ago

can confirm that the getChildren call in the trace maps to this code in the ts source https://github.com/microsoft/TypeScript/blob/ddf43cd0e0091057254b1fcfcaf880140dfb7ea1/src/services/services.ts#L445

here also are some pprof diff screenshots between the 5.4.5 run and the 5.5.2 run, generated with go tool pprof -http=:8080 --diff_base=pprof-time-typescript-54.pb.gz pprof-time-typescript-55.pb.gz

Screenshot 2024-07-02 at 10 47 25 AM Screenshot 2024-07-02 at 10 46 46 AM
walkerdb commented 2 months ago

Interesting also that the slower ts 5.5 run uses significantly less heap, 23gb for 5.4 vs just 14gb on 5.5. Most of the items in the screenshots are typescript-eslint functions that themselves call ts public methods. (screenshots from a similar diff view comparing 5.4 heap usage to 5.5 heap usage, with 5.4 as the baseline)

Screenshot 2024-07-02 at 11 04 29 AM Screenshot 2024-07-02 at 11 04 16 AM
DanielRosenwasser commented 2 months ago

@walkerdb any chance you can iterate and reduce this repro to see if there's a root cause beyond just scale?

For example, if you cut out half of the files in your program - even if there are build errors, do you see the same surprising amount of time in set/getNodeChilren? If so, could you keep trying that until it stops? I'd be curious to see if there's a specific set of files that causes it, or if it's just fundamentally something about the move to WeakMaps.

dmichon-msft commented 2 months ago

This is probably just scale and repeated reallocation of the WeakMap at larger and larger sizes. It might be worth providing a facility for these APIs to be handed a scoped cache object (or using the SourceFileLike ancestor as the cache object, where applicable), rather than using a single global WeakMap.

walkerdb commented 2 months ago

@DanielRosenwasser I just gave a code bisect a go to see if a specific module or type is causing problems. As far as I can tell it's unrelated to the specific code being parsed, but as @dmichon-msft just mentioned it seems related to scale -- past a certain line-of-code count getChildren gets progressively slower and slower.

Specifically, I'm noticing the getChildren function showing up in perf traces once our typed-lint runs start passing 6000 files processed (~1M LOC), and is definitely prominent once we hit ~1.5M LOC. In reality this is probably more accurately tied to AST nodes processed but I don't have quick access to that number.

Running the same checks on smaller subsets of the same set of files results in expected fast performance.

Also for additional context, normal typecheck time with tsc is unchanged for us in ts 5.5, it's just this use of the Node.getChildren function through typescript-eslint that has significantly slowed down.

walkerdb commented 2 months ago

Even this much information is great though. I think my team now has a clear path forward for us to make the 5.5 bump, just split our typed lint job into smaller subsets under the LOC perf limit and overall runtime shouldn't be a long pole in our CI.

DanielRosenwasser commented 2 months ago

So I created #59154 to experiment a bit with @dmichon-msft's suggestion. If you want to take a profile with it, you can install @typescript-deploys/pr-build@5.6.0-pr-59154-2 and let us know how that works:

npm install -D @typescript-deploys/pr-build@5.6.0-pr-59154-2

Note that this is an experiment and it's pretty risky perf-wise for the language service. This change means that populating the list of child nodes now needs a walk up to the source file itself. So no guarantees that we land it, but it could inform other work.

DanielRosenwasser commented 2 months ago

It could also be slower for you too in a different way. It means that simply requesting the children of the tree grows to something like $n \log(n)$ without knowing the the sourceFile (hopefully TS ESLint is passing that along).

walkerdb commented 2 months ago

@DanielRosenwasser looks like that change does resolve nearly all the perf difference for us between ts 5.4 and 5.5! Running our typed-lint suite with @typescript-deploys/pr-build@5.6.0-pr-59154-2 seems to match total runtime for our ts 5.4 runs. Profiles attached:

pprof-time-typescript-childrenCachedBySourceFile.pb.gz pprof-heap-typescript-childrenCachedBySourceFile.pb.gz

DanielRosenwasser commented 2 months ago

@walkerdb that's really great! I think we will discuss this internally and try to figure out if we ship this specific change, and if we can back-port it. One thing that would help us a lot is if you could try that build as your daily driver. In VS Code, you can configure a workspace version of TypeScript by following the docs here or just setting the typescript.tsdk option:

{
  "typescript.tsdk": "./node_modules/typescript/lib"
}
walkerdb commented 2 months ago

Can do, thank you for all your help! Is there any specific feedback that would help on the new version other than whether tsserver requests like getProgram, getSemanticDiagnostics, getCompletionsAtPosition etc still seem to work and are just as responsive as before? (we track perf metrics for each of them via a custom plugin so it should be pretty easy for us to tell whether there's been any major regression there)

jakebailey commented 2 months ago

we track perf metrics for each of them via a custom plugin so it should be pretty easy for us to tell whether there's been any major regression there

This sounds amazing, and I absolutely want to hear more about this. This is the first we've heard of someone tracking this to this extent so anything is awesome.

maschwenk commented 2 months ago

Had a huge typescript-eslint regression just like this in our very large monolith. Had a similar idea of bumping --max-old-space-size but that didn't work:

https://discord.com/channels/1026804805894672454/1254556633556713553/1255160613441769683

Setting my typescript dependency to "typescript": "npm:@typescript-deploys/pr-build@5.6.0-pr-59154-2", fixed the issue, and actually seems a bit faster than baseline!

DanielRosenwasser commented 2 months ago

Hey all, this is fixed in the main branch but I want it to prove itself out for a bit. If you use the nightly build tomorrow (especially in the editor https://marketplace.visualstudio.com/items?itemName=ms-vscode.vscode-typescript-next), that would go a long way.

If you are actually able to measure, we are specifically looking for differences in memory usage, changes in editor operation delays, as well as changes in variance in delays.

If things feel good, we can cherry-pick the change back into 5.5.

walkerdb commented 2 months ago

@DanielRosenwasser can do! I'll get a test group going internally tomorrow, will see if metrics move much. We do also track total tsserver memory usage, should be able to see if that moves much under the new version as well.

@jakebailey happy to chat more about our tsserver observability setup! Could talk sync / show a few dashboards if you're interested?

walkerdb commented 2 months ago

and thank you both again for moving so fast on this, much appreciated!

JoshuaKGoldberg commented 2 months ago

+1 from the typescript-eslint side, we really appreciate it! Thanks so much for the fast follow and (very interesting) deep dive everyone!

walkerdb commented 2 months ago

following up to say we haven't noticed any serious perf regressions in IDE usage so far. We haven't had the participation numbers to say much more than that though.

maschwenk commented 2 months ago

Likewise on stability in my testing in very large monorepo (with a very large monolith). Curious if there are plans on a 5.5 backport? Or still targeting 5.6?

jakebailey commented 2 months ago

The backport is open here: https://github.com/microsoft/TypeScript/pull/59211

DanielRosenwasser commented 2 months ago

TypeScript 5.5.4 should contain the fix - thanks for reporting everyone!