microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0

Update the performance test suite #36642

Closed: amcasey closed this issue 5 months ago

amcasey commented 4 years ago

In an effort to make it possible to compare current performance to past performance, we've continued to run our performance tests against benchmarks compilable with four-year-old versions of the compiler. This depth of past baselines is probably less valuable than having coverage for widely used modern features like union types. Arguably, we should move our baseline forward to at least the oldest version supported on DefinitelyTyped (currently 2.7) or, even more aggressively, use a rolling window (e.g. 90 days) when checking for regressions.
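A rolling-window regression check could look roughly like the minimal sketch below. Everything in it is hypothetical and only for illustration: the `BaselineRun` shape, `baselinesInWindow`, `isRegression`, and the 5% tolerance used in the example are not part of the TypeScript repo or its benchmarking tooling. Only the idea of a 90-day window comes from the comment above.

```typescript
// Hypothetical shape of one stored benchmark measurement for one compiler build.
interface BaselineRun {
  compilerVersion: string; // e.g. a commit SHA or release tag (illustrative)
  timestamp: Date;         // when that compiler build was produced
  checkTimeMs: number;     // measured check time for one benchmark
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Keep only baselines whose compiler build falls inside the rolling window,
// so very old builds stop constraining which benchmarks we can use.
function baselinesInWindow(
  runs: BaselineRun[],
  now: Date,
  windowDays: number,
): BaselineRun[] {
  const cutoff = now.getTime() - windowDays * DAY_MS;
  return runs.filter((r) => r.timestamp.getTime() >= cutoff);
}

// Flag a regression when the candidate is slower than the fastest in-window
// baseline by more than `tolerance` (a fraction, e.g. 0.05 for 5%).
function isRegression(
  candidateMs: number,
  window: BaselineRun[],
  tolerance: number,
): boolean {
  if (window.length === 0) return false; // nothing to compare against
  const best = Math.min(...window.map((r) => r.checkTimeMs));
  return candidateMs > best * (1 + tolerance);
}
```

Comparing against the fastest in-window baseline is just one possible design choice: it catches slow drift accumulated across the window, but it is more sensitive to measurement noise than comparing against, say, the median of the window.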

amcasey commented 4 years ago

DefinitelyTyped is one possible source of benchmarks, though checking declarations tends not to be representative of checking references.

DanielRosenwasser commented 4 years ago

We have user tests, why aren't those benchmarked?

sandersn commented 4 years ago

Overload resolution, in particular errors in overload resolution, may or may not be tested in the current suite. There's no way to know! It would be nice to know how much a particular feature is actually exercised when testing for performance, especially if it's not exercised at all.

@DanielRosenwasser The user tests also update to HEAD every time they're run. Their intent is to warn us when somebody breaks on new versions of TypeScript, whether it's our fault or theirs (see: sift.js over the last month).

Edit: However, the list of tests is a good starting point, although the tests are JS-centric, so probably only a selection is really worth adding.

amcasey commented 4 years ago

@DanielRosenwasser My best guess would be that they don't compile with the older TS builds in our (excessively long) time horizon.

amcasey commented 4 years ago

@sandersn With a shorter time horizon, it doesn't matter if HEAD keeps moving, because we can just compute fresh baselines for old TS builds.

weswigham commented 4 years ago

Features off the top of my head which need coverage:

> We have user tests, why aren't those benchmarked?

Most of the user suite is just a file that requires a module, so it's only checking that the declaration files compile; not much is going to be exercised there. Plus, do you think we'd get reliable perf numbers running two of them in parallel in containers inside a cloud-hosted containerized workflow? The Docker part of the suite is practically impossible to perf test (we don't even invoke anything close to tsc, and we spend a ton of time on non-TS tasks in their builds), but we could gather metrics for the user suite. The questions are just: (1) are they reliable and reproducible, and (2) are they actionable?

amcasey commented 4 years ago

I think the repro listed in https://github.com/microsoft/TypeScript/issues/36567 would be a good perf suite entry because it's so easy to regress.

sandersn commented 4 years ago
amcasey commented 4 years ago

@weswigham Local experiments suggest variances are acceptable with 10-run averages. We might be able to get away with fewer; we'd have to experiment (3 wasn't enough).
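To make "acceptable variance with 10-run averages" a bit more concrete, here is a minimal sketch of summarizing repeated timings of one benchmark. The thread doesn't say which statistic was used, so this assumes the relative standard error of the mean (sample standard deviation divided by sqrt(n), then by the mean); `summarize`, `isStableEnough`, and the `maxRel` threshold are hypothetical names for illustration only.

```typescript
// Summary statistics for repeated timings of one benchmark.
interface RunSummary {
  n: number;
  mean: number;
  stdDev: number;      // sample standard deviation (n - 1 denominator)
  stdError: number;    // stdDev / sqrt(n): how much the mean itself wobbles
  relStdError: number; // stdError / mean: a unitless noise measure
}

function summarize(timingsMs: number[]): RunSummary {
  const n = timingsMs.length;
  if (n < 2) throw new Error("need at least two runs to estimate variance");
  const mean = timingsMs.reduce((a, b) => a + b, 0) / n;
  const variance =
    timingsMs.reduce((acc, t) => acc + (t - mean) ** 2, 0) / (n - 1);
  const stdDev = Math.sqrt(variance);
  const stdError = stdDev / Math.sqrt(n);
  return { n, mean, stdDev, stdError, relStdError: stdError / mean };
}

// Hypothetical acceptance rule: treat the averaged result as stable enough
// when its relative standard error is at most `maxRel` (e.g. 0.06 for 6%).
function isStableEnough(timingsMs: number[], maxRel: number): boolean {
  return summarize(timingsMs).relStdError <= maxRel;
}
```

Because the standard error shrinks with the square root of the run count, ten runs with the same spread as three give a mean roughly sqrt(10/3), about 1.8 times, tighter. That is one way to read why 3 runs might not be enough while 10 are.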

If a regression bug report includes a TS commit plus a test repo URL and commit, I would expect the slowdown to be locally reproducible (modulo OS differences).

weswigham commented 4 years ago

> Local experiments suggest variances are acceptable with 10-run averages. We might be able to get away with fewer; we'd have to experiment (3 wasn't enough).

Then we definitely can't just collect those metrics as part of the existing user test runs; it'd take much too long. If we think any of them are high-value perf targets, we should lift/copy them into the perf suite.

DanielRosenwasser commented 2 years ago

Related to #44033.

jakebailey commented 5 months ago

I'm going to close this in favor of https://github.com/microsoft/typescript-benchmarking/issues/32 and related changes; there's a PR out that will get us a new xstate, and replacing the other benchmarks with other useful ones is in progress.