Closed by amcasey 5 months ago
DefinitelyTyped is one possible source of benchmarks, though checking declarations tends not to be representative of checking references.
We have user tests, why aren't those benchmarked?
Overload resolution, in particular errors in overload resolution, may or may not be tested in the current suite. There's no way to know! It would be nice to know how much a particular feature is actually exercised when testing for performance, especially if it's not exercised at all.
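As a hypothetical illustration (not taken from the suite) of the kind of overload-resolution failure whose checking cost is hard to attribute, consider a call that matches no overload, forcing the checker to attempt and report against every signature:

```typescript
// Hypothetical example: an overloaded function plus a failing call site.
// Errors in overload resolution make the checker try each signature in
// turn before producing a diagnostic, which is the cost being discussed.
function pad(value: string, width: number): string;
function pad(value: number, width: number): string;
function pad(value: string | number, width: number): string {
  return String(value).padStart(width, " ");
}

pad("x", 3); // resolves the first overload
pad(1, 3);   // resolves the second overload
// pad(true, 3); // error: no overload matches this call --
//               // every signature must be tried and reported against
```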
@DanielRosenwasser user tests also update to HEAD every time they're run. Their intent is to warn us when somebody breaks on new versions of TypeScript, whether it's our fault or theirs (see: sift.js over the last month).
Edit: However, the list of tests is a good starting point, although the tests are JS-centric, so probably only a selection is really worth adding.
@DanielRosenwasser My best guess would be that they don't compile with the older TS builds in our (excessively long) time horizon.
@sandersn With a shorter time horizon, it doesn't matter if HEAD keeps moving, because we can just compute fresh baselines for old TS builds.
Features off the top of my head which need coverage:

- `keyof`
- `react.d.ts`, which should implicitly include all of the above
- `@types/assert`
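A sketch of the kind of `keyof`-heavy code a benchmark entry might stress (a hypothetical example, not an existing test): mapped types plus indexed access force the checker to expand every property of the source type.

```typescript
// Hypothetical keyof-heavy pattern a perf benchmark could exercise.
interface Config {
  host: string;
  port: number;
  secure: boolean;
}

// A mapped type keyed by keyof Config: every property becomes a getter.
type Getters<T> = {
  [K in keyof T]: () => T[K];
};

// Generic indexed access: the return type depends on the key argument.
function get<T, K extends keyof T>(obj: T, key: K): T[K] {
  return obj[key];
}

const config: Config = { host: "localhost", port: 8080, secure: false };
const port = get(config, "port"); // inferred as number
```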
> We have user tests, why aren't those benchmarked?
Most of the `user` suite is just a file with a `require` of a module, so it's just checking that the declaration files compile - not much is going to be exercised there. Plus, do you think we'd get reliable perf numbers running 2 of them in parallel in containers inside a cloud-hosted containerized workflow? The `docker` part of the suite is practically impossible to perf test (we don't even invoke anything close to `tsc` and spend a ton of time on not-TS tasks in their builds), but the `user` suite we could gather metrics for; it's just 1. are they reliable and reproducible, and 2. are they actionable?
I think the repro listed in https://github.com/microsoft/TypeScript/issues/36567 would be a good perf suite entry because it's so easy to regress.
@weswigham Local experiments suggest variances are acceptable with 10-run averages. We might be able to get away with fewer - we'd have to experiment (3 wasn't enough).
If the regression bug comes with a TS commit and a test repo URL and commit, I would expect the slowdown to be locally reproducible (modulo OS differences).
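The 10-run averaging described above can be sketched as follows (a hypothetical helper, not the actual harness): compute the mean and coefficient of variation over repeated timings, and trust the average only when the relative spread is small.

```typescript
// Sketch (hypothetical) of averaging repeated timing runs and checking
// whether run-to-run variance is low enough to trust, as discussed above.
function summarize(runsMs: number[]): { mean: number; cv: number } {
  const mean = runsMs.reduce((a, b) => a + b, 0) / runsMs.length;
  const variance =
    runsMs.reduce((a, b) => a + (b - mean) ** 2, 0) / runsMs.length;
  const cv = Math.sqrt(variance) / mean; // coefficient of variation
  return { mean, cv };
}

// With 10 runs, a small relative spread suggests the average is stable;
// with only 3 runs, a single outlier dominates both statistics.
const tenRuns = [995, 1010, 1003, 998, 1007, 1001, 996, 1012, 999, 1004];
const { mean, cv } = summarize(tenRuns);
console.log(mean.toFixed(1), (cv * 100).toFixed(2) + "%");
```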
> Local experiments suggest variances are acceptable with 10-run averages. We might be able to get away with fewer - we'd have to experiment (3 wasn't enough).
Definitely can't just collect those metrics part and parcel with the existing `user` test runs then - it'd take much too long. If we think any of them are high-value as perf targets, we should lift/copy them into the perf suite.
Related to #44033.
I'm going to close this in favor of https://github.com/microsoft/typescript-benchmarking/issues/32 and related changes; there's a PR out that will get us a new xstate, and replacing the other benchmarks with other useful ones is in progress.
In an effort to make it possible to compare current performance to past performance, we've continued to run our performance tests against benchmarks compilable with four-year-old versions of the compiler. This depth of past baselines is probably less valuable than having coverage for widely used modern features like union types. Arguably, we should move our baseline forward to at least the oldest version supported on DefinitelyTyped (currently 2.7) or, even more aggressively, use a rolling window (e.g. 90 days) when checking for regressions.
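A rolling-window check of the kind suggested above might look like this (a minimal sketch with made-up names; the real benchmarking infrastructure is assumed, not shown): compare the current timing only against baselines recorded within the window, and flag a regression past some threshold.

```typescript
// Hypothetical sketch of a rolling-window regression check: compare the
// current run only against baselines recorded in the last 90 days.
interface Baseline {
  date: Date;
  checkTimeMs: number;
}

function regressionAgainstWindow(
  current: number,
  baselines: Baseline[],
  windowDays = 90,
  threshold = 1.1, // flag a >10% slowdown
): boolean {
  const cutoff = Date.now() - windowDays * 24 * 60 * 60 * 1000;
  const recent = baselines.filter((b) => b.date.getTime() >= cutoff);
  if (recent.length === 0) return false; // nothing in-window to compare to
  const best = Math.min(...recent.map((b) => b.checkTimeMs));
  return current > best * threshold;
}

// Usage: a 1150ms run against a recent 1000ms baseline is a regression.
const flagged = regressionAgainstWindow(1150, [
  { date: new Date(), checkTimeMs: 1000 },
]);
console.log(flagged); // → true
```

Old baselines simply age out of the window, so the suite never has to stay compilable with years-old compiler versions.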