gzm0 opened this issue 3 months ago
First of all, my only real-life experience with benchmarking is with Scala.js (a Scala-to-JavaScript transpiler; I'm one of the maintainers).
So far, we have performed three types of benchmarks (I'll compare them below to what we probably want here):
We execute these benchmarks on demand, manually (as in: if we have a hunch a change might affect performance, not even on release). We have no automated infrastructure to run them whatsoever.
The first benchmark bullet is probably the one most similar to what we'd want here; of course, it is also the one I haven't been working on at all :shrug: .
I think one core difference between the Scala.js and the (current) openapi-fetch benchmarks is that the Scala.js benchmarks are CPU-bound (they do little to no I/O). The openapi-fetch benchmarks, as they are now, are likely heavily I/O-bound: I do not know how msw works under the hood and what parts of the network stack it abstracts, but it would not surprise me if it still needs to invoke quite a bit of it. As a result, it might very well be that the openapi-fetch benchmarks currently mostly benchmark msw. This could somewhat explain the high fluctuations.
I have a feeling that for better low-level performance analysis, we might need to split the benchmarks: on one side, the comparisons against other libraries we have today; on the other, low-level benchmarks that run against a stubbed fetch and answer "how does this change affect performance?" (w.r.t. the JIT and what not). A sketch of the second kind follows below. I doubt it is feasible to build the second kind of benchmark for other frameworks. Maybe for the ones that also wrap fetch. But even then, we'd likely need to make assumptions about each library's specific usage of fetch (e.g. openapi-fetch parses the JSON response as part of the invocation; if another framework doesn't do that, the comparison gets hard).
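For concreteness, a low-level benchmark of the second kind could look roughly like this. It is only a sketch: it assumes vitest's bench API and openapi-fetch's fetch option, and the path and response body are made up.

```ts
// low-level.bench.ts — hypothetical sketch of a low-level benchmark: openapi-fetch
// runs against a stubbed fetch, so only client-side work (URL building, header
// handling, JSON parsing, JIT behaviour) is measured — no network stack, no msw.
import { bench } from "vitest";
import createClient from "openapi-fetch";

// Stubbed fetch: resolves immediately with a canned JSON response.
const stubbedFetch: typeof fetch = async () =>
  new Response(JSON.stringify({ id: 1, name: "example" }), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });

// `any` stands in for the generated `paths` type; the path below is hypothetical.
const client = createClient<any>({ baseUrl: "https://example.com", fetch: stubbedFetch });

bench("GET /items/{id} against stubbed fetch", async () => {
  await client.GET("/items/{id}", { params: { path: { id: 1 } } });
});
```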
HTH. Happy to discuss further.
Love all these thoughts. I don’t disagree with any of it!
> the openapi-fetch benchmarks currently mostly benchmark msw. This could somewhat explain the high fluctuations.
That is a very good observation that I hadn't considered. You're right; regardless, msw shouldn't be part of the benchmarks. For the purposes of benchmarking, fetch should be stubbed in a way that is basically an async nextTick.
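As a rough sketch of what "basically an async nextTick" could mean in practice (an assumption on my side, not the final implementation):

```ts
// Hypothetical stub: defer by one microtask and hand back a pre-built response,
// so the only latency in the benchmark is scheduling — no sockets, no msw.
const body = JSON.stringify({ ok: true });

const stubbedFetch: typeof fetch = () =>
  new Promise<Response>((resolve) => {
    queueMicrotask(() =>
      resolve(
        new Response(body, {
          status: 200,
          headers: { "Content-Type": "application/json" },
        }),
      ),
    );
  });
```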
> we might need to split the benchmarks:
Agree completely. Also agree on not bothering with monitoring low-level benchmarks on other libraries, but still having some (manually-run) way to check if any PR significantly impacts JIT performance.
I like benchmarking other libraries for 2 purposes:
> benchmarking other libraries helps us identify areas of improvement in our code
That's a very good point. I never thought about it that way. But indeed, that can give you an "achievable baseline".
Just removed MSW from the benchmarks, and that indeed seemed to contribute heavily toward the fluctuations. I'm sure there will still be a little flakiness, as is normal, but it seems better than it was.
But without MSW we'll have to mock the internals of some of the external libraries more carefully, because it was doing a good job of automatically shimming fetch. So we'll need to mock axios, superagent, and openapi-typescript-codegen's internal fetchers without accidentally mocking internal runtime code and giving any library a fake "boost". axios especially: you can give it a custom fetcher, but I want to avoid doing that because that's an entirely different internal codepath, one that isn't common. I'd like to mock in a way that all libraries are called with their defaults, if at all possible.
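One possible shape for that, as a sketch only (it assumes the benchmarks run under vitest; the axios/superagent caveat is my assumption, not something verified here):

```ts
// Hypothetical setup for the comparative benchmarks: stub the global fetch so that
// libraries which call globalThis.fetch by default hit the stub without being handed
// a custom fetcher — i.e. every library keeps its default configuration.
import { vi } from "vitest";

const stubbedFetch: typeof fetch = async () =>
  new Response(JSON.stringify({ ok: true }), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });

vi.stubGlobal("fetch", stubbedFetch);

// Assumption: axios and superagent in Node go through the http module rather than
// globalThis.fetch by default, so they would need to be intercepted at that layer
// instead — which is exactly the tricky part discussed above.
```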
yes, that's the downside unfortunately :(
@drwpow, I'm taking the liberty of moving your comment to an issue so we do not lose the discussion.
Originally posted by @drwpow in https://github.com/openapi-ts/openapi-typescript/issues/1810#issuecomment-2265690346