Open overlookmotel opened 2 months ago
I also want things like cpu cycles, max rss recorded, binary output ... everything we care about.
Yes! Please see #6.
Boshen pointed out that the problem with wallclock benchmarks on CI is that you get a different machine, potentially with a different CPU etc., on each benchmark run. So that introduces variance.
Rolldown is working around that by running benchmarks twice each time - once for the current commit/PR, and once for base - and then comparing the two. Both run in series on the same machine, so that removes that source of variance.
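To make the "run both on the same machine and compare" idea concrete, here's a rough sketch. It is not Rolldown's actual setup: `best_time` and `relative_change` are hypothetical names, and the two callables stand in for "run the benchmark built from base" and "run the benchmark built from the PR".

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def relative_change(
    base_fn: Callable[[], object],
    new_fn: Callable[[], object],
    repeats: int = 5,
) -> float:
    """Time base and new back to back on this machine and return new / base.

    A result above 1.0 means the new code was slower than base on this run.
    """
    base = best_time(base_fn, repeats)
    new = best_time(new_fn, repeats)
    if base <= 0:
        # Timer resolution too coarse for this workload; the ratio is meaningless.
        raise ValueError("base measurement was too fast to time reliably")
    return new / base


if __name__ == "__main__":
    # Stand-in workloads; in CI these would invoke the real benchmark binaries.
    ratio = relative_change(lambda: sum(range(200_000)), lambda: sum(range(200_000)))
    print(f"new / base = {ratio:.3f}")
```

Only the ratio is meaningful here. The absolute times still depend on whichever machine the CI job landed on.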
Notes:
new_uploaded_time = previously_uploaded_time * time_for_new_just_measured / time_for_old_just_measured
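As a quick sanity check of the formula above, here it is as a small function. The function name is made up for illustration, the parameter names follow the formula, and the numbers in the example are invented.

```python
def new_uploaded_time(
    previously_uploaded_time: float,
    time_for_old_just_measured: float,
    time_for_new_just_measured: float,
) -> float:
    """Scale the previously uploaded time by the new/old ratio measured on this machine."""
    if time_for_old_just_measured <= 0:
        raise ValueError("time_for_old_just_measured must be positive")
    return (
        previously_uploaded_time
        * time_for_new_just_measured
        / time_for_old_just_measured
    )


if __name__ == "__main__":
    # Invented numbers: base was uploaded as 100 ms; on today's machine base
    # takes 50 ms and the PR takes 55 ms, i.e. the PR is 10% slower.
    print(new_uploaded_time(100.0, 50.0, 55.0))  # 110.0
```

The point is that the machine's absolute speed cancels out: only the new/old ratio from the same run feeds into the uploaded number.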
Continuous wallclock benchmarking isn't feasible; we had this setup before CodSpeed. But we can add a conditional CI job trigger to turn off CodSpeed and measure against the main branch.
Roughly how much variance were you seeing in e.g. parser benchmarks prior to CodSpeed?
Problem
CodSpeed is good, but has some anomalies.
In particular:
Mispredicted branches can be a significant perf hit, which we're failing to measure. For example, I have some ideas to replace branching in the lexer with straight-line code (https://github.com/oxc-project/oxc/issues/3292). I suspect this could be a significant gain, but it won't register on current CodSpeed benchmarks, so we can't evaluate this at present.
Possible solution
Introduce wallclock benchmarks (not run with Valgrind) in addition to the existing benchmarks.
How?
Can use the same hack I wrote to run NAPI benchmarks as normal wallclock benchmarks and get the results into CodSpeed.
An improvement would be if it's possible to synthesize fake `.out` files to send to CodSpeed.