oxc-project / backlog

backlog for collaborators only

Add wallclock benchmarks #5

Open overlookmotel opened 2 months ago

overlookmotel commented 2 months ago

Problem

CodSpeed is good, but has some anomalies.

In particular:

  1. All system calls (e.g. system allocator) are "free" in CodSpeed's measurements.
  2. CodSpeed say its measurements don't take the branch predictor into account.

Mispredicted branches can be a significant perf hit, which we're failing to measure. In particular, I have some ideas to replace branching in the lexer with straight-line code (https://github.com/oxc-project/oxc/issues/3292). I suspect this could be a significant gain, but it won't register on the current CodSpeed benchmarks, so we can't evaluate it at present.

Possible solution

Introduce wallclock benchmarks (not run with Valgrind) in addition to the existing benchmarks.

How?

Can use the same hack I wrote to run NAPI benchmarks as normal wallclock benchmarks and get the results into CodSpeed.

An improvement would be if it's possible to synthesize fake .out files to send to CodSpeed.

Boshen commented 2 months ago

I also want things like cpu cycles, max rss recorded, binary output ... everything we care about.

overlookmotel commented 2 months ago

> I also want things like cpu cycles, max rss recorded, binary output ... everything we care about.

Yes! Please see #6.

overlookmotel commented 1 week ago

Boshen pointed out that the problem with wallclock benchmarks on CI is that you get a different machine, potentially with a different CPU etc., on each benchmark run. That introduces variance.

Rolldown is working around that by running benchmarks twice each time - once for the current commit/PR, and once for base - and then comparing the two. Both run in series on the same machine, which removes that source of variance.
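The "run both on the same machine and compare" idea can be sketched as below. This is only an illustration, not Rolldown's actual setup: `time_once` and `compare_in_series` are hypothetical names, the two callables stand in for the base and PR benchmark builds, and alternating the two over several rounds and taking the median is an assumption added here to dampen noise.

```python
import time
from statistics import median


def time_once(fn):
    """Return the wallclock seconds taken by a single call to fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def compare_in_series(base_fn, new_fn, rounds=10):
    """Time base and new one after the other on this machine.

    Returns median(new) / median(base): below 1.0 means new is faster.
    Alternating the two each round is an assumption of this sketch, so that
    slow drift on the machine affects both sides roughly equally.
    """
    base_times, new_times = [], []
    for _ in range(rounds):
        base_times.append(time_once(base_fn))
        new_times.append(time_once(new_fn))
    return median(new_times) / median(base_times)


if __name__ == "__main__":
    # Stand-in workloads; in practice these would invoke the two benchmark builds.
    ratio = compare_in_series(
        lambda: sum(range(100_000)),
        lambda: sum(range(50_000)),
    )
    print(f"new / base = {ratio:.3f}")
```

The result is only a ratio for this machine, which is the limitation the notes below pick up.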

Notes:

  1. Problem is you only get a relative measure each time (new vs old on this machine). So to get this into CodSpeed, we would need to fetch the benchmark measurement previously submitted to CodSpeed, and then scale it up/down by the relative result for this run, i.e. new_uploaded_time = previously_uploaded_time * time_for_new_just_measured / time_for_old_just_measured.
  2. Need to build each benchmark twice. But we could build both in parallel in separate jobs, and then run them in series in a single job. Or building both in parallel on the same machine may actually be OK, because the last ~50% of build time for benchmarks is on a single thread anyway.
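
The adjustment in note 1 is a single rescaling step. A minimal sketch, assuming the previously uploaded value has already been fetched somehow (how to fetch it from CodSpeed is not covered here); `rescale_for_upload` is a hypothetical name, and the numbers in the usage example are made up:

```python
def rescale_for_upload(previously_uploaded_time, time_for_new, time_for_old):
    """Scale the last uploaded time by the new/old ratio measured on this machine.

    Implements:
        new_uploaded_time = previously_uploaded_time * time_for_new / time_for_old
    All three inputs must share a unit (e.g. milliseconds).
    """
    if time_for_old <= 0:
        raise ValueError("time_for_old must be positive")
    return previously_uploaded_time * time_for_new / time_for_old


if __name__ == "__main__":
    # Made-up numbers: the last upload was 100.0 ms; on this machine base took
    # 10.0 ms and the PR took 9.0 ms, i.e. the PR is 10% faster.
    print(rescale_for_upload(100.0, time_for_new=9.0, time_for_old=10.0))  # 90.0
```

One consequence of chaining values this way is that measurement error in each run's ratio carries forward into every later uploaded value.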

Boshen commented 1 week ago

Continuous wallclock benchmarking isn't feasible; we had this setup before CodSpeed. But we can add a conditional CI job trigger to turn off CodSpeed and measure against the main branch.

overlookmotel commented 1 week ago

Roughly how much variance were you seeing in e.g. parser benchmarks prior to CodSpeed?