tinylibs / tinybench

🔎 A simple, tiny and lightweight benchmarking library!
https://tinylibs.github.io/tinybench/

Request: exclude tinybench overhead from the benchmark results #189

Open yifanwww opened 3 hours ago

yifanwww commented 3 hours ago

Hi, thank you for creating such an amazing benchmarking tool! However, the benchmark results are not exactly what I want.

I have read these issues.

I think I'm requesting a different feature, so I'm opening this issue.


For example, say we write this code to benchmark:

import { Bench } from 'tinybench';

function noop() {}

function fibonacci(n) {
    if (n === 1 || n === 2) return 1;
    let a = 1;
    let b = 1;
    let c = 2;
    for (let i = 4; i <= n; i++) {
        a = b;
        b = c;
        c = a + b;
    }
    return c;
}

const bench = new Bench({ time: 500 });
bench
    .add('noop', () => noop())
    .add('fibonacci 4', () => fibonacci(4))
    .add('fibonacci 20', () => fibonacci(20));
await bench.run();
console.table(bench.table());

and the result is

┌─────────┬────────────────┬──────────────────────┬─────────────────────┬────────────────────────────┬───────────────────────────┬──────────┐
│ (index) │ Task name      │ Latency average (ns) │ Latency median (ns) │ Throughput average (ops/s) │ Throughput median (ops/s) │ Samples  │
├─────────┼────────────────┼──────────────────────┼─────────────────────┼────────────────────────────┼───────────────────────────┼──────────┤
│ 0       │ 'noop'         │ '44.29 ± 0.22%'      │ '0.00'              │ '17078062 ± 0.02%'         │ '22579347'                │ 11289676 │
│ 1       │ 'fibonacci 4'  │ '44.47 ± 0.34%'      │ '0.00'              │ '17019036 ± 0.02%'         │ '22488750'                │ 11244377 │
│ 2       │ 'fibonacci 20' │ '48.05 ± 0.35%'      │ '0.00'              │ '15692678 ± 0.02%'         │ '20812354'                │ 10406179 │
└─────────┴────────────────┴──────────────────────┴─────────────────────┴────────────────────────────┴───────────────────────────┴──────────┘

Refer to this example to reproduce the result.

Hmm, I don't think this simple Fibonacci algorithm should take that long to run, and surely a noop function doesn't take 44 ns. A noop function should take essentially zero time, or less than 1 ns for the direct function call if it isn't inlined.

Let's assume the tinybench overhead is 44.29 ns (the noop average). By subtracting 44.29 ns, the benchmark results would be:

I cannot say these results are correct, because I don't know whether we can simply treat the noop benchmark result as the tinybench overhead. But at least it shows how we can get closer to the correct result.
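
To make the idea concrete, here is a minimal sketch of the post-processing I have in mind, using the latency averages from the table above. To be clear, none of this is an existing tinybench API; it is just the subtraction this issue is requesting:

// Sketch only: treat the measured noop latency as the harness overhead
// and subtract it from the other tasks' average latencies.
// Values (in ns) are taken from the table above.
const overheadNs = 44.29; // 'noop' latency average

const corrected = {
    'fibonacci 4': 44.47 - overheadNs, // ≈ 0.18 ns
    'fibonacci 20': 48.05 - overheadNs, // ≈ 3.76 ns
};

console.log(corrected);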


I tried other benchmarking tools, and here are the benchmark results:

Those results are significantly different from the tinybench results.

If we look into the BenchmarkDotNet logs, we will see "OverheadActual" and "WorkloadActual" entries, for example:

L286: OverheadActual  15: 53370432 op, 72714400.00 ns, 1.3624 ns/op
L311: WorkloadActual  15: 53370432 op, 500232200.00 ns, 9.3728 ns/op

If we subtract them, we get WorkloadActual - OverheadActual = 9.3728 - 1.3624 = 8.0104 ns/op, which is pretty close to the reported average result of 8.0300 ns/op.

What BenchmarkDotNet actually does is slightly different from that. You can read How it works. It says BenchmarkDotNet gets the result by calculating Result = ActualWorkload - <MedianOverhead>.
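
If tinybench adopted something similar, the core computation would be small. Here is a minimal sketch, assuming we have arrays of per-operation timing samples for both an overhead (noop) run and the workload run; this is not existing tinybench code:

// BenchmarkDotNet-style correction as described above: subtract the
// median of the overhead samples from each workload sample.
// `workloadNs` and `overheadNs` are per-operation timings in nanoseconds.
function median(samples) {
    const sorted = [...samples].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function correctedSamples(workloadNs, overheadNs) {
    const medianOverhead = median(overheadNs);
    // Clamp at zero so timer noise cannot produce negative durations.
    return workloadNs.map((t) => Math.max(0, t - medianOverhead));
}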

jerome-benoit commented 2 hours ago

Comparing the timing of an algo implemented in different languages with the same measurement tool just tells you which language runs it faster. Comparing the timing of an algo implemented in different languages with different measurement tools tells you ... absolutely nothing. The measurement methodology is completely wrong.

And measuring the overhead of timestamping a block's execution time in an interpreted language has nothing to do with measuring the execution of noop: in your example, the latency median of the noop is zero with a zero median absolute deviation. Furthermore, a benchmarking tool for an interpreted language that does not include the interpreter overhead in its measurements is just meaningless.

The measurement methodology used in BenchmarkDotNet is utterly wrong: https://github.com/tinylibs/tinybench/issues/143#issuecomment-2424232480

yifanwww commented 57 minutes ago

Before we go any further, I have a question:

What's the difference between "bench 1" and "bench 2"? Is "bench 2" correct? Or is there a way to benchmark super fast code with tinybench?

import { Bench } from 'tinybench';

function fn() {
  // a small fn that only runs for a few nanoseconds
}

function bigFn() {
  // a big fn that runs for a few milliseconds
  for (let i = 0; i < 100_000_000; i++) {
    fn();
  }
}

const bench = new Bench({ time: 500 });
bench
    .add('bench 1', () => fn())
    .add('bench 2', () => bigFn());
await bench.run();

jerome-benoit commented 42 minutes ago

These are two different experiments that have nothing in common. I'm not going to explain again here what I have already explained in the link given. Please read it.