tinylibs / tinybench

🔎 A simple, tiny and lightweight benchmarking library!

Results are incorrect #42

Closed · 3rd closed this issue 1 year ago

3rd commented 1 year ago

Hey, I've been trying tinybench and comparing it with Benchmark.js, and the results are super different. We're talking billions vs millions of ops/sec; maybe tinybench has too much overhead, and that's why it fails to measure really fast functions.

I looked at the other issues and I think it's reporting weird values with simple tests as well:

import { Bench } from "tinybench";

const bench = new Bench();

bench
  .add("10", () => {
    let count = 0;
    for (let i = 0; i < 10; i++) {
      if (i % 2 === 0) count++;
    }
  })
  .add("100", () => {
    let count = 0;
    for (let i = 0; i < 100; i++) {
      if (i % 2 === 0) count++;
    }
  })
  .add("1000", () => {
    let count = 0;
    for (let i = 0; i < 1000; i++) {
      if (i % 2 === 0) count++;
    }
  });

await bench.run();
console.table(bench.table());

┌─────────┬───────────┬─────────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name │   ops/sec   │ Average Time (ns)  │  Margin  │ Samples │
├─────────┼───────────┼─────────────┼────────────────────┼──────────┼─────────┤
│    0    │   '10'    │ '8,323,637' │ 120.13977791588471 │ '±0.68%' │ 4161819 │
│    1    │   '100'   │ '5,379,855' │ 185.87859194452994 │ '±0.36%' │ 2689928 │
│    2    │  '1000'   │ '1,378,380' │ 725.4889985173941  │ '±0.22%' │ 1000000 │
└─────────┴───────────┴─────────────┴────────────────────┴──────────┴─────────┘

^ These are the results both with custom { time } and with { iterations: 1000000 }.
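For reference, a minimal sketch of how those options are passed (per tinybench's documented Bench constructor options):

const timed = new Bench({ time: 1000 });            // run each task for ~1000 ms
const fixed = new Bench({ iterations: 1_000_000 }); // or for a fixed iteration count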
Am I using it wrong / misreading anything?
Thanks!
Aslemammad commented 1 year ago

Thanks for the issue! Could you let me know how you tested with Benchmark.js?

3rd commented 1 year ago

Hey, made a small repo for comparisons and experiments: https://github.com/3rd/js-benchmark-tool-comparison

The latest results are in latest.txt.

The benchmark runs the same tests with Benchmark.js, tinybench, and mitata, and the results are super different.

The code responsible for calling the engines is in: /src/runner.mjs

cc @Morglod I started looking into this while building benchmarks for my own event bus, and found some interesting benchmark result claims from tseep. After running the tests myself I got the advertised results, but didn't believe them, and indeed after more testing it seems that Benchmark.js measures that having an object and doing obj["key"] or obj["key"] = val is slower than emitting an event with tseep.

Benchmark.js is unmaintained and its measurements seem super wrong. I get that it's used because there aren't many alternatives, but if it can't do the one job it's supposed to do, we shouldn't use it.

Tinybench also seems to measure things incorrectly; I guess it somehow adds a lot of overhead, which gets measured as well and included in the final result.

Mitata seems to be the only accurate option, or at least the results are consistent with what I'd expect. Of course, I may be wrong.

Thanks a lot for the help, I'm super interested in solving this problem.

Tests

Random addition

const rand = () => Math.floor(Math.random() * 100);
const run = function () {
  let sum = 0;
  sum = rand() + rand();
  return sum;
};

const one = () => {
  run();
};
const ten = () => {
  for (let i = 0; i < 10; i++) {
    run();
  }
};
const hundred = () => {
  for (let i = 0; i < 100; i++) {
    run();
  }
};

[screenshot: random addition results]

for 1..x

[screenshot: for 1..x results]

Reading and writing from an object

const obj = {};
obj.__proto__ = null;

const read = () => {
  return obj["foo"];
};

const write = () => {
  obj["foo"] = "baz";
  return obj;
};

[screenshot: object read/write results]

Event emitters

cc @Morglod Three calls to bus.emit/publish("foo", "bar"), with the buses, handlers, and subscriptions created outside of the benchmark.
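A minimal sketch of that setup, assuming tseep's Node-style EventEmitter API:

import { EventEmitter } from "tseep";

// bus, handler, and subscription are created once, outside the measured function
const bus = new EventEmitter();
bus.on("foo", () => {});

// the benchmarked function only does the three emits
const task = () => {
  bus.emit("foo", "bar");
  bus.emit("foo", "bar");
  bus.emit("foo", "bar");
};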

[screenshot: event emitter results]

RegExp#test vs String#indexOf

@Aslemammad I think this one was already discussed in another ticket and seemed solved. Am I using tinybench wrong, and could that be why I'm not getting the right measurements?
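For context, a hedged sketch of what this test presumably compares (the exact inputs live in the comparison repo linked above, so these values are illustrative):

const re = /foo/;                  // hypothetical pattern
const haystack = "hello foo world"; // hypothetical input string

const regexTest = () => re.test(haystack);
const indexOfTest = () => haystack.indexOf("foo") !== -1;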

[screenshot: RegExp#test vs String#indexOf results]

Morglod commented 1 year ago

@3rd

measures that having an object and doing obj["key"] or obj["key"] = val is slower than emitting an event with tseep

Actually, when you emit an event with tseep, you call the listeners directly.
I mean, it's almost as fast as doing this:

function emitEvent() {
   listener1();
   listener2();
}

emitEvent();

So it is as fast as reading a value from an object by string key and calling an empty function
(1400M ops on read vs 1300M ops on emit in your screenshot).

Meanwhile, every other event emitter iterates over an array of listeners.
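For contrast, a rough sketch of what most other emitters do on every emit (illustrative, not any particular library's actual code):

class NaiveEmitter {
  listeners = {};
  on(event, fn) {
    (this.listeners[event] ??= []).push(fn);
  }
  emit(event, arg) {
    const list = this.listeners[event] ?? [];
    // the per-emit iteration over the listener array is the extra cost
    for (let i = 0; i < list.length; i++) list[i](arg);
  }
}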


There are a lot of small things about benchmarking JS.

To be clear, you should run each case in a new Node.js process, to be sure the VM is fresh and no benchmark gets a "precompute" head start over the others.
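A minimal sketch of that, assuming each case lives in its own file (the file names here are hypothetical):

import { execFileSync } from "node:child_process";

// each case gets a brand-new VM: fresh JIT state, fresh heap, no carry-over
for (const file of ["read.mjs", "write.mjs", "emit.mjs"]) {
  const out = execFileSync(process.execPath, [file], { encoding: "utf8" });
  console.log(out);
}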

When you benchmark something small like "read"/"write"/"sum", Node's JIT will probably compile it, so from some point on it may become faster.

Also, Node's VM may invoke the garbage collector during some test, and that will affect results really hard. That's the second reason to run each case in a different process, because tinybench here could catch GC pauses in its measurements.
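One way to reduce that within a single process (a sketch, assuming Node is started with --expose-gc):

const runCase = (fn, iterations) => {
  // force a collection between cases so a GC pause triggered by one case
  // doesn't land in the middle of the next case's samples;
  // globalThis.gc is only defined when Node runs with --expose-gc
  if (globalThis.gc) globalThis.gc();
  for (let i = 0; i < iterations; i++) fn();
};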

Looking at the % difference of each case for each tool, it feels like tinybench is a bit heavier per iteration.


What happens every run?

while (/* sampling loop condition */) {
  // ...
  try {
    taskStart = this.bench.now();
    if (isAsync) {
      await this.fn();
    } else {
      this.fn();
    }
  } catch (e) {
    this.setResult({ error: e });
  }
  // ...
  const taskTime = this.bench.now() - taskStart;
  samples.push(taskTime);
}

Per iteration you pay for:

1. a try/catch context,
2. a bench.now call that checks typeof globalThis.process?.hrtime === 'function' and uses bigint (which is slow) plus a nano / 1e6 division,
3. pushing each number to the samples array, which may cause GC.
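For point 3, a hedged sketch of one way around the push-driven allocations (not tinybench's actual code):

// pre-allocate a typed-array sample buffer so the hot loop never grows
// a JS array and never allocates mid-measurement
const MAX_SAMPLES = 1_000_000;
const samples = new Float64Array(MAX_SAMPLES);
let sampleCount = 0;

function record(taskTime) {
  if (sampleCount < MAX_SAMPLES) samples[sampleCount++] = taskTime;
}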

Also, looking at Benchmark.js's source now, they have code generation that may change numbers a bit, because it forces V8 to recompile the code.


So I think the next steps should be:

0. Yep, tseep is that fast 😃
1. Run each benchmark in a separate process to get cleaner results, more or less free of GC noise.
2. Remove the try/catch block completely, or move it one level up.
3. Don't switch between timer implementations while running a case (this.bench.now); better to pick one on init (see the sketch below).
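For step 3, a sketch of resolving the clock once at init instead of re-checking per sample (names are illustrative, not tinybench's internals):

// pick the timer implementation a single time, up front
const now =
  typeof globalThis.process?.hrtime?.bigint === "function"
    ? () => Number(process.hrtime.bigint()) / 1e6 // ns -> ms
    : () => performance.now();

// the hot loop then calls one monomorphic function with no branching
const fn = () => {}; // stand-in for the benchmarked task
const start = now();
fn();
const elapsed = now() - start;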

3rd commented 1 year ago

@Morglod Thank you so much for the response! Took a look at the task-collection part of tseep, and it's a super cool idea.

After playing around and adjusting the tests to match yours, with the if (arguments.length > 100) part, the results are consistent with the ones in your README, and tseep is just that (insanely :hot_pepper:) fast :heart:.
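For readers following along, a hedged sketch of that guard (as I understand it, a branch the JIT cannot prove dead keeps the empty handler from being optimized away during the benchmark):

const handler = function () {
  // never true in practice, but defeats dead-code elimination
  if (arguments.length > 100) console.log(arguments);
};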

Added some simple tests here to make sure there's no overhead: https://github.com/3rd/js-benchmark-tool-comparison/commit/25bcbb64460f71ac924a0c563837b3eaabe81510

This is Benchmark.js after: [screenshot]

Both tinybench and mitata (cc @evanwashere) report that calling the handler directly is slower than through tseep.emit, which makes no sense to me, but mitata seems to measure the execution time properly, while tinybench is not even close.

mitata: [screenshot]

tinybench: [screenshot]

and a small event bus comparison: [screenshot]

evanwashere commented 1 year ago

Both tinybench and mitata (cc @evanwashere) report that calling the handler directly is slower than through tseep.emit, which makes no sense to me, but mitata seems to measure the execution time properly, while tinybench is not even close.

What you are observing in mitata is JIT bias (function inlining) because of how little overhead mitata has; if you swap the functions around, the other one will magically be faster: [screenshot]

You can currently trick the JIT by adding a noop function that does nothing: [screenshot]
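A minimal sketch of the trick, assuming mitata's bench()/run() API and tseep's EventEmitter:

import { bench, run } from "mitata";
import { EventEmitter } from "tseep";

const handler = () => {};
const bus = new EventEmitter();
bus.on("foo", handler);

bench("noop", () => {}); // the decoy: a function that does nothing
bench("direct call", () => handler("foo", "bar"));
bench("tseep emit", () => bus.emit("foo", "bar"));

await run();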

3rd commented 1 year ago


Ah, I understand now, thank you so much! I saw that part of the README, but it didn't click then. I've already switched to mitata; it seems to be the best there is right now, and I plan on contributing some beforeEach & afterEach hooks. Thanks to everyone for your work and help!

3rd commented 1 year ago

It's much closer after adding the noop() trick, but for me the outcome is random across sequential runs. I guess there's not much to measure, some other optimizations are going on behind the scenes, and of course the environment matters as well.

Overall I'd say it's pretty reliable, thank you!

[screenshot: results after the noop() trick]

Aslemammad commented 1 year ago

The problem was hrtime's overhead. I made it an option so users can opt in if they want! Anyway, the results are 2x faster now, but I couldn't remove the remaining 40-70 ns of overhead compared to mitata. If anyone finds another solution, feel free to share your insights.
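For anyone digging further, a rough probe of the two clock sources' per-call cost (a sketch, not tinybench's code; run under Node):

const N = 1_000_000;

let t0 = performance.now();
for (let i = 0; i < N; i++) performance.now();
console.log(`performance.now: ${(((performance.now() - t0) / N) * 1e6).toFixed(1)} ns/call`);

t0 = performance.now();
for (let i = 0; i < N; i++) process.hrtime.bigint();
console.log(`hrtime.bigint:   ${(((performance.now() - t0) / N) * 1e6).toFixed(1)} ns/call`);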