luwes opened this issue 4 years ago
I'd also create a single shuffled set used by all libraries in the randomized benchmark to make it easier to compare the number of operations each does on the same data. (Or fix and reset the generator seed.)
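For illustration, a minimal sketch of the fixed-seed approach, assuming a mulberry32-style PRNG and a plain Fisher-Yates shuffle (the names and data shape here are made up, not the benchmark's actual code):

```js
// Sketch only: a tiny seeded PRNG (mulberry32) plus a Fisher-Yates shuffle.
// Resetting the seed before each library means every library receives the
// same "random" data, so operation counts are directly comparable.
function mulberry32(seed) {
  return function () {
    let t = (seed += 0x6D2B79F5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function shuffle(items, rand) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Same seed, same shuffled set for every library under test.
const SEED = 0xC0FFEE;
const rows = Array.from({ length: 1000 }, (_, i) => i); // placeholder dataset
const shuffledRows = shuffle(rows, mulberry32(SEED));
```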
you guys may find this part of domvm's tests useful:
https://github.com/domvm/domvm/blob/master/test/src/flat-list-keyed-fuzz.js
it fuzzes a bunch of lists with various amounts of adds, moves & deletes.
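Not domvm's actual code, but the idea is roughly this: take a keyed list, apply a random mix of deletes, moves, and adds, and hand the resulting "after" list to the library to reconcile.

```js
// Rough sketch of a keyed-list fuzzer: `list` is an array of numeric keys,
// `rand` is a 0..1 random function (e.g. the seeded one above).
function fuzzList(list, rand, { adds = 5, moves = 5, deletes = 5 } = {}) {
  const next = list.slice();
  let nextKey = list.length ? Math.max(...list) + 1 : 0;

  for (let i = 0; i < deletes && next.length > 0; i++) {
    next.splice(Math.floor(rand() * next.length), 1);
  }
  for (let i = 0; i < moves && next.length > 1; i++) {
    const from = Math.floor(rand() * next.length);
    const [moved] = next.splice(from, 1);
    next.splice(Math.floor(rand() * (next.length + 1)), 0, moved);
  }
  for (let i = 0; i < adds; i++) {
    next.splice(Math.floor(rand() * (next.length + 1)), 0, nextKey++);
  }
  return next;
}

// e.g. render(before), then render(fuzzList(before, rand)) and count DOM operations
```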
after running this benchmark on various machines with different CPUs, I've noticed that consistent results are highly unlikely here, due to the following scenarios:
Accordingly, we should change the way we measure each library as follows:
This means there's a lot of work to do to split tests, assertions, and warmup apart per test, but that's likely the best way to get meaningful results on both the synthetic benchmark and the live one.
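A rough sketch of what that split could look like (the `driver` interface and field names here are invented for illustration, not the benchmark's actual API):

```js
// Hypothetical shape only: each test owns its warmup and its assertion,
// instead of sharing a single warmup phase across the whole suite.
const tests = [
  {
    name: 'create 1,000 rows',
    warmup(driver) {
      // warm the JIT / caches with the same kind of work the test does
      for (let i = 0; i < 5; i++) { driver.createRows(1000); driver.clearRows(); }
    },
    run(driver) { driver.createRows(1000); },
    assert(driver) { return driver.rowCount() === 1000; },
  },
  {
    name: 'swap rows',
    warmup(driver) { driver.createRows(1000); },
    run(driver) { driver.swapRows(1, 998); },
    assert(driver) { return driver.rowCount() === 1000; },
  },
];
```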
At the moment results could be skewed quite a bit from run to run.
Ideally each test should run a minimum of 3 times, randomize library order at the very least, and take some kind of average.
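Something along these lines (a sketch, reusing the `shuffle` helper from above; `runTest` stands in for however a single timed run is actually performed):

```js
// Sketch: run every (library, test) pair `runs` times, shuffling library
// order on each pass, then report the mean per library/test.
async function runSuite(libraries, tests, runTest, runs = 3) {
  const samples = {}; // samples[lib][testName] = [ms, ms, ms]
  for (let pass = 0; pass < runs; pass++) {
    for (const lib of shuffle(libraries, Math.random)) {
      for (const test of tests) {
        const ms = await runTest(lib, test);
        ((samples[lib] ??= {})[test.name] ??= []).push(ms);
      }
    }
  }
  const averages = {};
  for (const [lib, byTest] of Object.entries(samples)) {
    averages[lib] = {};
    for (const [name, runsMs] of Object.entries(byTest)) {
      averages[lib][name] = runsMs.reduce((a, b) => a + b, 0) / runsMs.length;
    }
  }
  return averages;
}
```

A median (or a trimmed mean) would arguably be more robust than a plain average against one-off outliers, but the basic point is the same: more runs, randomized order, and a single aggregated number per library/test.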