https://github.com/oxc-project/oxc/pull/4483 was an experiment in reducing variance in our benchmarks by using an allocator which has deterministic behavior.
The experiment was in one sense a success - it reduced variance in benchmarks almost to zero. That does suggest that indeterminism in the system allocator is the largest cause of benchmark variance. In fact, I wonder why it didn't reduce variance to absolutely zero - what else could be causing variance?
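For reference, the mechanism the experiment relies on is Rust's `#[global_allocator]` hook. The sketch below is not the talc setup from the PR - it's a hypothetical, minimal bump allocator over a fixed arena, just to illustrate how a fully deterministic allocator can be swapped in:

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sketch: a fixed arena plus a bump pointer. Every run starts from
// the same state, so allocation addresses and costs are identical across runs.
const ARENA_SIZE: usize = 64 * 1024 * 1024;

#[repr(align(4096))]
struct Arena([u8; ARENA_SIZE]);

static mut ARENA: Arena = Arena([0; ARENA_SIZE]);
static OFFSET: AtomicUsize = AtomicUsize::new(0);

struct DeterministicBump;

unsafe impl GlobalAlloc for DeterministicBump {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Assumes requested alignment never exceeds the arena's 4096-byte alignment.
        let (size, align) = (layout.size(), layout.align());
        let mut offset = OFFSET.load(Ordering::Relaxed);
        loop {
            let aligned = (offset + align - 1) & !(align - 1);
            let end = aligned + size;
            if end > ARENA_SIZE {
                return std::ptr::null_mut(); // arena exhausted
            }
            match OFFSET.compare_exchange_weak(offset, end, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return std::ptr::addr_of_mut!(ARENA.0).cast::<u8>().add(aligned),
                Err(current) => offset = current,
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Bump allocation never frees - fine for short-lived benchmark processes,
        // but also exactly why it's unrealistically cheap compared to malloc/free.
    }
}

#[global_allocator]
static GLOBAL: DeterministicBump = DeterministicBump;
```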
But there is a problem: the simple allocator I used, talc, is too fast. This makes allocation unrealistically cheap, which in turn makes our benchmarks unrealistic.
Let's say we introduce a change that removes a bunch of allocations, but requires some extra work to do that (caching structs, bookkeeping, etc). That is very likely to be a performance gain in the real world, but if the allocator we use for benchmarks makes allocation unrealistically cheap (as this one does), benchmarks will lie to us and tell us it's a perf regression. For example, with this allocator, benchmarks probably would have told us https://github.com/oxc-project/oxc/pull/4213 was a perf regression, whereas in fact it gave a +5% speedup.
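To make that trade-off concrete, here's a hypothetical example (not taken from that PR) of the kind of change in question: swapping a per-call allocation for a reused buffer plus a little extra bookkeeping.

```rust
// Before: allocates a fresh String on every call.
fn normalize_ident_alloc(raw: &str) -> String {
    raw.trim_start_matches('_').to_ascii_lowercase()
}

// After: reuses a caller-provided buffer. Fewer allocations, but slightly more
// work per call (clearing the buffer, threading it through the call sites).
fn normalize_ident_reuse<'a>(raw: &str, buf: &'a mut String) -> &'a str {
    buf.clear();
    for c in raw.trim_start_matches('_').chars() {
        buf.push(c.to_ascii_lowercase());
    }
    buf
}
```

With a realistic allocator the second version tends to win; with an unrealistically cheap one, the extra bookkeeping can dominate and the benchmark reports a regression.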
NB: Allocations are a small part of the code overall. So if we're seeing a 10% perf boost on some benchmarks just from replacing the allocator, that probably means this new allocator is ~double the speed of the system one (e.g. if allocation accounts for roughly 20% of total runtime, a 10% overall saving means allocation time has roughly halved). That's a very big discrepancy.
What we need is an allocator which is as close as possible to real-world allocators (e.g. libc's, or jemallocator) but does not include any random elements.
We might have more luck with https://crates.io/crates/dlmalloc, which sounds like it's a closer analogue to the default system allocator (from libc), and so may reduce this discrepancy. But it doesn't have as easy an API to work with.

I also asked for help on the CodSpeed Discord, but it seems they're not sure how to solve this either.
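If we do try dlmalloc, my guess (unverified - check the crate docs) is that the wiring would look roughly like this, assuming the crate's `global` feature really does expose a `GlobalDlmalloc` handle implementing `GlobalAlloc`; if it doesn't, we'd have to wrap `dlmalloc::Dlmalloc` in a lock and write the `GlobalAlloc` impl by hand, which is presumably the awkward part:

```rust
// Assumption: requires `dlmalloc = { version = "0.2", features = ["global"] }`
// in Cargo.toml, and that `GlobalDlmalloc` is the zero-sized handle that feature
// provides. Not tested - verify against the crate's documentation.
#[global_allocator]
static GLOBAL: dlmalloc::GlobalDlmalloc = dlmalloc::GlobalDlmalloc;
```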
Try to figure this out when I have more time.