Weird different benchmark results for code that should be fairly identical

vlovich commented 10 months ago

I have some benchmarks that look like this:

use std::mem::MaybeUninit;

fn main() {
    let _ = memcache::CRATE_USED;
    divan::main();
}

fn weird_results_impl(b: divan::Bencher, size: usize) {
    const NUM_ITEMS: usize = 100_000;
    const CAPACITY: usize = NUM_ITEMS;
    let cache = vec![Default::default(); CAPACITY];
    let values = (0..NUM_ITEMS)
        .map(|_| vec![MaybeUninit::<u8>::uninit(); size].into_boxed_slice())
        .collect::<Vec<_>>();
    b.counter(divan::counter::ItemsCount::new(NUM_ITEMS))
        .with_inputs(|| {
            (
                cache.clone(),
                values
                    .iter()
                    .enumerate()
                    .map(|(idx, v)| (idx % CAPACITY, v.clone()))
                    .collect::<Vec<_>>(),
            )
        })
        .bench_local_refs(|(cache, refs)| {
            for (entry, mem) in refs {
                cache[*entry] = std::mem::take(mem);
            }
        });
}

#[divan::bench]
fn weird_results_4kib(b: divan::Bencher) {
    weird_results_impl(b, 4 * 1024);
}

#[divan::bench]
fn weird_results_10b(b: divan::Bencher) {
    weird_results_impl(b, 10);
}

There's a fairly large discrepancy between the two:

my-crate               fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ weird_results_4kib  165.4 µs      │ 211.1 µs      │ 173.5 µs      │ 174.8 µs      │ 100     │ 100
│                      604.2 Mitem/s │ 473.5 Mitem/s │ 576.2 Mitem/s │ 571.8 Mitem/s │         │
╰─ weird_results_10b   80.53 µs      │ 110.5 µs      │ 83.22 µs      │ 84.07 µs      │ 100     │ 100
                       1.241 Gitem/s │ 904.2 Mitem/s │ 1.201 Gitem/s │ 1.189 Gitem/s │         │

This was run with mimalloc set as the allocator. AFAICT I'm not dropping any memory within the benchmark loop and the body of the loop shouldn't be doing anything more than shuffling some pointers around (i.e. should be the same amount of shuffling between the two runs I think). Is there something wrong with my benchmark or a bug in divan?
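
The "just shuffling pointers" claim can be checked in isolation. The sketch below is not part of the benchmark: `Slot`, `make_value`, and `move_keeps_allocation` are hypothetical names introduced for illustration. It assumes the cache element type is `Box<[MaybeUninit<u8>]>`, as the assignment in the loop implies, and checks that moving such a box keeps the same heap allocation whatever the payload size.

```rust
use std::mem::{size_of, take, MaybeUninit};

// Hypothetical alias for the element type the benchmark stores in `cache`.
type Slot = Box<[MaybeUninit<u8>]>;

/// Allocates one boxed slice of `size` uninitialized bytes, as in the setup code.
fn make_value(size: usize) -> Slot {
    vec![MaybeUninit::<u8>::uninit(); size].into_boxed_slice()
}

/// Moves a value into a one-element cache with `take`, like the loop body,
/// and reports whether the heap allocation was reused rather than copied.
fn move_keeps_allocation(size: usize) -> bool {
    let mut value = make_value(size);
    let heap_ptr = value.as_ptr();
    let mut cache: Vec<Slot> = vec![Slot::default(); 1];
    // The old entry is an empty box, so dropping it frees nothing.
    cache[0] = take(&mut value);
    cache[0].as_ptr() == heap_ptr && value.is_empty()
}

fn main() {
    // A boxed slice is a (pointer, length) pair regardless of payload size.
    println!("size_of::<Slot>() = {} bytes", size_of::<Slot>());
    for size in [10, 4 * 1024] {
        println!("size={size}: allocation reused = {}", move_keeps_allocation(size));
    }
}
```

If the allocation is reused for both sizes, the loop body itself moves the same number of bytes per item in both benchmarks, so any difference would have to come from somewhere else (setup, cache effects, or how the time is attributed).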

nvzqz commented 10 months ago

I'm not able to reproduce your results when I don't add memcache or mimalloc. I can try again later with those added.

Also, you can use NUM_ITEMS directly since usize implements IntoCounter:

- b.counter(divan::counter::ItemsCount::new(NUM_ITEMS))
+ b.counter(NUM_ITEMS)

vlovich commented 10 months ago

memcache is the name of my own crate - can be ignored. Strange that you're not seeing it. mimalloc might be needed to make things more obvious. You must have a faster machine for this benchmark since my 13900 doesn't get that fast with the standard allocator.

OliverKillane commented 4 months ago

My suspicion is that:

The benchmarks are varied by the size parameter, which in turn affects the size of the values slices, which makes the clone inside .with_inputs(..) more expensive.

The with_inputs time seems to be caught in the main benchmark time, hence the large difference between the benchmarks.

See #55
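
One way to probe this suspicion without divan is to time the input construction and the loop separately with `std::time::Instant`. This is a rough sketch under assumptions, not a statement about how divan measures: `make_inputs` and `time_once` are hypothetical helpers, and the item count in `main` is smaller than the issue's 100_000 to keep memory use modest.

```rust
use std::hint::black_box;
use std::mem::{take, MaybeUninit};
use std::time::{Duration, Instant};

// Hypothetical alias for the element type used in the issue's benchmark.
type Slot = Box<[MaybeUninit<u8>]>;

/// Builds inputs the way the `with_inputs` closure does: an empty cache
/// plus (index, cloned value) pairs. The clone cost grows with value size.
fn make_inputs(values: &[Slot]) -> (Vec<Slot>, Vec<(usize, Slot)>) {
    let cache = vec![Slot::default(); values.len()];
    let refs = values
        .iter()
        .enumerate()
        .map(|(idx, v)| (idx, v.clone()))
        .collect();
    (cache, refs)
}

/// Returns (setup time, loop time) for a single run with `size`-byte values.
fn time_once(num_items: usize, size: usize) -> (Duration, Duration) {
    let values: Vec<Slot> = (0..num_items)
        .map(|_| vec![MaybeUninit::<u8>::uninit(); size].into_boxed_slice())
        .collect();

    let start = Instant::now();
    let (mut cache, mut refs) = make_inputs(&values);
    let setup = start.elapsed();

    let start = Instant::now();
    for (entry, mem) in refs.iter_mut() {
        cache[*entry] = take(mem);
    }
    let body = start.elapsed();

    // Keep the results observable so the loop is not optimized away.
    black_box(&cache);
    (setup, body)
}

fn main() {
    // Assumption: 10_000 items instead of the issue's 100_000.
    for size in [10, 4 * 1024] {
        let (setup, body) = time_once(10_000, size);
        println!("size={size:>5}: setup={setup:?} loop={body:?}");
    }
}
```

If the loop times are close for both sizes while the setup times differ widely, that would be consistent with setup cost leaking into the reported numbers. A single unwarmed run is noisy, so treat the output as a hint only.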

nvzqz / divan

Weird different benchmark results for code that should be fairly identical #31