nvzqz / divan

Fast and simple benchmarking for Rust projects
https://nikolaivazquez.com/blog/divan/
Apache License 2.0

How do I initialise a structure so it's shared between threads of the same bench iteration? #51

Open anko opened 2 months ago

anko commented 2 months ago

I'm trying to benchmark a concurrent data structure, and I want to benchmark its read/write behaviour under thread contention. However, unlike all of the threaded examples in documentation, this structure's performance characteristics change as it is modified: internal parts of it are consumed or rearranged by different threads, so it needs to be constructed again for each run of the benchmark.

This means:

Either the structure is constructed once and then shared among all iterations (the first option), or it is constructed separately for each thread and never shared (the second option). I need it to be constructed once per benchmark run and shared only among the threads that are part of that run. Am I right that this is currently not possible using the threads option?


My current workaround is to start a const number of threads myself inside the with_inputs closure and have them wait at a std::sync::Barrier, then as part of the bench_local_values closure, release the Barrier and join the threads to time them:

#[divan::bench(consts = [1, 2, 4, 8, 16])]
fn benchmark_function<const THREADS: usize>(bencher: divan::Bencher) {
    use std::sync::{Arc, Barrier};
    bencher
        // Untimed setup: build a fresh structure and park THREADS workers at a barrier.
        .with_inputs(|| -> (Vec<std::thread::JoinHandle<_>>, _) {
            let x: Arc<MyStruct> = Arc::new(create_structure());
            // THREADS workers plus the benchmarking thread that releases them.
            let barrier = Arc::new(Barrier::new(THREADS + 1));
            let threads = (0..THREADS).map(|_| {
                let x = x.clone();
                let barrier = barrier.clone();
                std::thread::spawn(move || {
                    barrier.wait();
                    x.consume_contents();
                })
            }).collect();
            (threads, barrier)
        })
        // Timed section: release the workers, then wait for all of them to finish.
        .bench_local_values(|(threads, barrier)| {
            barrier.wait();
            for t in threads {
                t.join().unwrap();
            }
        });
}

This works, but there's a lot of code duplicating what I imagine Divan would do internally to implement the threads option.

I also see worse performance when benchmarking with 1 thread using this method than I do from an otherwise-identical benchmark with #[divan::bench(threads = [1])], probably because Divan doesn't use a Barrier when single-threaded. That is smart, and another reason why I feel this would be better handled by Divan itself.
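To make the request concrete, here is a rough std-only sketch of the pattern I would like to see factored out. The name `run_contended` and its signature are made up for illustration; this is not a Divan API, and I am not claiming Divan works this way internally. It builds one fresh value per call, shares it only among that call's worker threads, and skips the Barrier entirely when there is just one thread.

```rust
use std::sync::{Arc, Barrier};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of Divan): builds one fresh value with
/// `make`, shares it among `threads` workers, and returns how long the
/// contended work took.
fn run_contended<T, M, W>(threads: usize, make: M, work: W) -> Duration
where
    T: Send + Sync + 'static,
    M: FnOnce() -> T,
    W: Fn(&T) + Send + Sync + 'static,
{
    assert!(threads > 0, "need at least one thread");
    // Constructed once per call, so no state leaks between runs.
    let shared = Arc::new(make());

    // Single-threaded fast path: no spawn and no barrier.
    if threads == 1 {
        let start = Instant::now();
        work(&*shared);
        return start.elapsed();
    }

    let work = Arc::new(work);
    // `threads` workers plus the calling thread that releases them.
    let barrier = Arc::new(Barrier::new(threads + 1));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            let work = Arc::clone(&work);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                barrier.wait();
                (*work)(&*shared);
            })
        })
        .collect();

    // Construction and spawning happen before the clock starts. The timed
    // region covers the barrier wait and joining every worker.
    let start = Instant::now();
    barrier.wait();
    for handle in handles {
        handle.join().unwrap();
    }
    start.elapsed()
}
```

This sketch times itself with `Instant` rather than going through Divan's measurement, so it only shows the shape of the setup/run split, not a drop-in replacement for the workaround above.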

Am I missing a better existing way to do this?