Open · N9199 opened this issue 1 year ago
No API for that because measurement quality could be very low if the timer has to be restarted often. Instead, please do 2 measurements: first measure runtime of just the setup (e.g. creating & shuffling), then benchmark everything, and subtract the first.
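For reference, a minimal sketch of this two-measurement approach using the existing `Bench::run` API; `generate_shuffled_vector()` is an illustrative helper, not part of nanobench:

```cpp
#include <algorithm>
#include <nanobench.h>
#include <numeric>
#include <random>
#include <vector>

// illustrative setup: build a freshly shuffled vector each time
std::vector<int> generate_shuffled_vector() {
    std::vector<int> v(1000);
    std::iota(v.begin(), v.end(), 0);
    static std::mt19937 rng{42};
    std::shuffle(v.begin(), v.end(), rng);
    return v;
}

int main() {
    ankerl::nanobench::Bench bench;

    // 1) measure the setup alone
    bench.run("setup only", [&] {
        auto v = generate_shuffled_vector();
        ankerl::nanobench::doNotOptimizeAway(v.data());
    });

    // 2) measure setup + the operation of interest
    bench.run("setup + std::sort", [&] {
        auto v = generate_shuffled_vector();
        std::sort(v.begin(), v.end());
        ankerl::nanobench::doNotOptimizeAway(v.data());
    });

    // estimated sort cost ~= median("setup + std::sort") - median("setup only"),
    // read off the printed table or from bench.results()
}
```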
@martinus
Couldn't this be easily solved by having all setup done first (for x amount of iterations; from the docs it seems like nanobench calculates up front how many iterations to do)? Then it would just be a matter of picking the next "batch" from a contiguous list:
```cpp
std::vector<int> generate_shuffled_vector();

ankerl::nanobench::BenchBatch<std::vector<int>>()
    .generateBatch([]() -> std::vector<int> {
        // returned batch can just be stored in whatever structure you see fit
        // (e.g. in this case perhaps `std::vector<std::vector<int>>`)
        return generate_shuffled_vector();
    })
    .run("amazing sorter", [](std::vector<int>& batch) {
        // `batch` is just the next entry, and next, and next...
        my_amazing_sorter(batch);
    });
```
As long as whatever is generated as a batch is either copyable or movable, all should be fine...
Obviously there is some overhead here, but it should be fairly stable (depending on what your "batch" is). Also, I guess most scenarios requiring this are probably non-trivial, so I bet the `batches[next++]` lookup won't cause totally useless results... But I'm all guessing here. It just seems that being able to benchmark something as simple as a sorting algorithm would be expected.
Also, I want to get away from google-bench, but need this. 🙃
To add to what @helmesjo said, I looked into how Criterion implemented it, and I saw the current implementation of `run`. It seems that they do essentially the same thing, but they add a preparation step where they construct the inputs. Now maybe I'm missing something, but couldn't the same be done in nanobench, with the input construction happening before `pc.beginMeasure()`?
I've reopened the issue and linked all similar issues I previously closed to this one. I guess public interest is too large for me to brush it away...
What I absolutely do not want is start/stop of the timer in the tight measurement loop (which seems to be what Criterion is doing). That would make it impossible to benchmark something that's fast, especially because I also need to start/stop the Linux performance counters. I need to contemplate more on how to best do it, but I currently don't have too much free time on my hands for this.
I would love to have this feature as well. Maybe it would be easiest to have a different entry point for benchmarks that require setup (and teardown!) to be excluded from the measurements, say `FancyBench().run()`, as -- I assume -- they are not what you consider "fast". :wink:
I've landed here because I think I need this feature to benchmark 'something fast': Protobuf encode+decode. Basically, I have an encode+decode step which benchmarks out at 100ns. If I break this apart into two separate functions, one bench for encode and one for decode, I get 30ns for encode and 1ns for decode. Naively these results don't really make sense; I'd expect X + Y = Z, i.e. 30 + something = 100. I suspect either something more is being optimized out (hard to tell because I'm no good at disassembly 😃), or this is an artifact of running the two things in separate loops so that the branching/caching get along better. I just don't expect such a huge true difference, because the message I'm encoding/decoding is less than a cache line in size.
But so far, the only example I've seen of carrying state larger than a word across runs while ensuring things won't be optimized out is the gtest approach of redoing the setup within the hot loop and starting and stopping a timer for the part you're interested in. I understand this might add some overhead, but I'm not that concerned about overhead of that size, especially because when my code is embedded in another process I will be far less certain exactly what the cost of each line of code is, due to different cache conditions/inlinability compared to the microbench.
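For what it's worth, one pattern that keeps the setup out of the hot loop without any new API is to build the larger state once outside the lambda and capture it by reference. A rough sketch, where `MyMessage` and `makeTestMessage()` stand in for the actual protobuf type and test data:

```cpp
#include <nanobench.h>
#include <string>

// MyMessage is a stand-in for the generated protobuf class
MyMessage makeTestMessage();

void benchDecode() {
    // setup happens once, outside the measured lambda
    std::string wire;
    makeTestMessage().SerializeToString(&wire);

    ankerl::nanobench::Bench().run("decode only", [&] {
        MyMessage out;
        out.ParseFromString(wire);
        ankerl::nanobench::doNotOptimizeAway(out);
    });
}
```

This only works when the input can be reused unchanged across iterations (decoding doesn't mutate the wire buffer); it doesn't help the sorting case discussed above, where each iteration consumes its input.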
I understand why you don't want to pause the timer between epoch iterations and I agree with you about that, but at least we really need fixtures between epochs.
It's probably not ideal, but here's an example of a possible implementation : https://github.com/RT2Code/nanobench/commit/c79a0814c1fb92a1fc7b5d276e3fba4ec9c9355b
And my use case :
```cpp
Signal<void(int)> signal;

bench.epochIterations(1000).run("Connect",
    [&] {
        signal.connect(slot);
    },
    [] {},  // setup function (doing nothing here)
    [&] {   // teardown function
        signal.reset();
    });
```
Connect is a function with side effects, since it appends the slot to a vector, so using a fixed iteration count is better here in order to get consistent results between runs. The teardown function serves the purpose of resetting the signal between epochs. Without it, subsequent epochs would cause the vector to keep growing, leading to severe variations in timing measurements.
> Instead, please do 2 measurements: first measure runtime of just the setup (e.g. creating & shuffling), then benchmark everything, and subtract the first.
Couldn't we just provide an API to do exactly this? You provide code for a "setup", then you provide code for a setup + operation, and then you run the tight loops on both, and do the subtraction?
```cpp
ankerl::nanobench::Bench().batch(1000).runWithSetup("Foo bar",
    [&] {
        // setup: measured on its own so it can be subtracted
        return expensiveSetup();
    },
    [&](auto& foo) {
        for (size_t i = 0; i < 1000; ++i) {
            foo.bar();
        }
        ...
    });
```
Something like this?
I'm looking for a way to create input for each batch. As an example, if I wanted to benchmark something like sorting algorithms, I need a new unsorted vector for each batch, either by copying one already created or by shuffling the one already used. In both cases the setup isn't negligible, so I'm wondering if nanobench has an API which helps with that? I was thinking of something like Google Benchmark's `PauseTiming` and `ResumeTiming`, or something like Rust's Criterion's `iter_batched` and `iter_batched_ref`?
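One workaround along the lines of the `batches[next++]` idea above is to pre-generate a pool of shuffled inputs sized to the configured epochs and iterations, so that only the sort runs inside the measured lambda. A rough sketch using the existing `epochs()`, `epochIterations()`, `run()` and `doNotOptimizeAway()` API; the pool sizing and the modulo guard are my own choices, not a nanobench feature:

```cpp
#include <algorithm>
#include <nanobench.h>
#include <numeric>
#include <random>
#include <vector>

int main() {
    constexpr std::size_t numEpochs     = 11;
    constexpr std::size_t itersPerEpoch = 64;

    std::vector<int> base(1000);
    std::iota(base.begin(), base.end(), 0);

    // pre-shuffle one input per expected iteration
    std::mt19937 rng{123};
    std::vector<std::vector<int>> pool(numEpochs * itersPerEpoch, base);
    for (auto& v : pool) {
        std::shuffle(v.begin(), v.end(), rng);
    }

    std::size_t next = 0;
    ankerl::nanobench::Bench()
        .epochs(numEpochs)
        .epochIterations(itersPerEpoch)
        .run("std::sort on shuffled input", [&] {
            // modulo guard in case nanobench runs a few extra iterations;
            // wrapped entries are already sorted, so keep the pool large enough
            auto& v = pool[next++ % pool.size()];
            std::sort(v.begin(), v.end());
            ankerl::nanobench::doNotOptimizeAway(v.data());
        });
}
```

This trades memory for measurement quality: the setup stays completely outside the measured loop, at the cost of holding every input up front and one index increment per iteration.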