Closed polarathene closed 7 months ago
I just noticed that compared to the strings example, I was using bench_values()
instead of bench_refs()
. The latter avoids the overhead.
I assume that you should still be able to bench a method by value without the timing overhead from the input.
I was using
bench_values()
instead ofbench_refs()
. The latter avoids the overhead.
It appears what's happening is you're dropping Vec<String>
in every run of your benchmarked function. The timing taking longer is expected in this case. You would need to forward it back to Divan by returning it. Or use bench_refs()
, which does semantically the same (inout argument).
You would need to forward it back to Divan by returning it.
That was it, cheers!
Is that related to the Drop
example in docs? It only mentioned a memory cost, with a later example of not needing to return a value.
The benched function would take a hashmap as input and return a string. That value was also implicitly returned in the bench_values()
closure, but I didn't expect an unused input to introduce timing overhead when not used.
// Slow without return
#[divan::bench]
fn with_dynamic_input(b: Bencher) {
b.with_inputs(|| {
mock_data()
}).bench_values(|_| {
black_box(42);
});
}
// Slow with return
#[divan::bench]
fn with_dynamic_input(b: Bencher) {
b.with_inputs(|| {
mock_data()
}).bench_values(|_| {
// without black_box, also still slow
black_box(42)
});
}
// Fast, no notable timing overhead issues
#[divan::bench]
fn with_dynamic_input(b: Bencher) {
b.with_inputs(|| {
mock_data()
}).bench_values(|x| {
x
});
}
Might be worth clarifying that in with_inputs()
to mention timing will be affected if the input does not defer drop?
Or I guess this issue is sufficient itself for troubleshooting the cause 👍
but I didn't expect an unused input to introduce timing overhead when not used.
Rust was doing what you were telling it to. You just weren't telling it what you wanted it to do 😉. I was hoping the name "values" would help make the move semantics obvious but it seems to still be subtle.
@polarathene Please submit a PR to improve the docs here! I'm sure you would have good phrasing for it.
Rust was doing what you were telling it to. You just weren't telling it what you wanted it to do 😉. I was hoping the name "values" would help make the move semantics obvious but it seems to still be subtle.
I thought from the black_box()
docs on compiler optimizations, an unused parameter wouldn't have that sort of impact.
I'm familiar with value vs ref and thought I understood them well enough 😅 Just to be clear my confusion with this issue was:
with_inputs()
.divan
to report time unrelated to what was going outside of the benched function, especially when the input was changed to _
and not used 🤷♂️ This was my first time trying benching with Rust, so it was perhaps the wrong expectation.#[divan::bench]
fn with_dynamic_input(b: Bencher) {
b.with_inputs(|| {
mock_data()
}).bench_values(|x| {
// If operation doesn't use input, or input is not returned as it was
// Timing Overhead, regardless of operation
do_stuff(x)
// <1ns if instead returning input as-is
x
});
}
// <1ns, no overhead
fn do_stuff(s: Vec<String>) -> Vec<String> {
s
}
// 700ns, returning modified input
fn do_stuff(mut s: Vec<String>) -> Vec<String> {
s.truncate(1);
s
}
// 1.4µs, returning different type from input
fn do_stuff(mut s: Vec<String>) -> String {
s.concat()
}
// 700ns, returning a different type, ignoring input
fn do_stuff(s: Vec<String>) -> u8 {
42
}
// 700ns, not returning anything
fn do_stuff(s: Vec<String>) {
42;
}
<1ns
my understanding is it's been optimized out.700ns
is unrelated to the function, and more to do with the input itself adding overhead? 🤷♂️ @polarathene Please submit a PR to improve the docs here! I'm sure you would have good phrasing for it.
I'm happy to submit a PR if another user chimes in with the same confusion, except given the above I still don't quite follow why divan
should be including that 700ns
🤔
In my last comment, I thought I just had to consume the input and/or return something, thinking the timing was due to my function logic alone after all, but it seems there is still overhead from the input size regardless of what happens in do_stuff()
(at least when passing by value).
bench_values()
, yet when I removed it there was still notable overhead.The docs for
with_inputs()
imply this should not contribute to timing?:I've tried to look through the current docs and announcement blogpost, but it's not clear to me why this is happening?
If I instead swap the mock data generator function for one that just sleeps instead of allocates:
It will take longer to complete, but does not affect the timing:
It only affects timing when I return the mocked data and the impact scales with the size of the input. I assume it's related to the example description of
Drop
behaviour here, but I'm not sure how to exclude that overhead.