Open 3tilley opened 5 months ago
I didn't consider this initially but you're right that there should be a way to focus benchmarks on throughput rather than latency. Right now the output prioritizes minimum time spent. If each benchmark has the same number of internal iterations, or internal iterations get externally normalized, then benchmarks across functions are somewhat comparable.
While individual SIMD benchmarks are valuable, I feel that end-to-end benchmarking of an algorithm that uses high-throughput SIMD techniques would be more valuable/concrete.
I think I understand what you're saying, but just in case I haven't: the issue, to my eye, is not that there is anything wrong with the throughput measurements or reporting, but that they are inconsistent with the latency. I would say the throughput is displayed correctly, and is very helpful; it's the latency reporting that I'd like to change to account for internal iterations. Maybe I didn't express that very well though!

Comparing to similar tools, pyperf takes `inner_loops` as an argument. It doesn't look like criterion supports this use case; it seems similar to the way that divan handles it.
Hello,

I'm really enjoying using divan and I think it lives up to its namesake of being easier to create quick benchmarks. One issue I've come up against is how to benchmark functions that perform an operation multiple times. I think there are two items here, the first being an opportunity to improve some of the docs, which I'm happy to help with. Between `sample_size`, `sample_count`, `item_count` and `iterations` it's not entirely obvious what affects what. I think I can quickly improve that though.

The second, I believe, is a missing feature that can make it look like there is buggy behaviour. I'm using the demo code below, with two functions: one adds a pair of numbers together n times, the other sleeps for 100 µs n times. I've added the sleep because the compiler likes to optimise the `add` away, and sleep makes the issue very clear.

I'd like to be able to indicate to divan that the function I'm running already has some iterations baked into it, and for it to account for that in its reporting. What's confusing (and initially made me think there was a bug) is that if I add an `ItemsCount`, the throughput correctly accounts for this, but the total time doesn't.

Demo code:
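Roughly the following shape (a minimal sketch; the `args` values, the loop bodies and the `ItemsCount` usage are illustrative assumptions rather than the exact original snippet):

```rust
use std::time::Duration;

use divan::counter::ItemsCount;
use divan::Bencher;

fn main() {
    divan::main();
}

// Adds a pair of numbers `n` times per call.
#[divan::bench(args = [1, 10, 100, 1000])]
fn add(bencher: Bencher, n: u64) {
    bencher.counter(ItemsCount::new(n)).bench(|| {
        let mut total = 0u64;
        for i in 0..divan::black_box(n) {
            total += divan::black_box(i) + divan::black_box(1);
        }
        total
    });
}

// Sleeps for 100 µs, `n` times per call.
#[divan::bench(args = [1, 10, 100])]
fn sleep(bencher: Bencher, n: u64) {
    bencher.counter(ItemsCount::new(n)).bench(|| {
        for _ in 0..n {
            std::thread::sleep(Duration::from_micros(100));
        }
    });
}
```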
Output:
It's easier to see with sleep: the time jumps by an order of magnitude per line, but the throughput stays consistent.
Just focussing on the median, for each line the output gives:

`n | total_time | (n / time) | iterations = iterations`

I'd like a way to make it output:

`n | total_time / n | (n / time) | iterations = iterations * n`
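To make that concrete with illustrative numbers: for the sleep benchmark at n = 100, each call takes roughly 100 × 100 µs = 10 ms, so the current output reports ~10 ms as the time while the throughput column correctly shows ~10 K items/s; the proposed output would instead report 10 ms / 100 = 100 µs, with the iteration count scaled up by 100.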
Solutions
1. Make `ItemsCount` affect the reported time taken

I actually think this is the clearest, but it's a breaking change, and it's confusing alongside the other counter types, `BytesCount` and `CharsCount`, which presumably wouldn't affect the reported time.

2. Add another way of indicating to the `bencher` that there are multiple iterations

This would be fine, apart from the fact that there are already lots of very similar concepts in use (`samples`, `iterations`, `items`) which mean different things, and adding another one adds to the mental load. That solution might look something like the sketch below.
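As a rough sketch of option 2 (the `inner_iterations` method below is hypothetical; it does not exist in divan today, and the name is only a placeholder):

```rust
use std::time::Duration;
use divan::Bencher;

// Hypothetical API: tell divan that each call to the benchmarked closure
// already performs `n` inner iterations, so the reported time would be
// divided by `n` and the reported iteration count multiplied by `n`.
#[divan::bench(args = [1, 10, 100])]
fn sleep(bencher: Bencher, n: u64) {
    bencher
        .inner_iterations(n) // hypothetical method, placeholder name
        .bench(|| {
            for _ in 0..n {
                std::thread::sleep(Duration::from_micros(100));
            }
        });
}
```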
I'd be happy to make either change, but I suspect it might be contentious, so I'd love to hear your thoughts! Note that the above functionality could of course be achieved by adding / sleeping once and having divan handle all of the iterations, but there are some real-world functions that are either vectorised or simply do more than one iteration already.

And if what I'm asking is already possible, I'm very happy to add it to the documentation!