Closed: HFTrader closed this issue 10 months ago.
I strongly suspect this is already achievable with custom user counters:
state.counters["ns_per_val"] = bm::Counter{static_cast<double>(state.range(0)), bm::Counter::kIsRate | bm::Counter::kInvert};
Hi Roman,
The kIsRate flag normalizes by cpu_time only (counter.cc:20).
And?
Oh, never mind, I see what you did there. You passed the vector size, which gets divided by the CPU time and then inverted. Let me check; that might be it.
I know you're going to tell me to look at the code, but I'm puzzled now. What value gets fed to Finish()?
The resulting numbers do not make any sense. The values should be 0.68 ns/sample for 1M and 0.58 ns/sample for 4k.
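A standalone model helps show what Finish() receives and why the first counter misbehaves. It follows the order of operations in Finish() as read from counter.cc (divide by CPU time for rates, then by threads, multiply by iterations for iteration-invariant counters, divide by iterations for averages, and invert last). The `CounterFlags` enum and `finish_model` function are hypothetical stand-ins, not the library's API. The key detail is that `cpu_time` is the total CPU time of the whole run, not the time of one iteration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for benchmark::Counter flags. The names follow the
// library, but the bit values are chosen for this sketch only.
enum CounterFlags : unsigned {
  kIsRate = 1u << 0,
  kAvgThreads = 1u << 1,
  kIsIterationInvariant = 1u << 2,
  kAvgIterations = 1u << 3,
  kInvert = 1u << 4,
  kIsIterationInvariantRate = kIsRate | kIsIterationInvariant,
};

// Model of the post-run normalization of a user counter (hypothetical name).
//   value       - what the benchmark stored in the counter
//   iterations  - number of iterations the run executed
//   cpu_time    - TOTAL CPU seconds of the run, not the time per iteration
//   num_threads - number of benchmark threads
double finish_model(double value, unsigned flags, std::int64_t iterations,
                    double cpu_time, double num_threads) {
  assert(iterations > 0 && cpu_time > 0.0 && num_threads > 0.0);
  double v = value;
  if (flags & kIsRate) v /= cpu_time;         // per second of total CPU time
  if (flags & kAvgThreads) v /= num_threads;  // per thread
  if (flags & kIsIterationInvariant)          // value counted once per iteration
    v *= static_cast<double>(iterations);
  if (flags & kAvgIterations)                 // averaged over iterations
    v /= static_cast<double>(iterations);
  if (flags & kInvert) v = 1.0 / v;           // inversion is applied last
  return v;
}
```

For example, with N = 4096, 100000 iterations and a true cost of 0.58 ns per value, `kIsRate | kInvert` on N yields total_time / N, which is about 58 µs per value (the iteration count times too large), while `kIsIterationInvariantRate | kInvert` yields 0.58 ns per value.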
Right, how about:
state.counters["ns_per_val"] = bm::Counter{static_cast<double>(state.iterations() * state.range(0)), bm::Counter::kIsRate | bm::Counter::kInvert};
aka
state.counters["ns_per_val"] = bm::Counter{static_cast<double>(state.range(0)), bm::Counter::kIsIterationInvariantRate | bm::Counter::kInvert};
That did it. Perhaps add this to the documentation? I don't know why it works, but it does.
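Why the second form works can be read off the flag order, assuming (as in Finish()) that a rate divides by the total CPU time T of the run, that kIsIterationInvariant multiplies by the iteration count I, and that inversion is applied last. With N = state.range(0), the CPU time of one iteration is T/I:

```latex
\begin{aligned}
\texttt{kIsRate | kInvert}: &\quad \left(\frac{N}{T}\right)^{-1} = \frac{T}{N} = I \cdot \frac{T/I}{N} \\
\texttt{kIsIterationInvariantRate | kInvert}: &\quad \left(\frac{N \cdot I}{T}\right)^{-1} = \frac{T/I}{N}
\end{aligned}
```

The second line is the CPU time per value within one iteration, which is what ns_per_val is meant to report. Multiplying by state.iterations() by hand and using plain kIsRate produces the same N·I numerator, which is why both suggestions agree.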
Nice!
I was hoping that https://github.com/google/benchmark/blob/main/docs/user_guide.md#custom-counters already explained that clearly, but if you have suggestions for particular wording improvements, please feel free to open a pull request...
Is your feature request related to a problem? Please describe.
I have a benchmark where I measure one operation on a vector, say sorting a vector of size N. Assume this operation is O(N), so the result is a list of numbers that vary with the size of the vector, which is a parameter/argument to the benchmark. A typical test/benchmark looks like the following:
A typical result would be as below.
To interpret these numbers I'm always doing math in my head, like 4742341/1048576 ≈ 4.52 ns/value or 3673/4096 ≈ 0.90 ns/value. This lets me see the secondary effects I'm after (cache and algorithm impact).
Describe the solution you'd like
Ideally, I would like to see the metrics already normalized by a factor I provide. Currently the user counter interface does not offer normalization by a user-provided value, only by CPU time, number of threads and number of iterations. I am not sure what the proper interface would be, since I think this would require the benchmark to pass such a value into the counter object.
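The normalization asked for here is a single division of the reported time per iteration by a user-chosen factor such as the vector size. A minimal sketch, where `ns_per_value` is a hypothetical helper name rather than anything in the library:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: divides the time reported for one iteration by the
// number of values processed in that iteration (here, the vector size N
// passed as the benchmark argument).
double ns_per_value(double ns_per_iteration, std::int64_t values_per_iteration) {
  assert(values_per_iteration > 0);
  return ns_per_iteration / static_cast<double>(values_per_iteration);
}
```

For the two results quoted above, `ns_per_value(4742341.0, 1048576)` is about 4.52 and `ns_per_value(3673.0, 4096)` is about 0.90.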
Describe alternatives you've considered
The current alternative is to compute all required values in my head.
I could also write a Python script to parse the output and print the normalized values.
I could create a user counter and compute all the values manually, retrieving the actual CPU time from the state object and dividing it by the factor I want.
Additional context
This is a use case that's so common for me that I wonder if it's also common for other users. Did this come up at any point?
I might also very well be missing an API or usage here. Please let me know :)