Closed: the80srobot closed this issue 1 year ago.
OK, so the issue is that compare.py behaves differently when a single file contains multiple runs, as opposed to multiple runs being concatenated together. I was holding it wrong.
I think it would be helpful if the documentation pointed this out, but in the end, it only took me 20 minutes to figure out what to do, so maybe it's not a big deal.
> a single file contains multiple runs, as opposed to multiple runs being concatenated together

What do you mean by "multiple runs being concatenated together"?
Sorry, I mean I appended the results of multiple benchmark runs with >>. That was just stupid.
But I think the real issue is that the difference between iterations and repetitions isn't obvious; it seems reasonable to expect the tool to compute statistics once there are enough iterations, but it doesn't, and it would help if the documentation page for compare.py at least mentioned this.
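For what it's worth, here is my understanding of why this happens (my own reading of the behavior, not something the docs confirm): each repetition is a full, independent re-run that yields one sample, while the iterations inside a repetition are averaged down to a single number by the benchmark runner. So with one repetition there is only one sample per benchmark, and stddev or a p-value can't be computed no matter how many iterations ran. A minimal sketch of the per-repetition statistics involved, using only the standard library and made-up timings:

```python
import statistics

# Hypothetical wall-clock times (ns) from 10 repetitions of one benchmark.
# Each repetition is an independent sample; the iterations within each
# repetition have already been collapsed into a single number.
repetition_times = [102.1, 99.8, 101.5, 100.2, 98.9,
                    103.0, 100.7, 99.5, 101.1, 100.4]

mean = statistics.mean(repetition_times)
median = statistics.median(repetition_times)
# stdev requires at least two samples, which is why repetitions matter:
stdev = statistics.stdev(repetition_times)

print(f"mean={mean:.2f} median={median:.2f} stddev={stdev:.2f}")
```

With a single repetition, `statistics.stdev` would raise an error for the same underlying reason the tool has nothing to report: one sample carries no spread.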
By the way, I used compare.py for years when I worked at Google, and I still ended up confused and using it wrong in the open-source version. Either it's really not obvious, or it's too hot in here and my brain isn't working. Both are equally possible, TBH.
Describe the bug
The documentation describes the output of compare.py as including things like the p-value and stdev, but the actual output leaves those out.
System
Which OS, compiler, and compiler version are you using:
To reproduce
Steps to reproduce the behavior:
Expected behavior
compare.py should output statistics, or explain why it cannot. The documentation should explain when statistics are available.
Additional context
I'm sure I'm holding it wrong, but the documentation describes behavior very different from what actually happens when I run the same commands.
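To illustrate the kind of sanity check that would have saved me time, here is a hypothetical helper (not part of compare.py; the only JSON fields it relies on are the top-level "benchmarks" array and each entry's "name", and the assumption that running with repetitions produces several entries sharing one name):

```python
import json
from collections import Counter

def single_sample_benchmarks(path: str) -> list[str]:
    """Return benchmark names that appear only once in a results file.

    Assumption: with multiple repetitions, the "benchmarks" array in the
    JSON output contains one entry per repetition, all sharing a name.
    A name that appears exactly once therefore has only a single sample,
    leaving nothing to compute a stddev or p-value from.
    """
    with open(path) as f:
        data = json.load(f)
    counts = Counter(b["name"] for b in data.get("benchmarks", []))
    return [name for name, n in counts.items() if n == 1]
```

If a helper like this reported that every benchmark in my file was a single sample, the missing statistics would have been self-explanatory.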