[BUG] compare.py: no statistics computed for JSON input

google / benchmark

A microbenchmark support library

Apache License 2.0

8.94k stars 1.62k forks source link

[BUG] compare.py: no statistics computed for JSON input #1648

Closed the80srobot closed 1 year ago

the80srobot commented 1 year ago

Describe the bug

The documentation describes the output of compare.py as including things like the p-value and stdev, but the actual output leaves those out.

System

Which OS, compiler, and compiler version are you using:

OS: Linux debian 6.5.0-rc4+ aarch64
Compiler and version: gcc (Debian 12.2.0-14) 12.2.0

To reproduce

Steps to reproduce the behavior:

Have two benchmark runs in JSON
Run compare.py benchmarks FILE1 FILE2

Expected behavior

compare.py should output statistics, or explain why it cannot. The documentation should explain when statistics are available.

Additional context

I'm sure I'm holding it wrong, but the documentation describes very different behavior from what actually happens when running the same commands.

the80srobot commented 1 year ago

OK, so the issue is that compare.py behaves differently when a single file contains multiple runs, as opposed to multiple runs being concatenated together. I was holding it wrong.

I think it would be helpful if the documentation pointed this out, but in the end, it only took me 20 minutes to figure out what to do, so maybe it's not a big deal.

LebedevRI commented 1 year ago

a single file contains multiple runs, as opposed to multiple runs being concatenated together

what do you mean by multiple runs being concatenated together?

the80srobot commented 1 year ago

Sorry, I mean I appended the results of multiple benchmark runs with >>. That was just stupid.

But I think the real issue is that the difference between iterations and repetitions isn't obvious, and I think it's reasonable to expect the tool to compute statistics with enough iterations. It doesn't, and it might be helpful for the documentation page for compare.py to at least mention this.

By the way, I used compare.py for years when I worked at Google, and I still ended up being confused and using it wrong in the open source. It's either really not obvious, or it's too hot in here and my brain's not working. Both are equally possible, TBH.