thekvs / cpp-serializers

Benchmark comparing various data serialization libraries (thrift, protobuf etc.) for C++

730 stars, 111 forks

Misleading y-axis #13

Closed kodonnell closed 5 years ago

kodonnell commented 7 years ago

Quick comment that your images make it appear that e.g. protobuf'd objects are 10x smaller than cap'n proto ones, which is untrue. This is because the y-axis doesn't start at 'zero' (which is considered bad practice for this reason).

Kentzo commented 7 years ago

I guess it should show, somewhere near the graph, how much better (in %) each result is than the worst case.

Kentzo commented 7 years ago

A probably related problem is that each library's color is not preserved between tests.

kodonnell commented 7 years ago

> I guess it should show how much % it's better than the worst case somewhere near the graph.

This could still have the same problem. The point is you want the height of the bar to represent the size of the value - and hence the axis must start at (in this case) y = 0.

Kentzo commented 7 years ago

With y = 0 the graph will be unnecessarily huge and won't have any visible difference.

Perhaps instead of absolute numbers it should show percentage, where 100 is the worst case. Then it can start from 0 and still look useful.

kodonnell commented 7 years ago

> With y = 0 the graph will be unnecessarily huge and won't have any visible difference.

That's the point - if you can't see any visible difference, then there's no material difference. However, there will be a visible difference in this case: e.g. yas is twice as fast as cereal, so the yas bar will be half (or twice) the height of the cereal one (whereas now it's more like a ±20% difference).

> Perhaps instead of absolute numbers it should show percentage

The graph would look exactly the same; only the labels would change. As above, the yas bar would still be half (or twice) the height of the cereal one.

Kentzo commented 7 years ago

Hmm, why would it look exactly the same? A percentage would accurately show how much better the compression of A is than that of B. E.g. with the y-axis starting at 100, A at 105 and B at 110, it currently looks like A is twice as good as B. With percentages it would instead show about 95% and 100% on a 0-100 scale, and it wouldn't look that way.
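The arithmetic in this example can be checked with a short sketch. The function name `apparent_ratio` is made up for illustration; the numbers (axis starting at 100, A = 105, B = 110) are the ones from the comment above. It compares the ratio of drawn bar heights on a truncated axis with the ratio on a zero-based axis.

```python
def apparent_ratio(a: float, b: float, axis_min: float) -> float:
    """Ratio of the drawn bar heights of b and a when the y-axis starts at axis_min."""
    return (b - axis_min) / (a - axis_min)


a, b = 105.0, 110.0

# Axis truncated at 100: B's bar is drawn twice as tall as A's.
print(apparent_ratio(a, b, axis_min=100.0))

# Axis starting at 0: the bars differ by under 5%, matching the real values.
print(round(apparent_ratio(a, b, axis_min=0.0), 3))

# A expressed as a percentage of the worst case (B).
print(round(100.0 * a / b, 1))
```

Note that the last line is just the zero-based ratio seen from the other side, which is why a percentage scale only helps if it also starts at 0.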

kodonnell commented 7 years ago

Say 10 seconds is worst case and hence 100%. That will be a full height bar. Say something takes 5s, or 50% - whether the axis is 0 - 10s or 0 - 100%, it'll still show a half-height bar.
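A minimal sketch of this point, assuming both axes start at 0: rescaling the values to "percent of worst case" changes the labels but not the fraction of the plot each bar fills. The helper names and the `slow`/`fast` labels are hypothetical; the 10 s and 5 s values are the ones from the comment above.

```python
def bar_fractions(values: dict, axis_max: float) -> dict:
    """Fraction of the plot height each bar fills on an axis running from 0 to axis_max."""
    return {name: v / axis_max for name, v in values.items()}


def to_percent_of_worst(values: dict) -> dict:
    """Rescale values so that the largest (worst) one becomes 100."""
    worst = max(values.values())
    return {name: 100.0 * v / worst for name, v in values.items()}


times_s = {"slow": 10.0, "fast": 5.0}

absolute = bar_fractions(times_s, axis_max=10.0)
percent = bar_fractions(to_percent_of_worst(times_s), axis_max=100.0)

# Both axes produce a full-height bar and a half-height bar.
print(absolute)
print(percent)
```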

Kentzo commented 7 years ago

In that case a half-height bar is what I, as a reader, would expect, and it would show that something that takes 5s is twice as good as something else that takes 10s.

kodonnell commented 7 years ago

> half-height bar is what I as a reader would expect and it would show that something that takes 5s is twice better than something else that takes 10s

Correct - that's why the y-axis needs to be 'zeroed'.

Kentzo commented 7 years ago

But that's not enough; for the graphics to be useful, the measure point also needs to be changed.

kodonnell commented 7 years ago

I'm not sure what you mean.

nbelakovski commented 6 years ago

Seems like this thread has died, but I came here to say the same thing as @kodonnell: the size graph needs to start at 0. As it is, it is misleading. Glancing at it makes it seem that capnproto takes ~10x more space than yas-compact, but going through the numbers it is actually only 33% bigger. Whether you use percent or size doesn't matter, please pick whichever one you like better, but the y-axis must start at 0. The graph will not look overly huge; it will just show that there's not as much difference between these protocols in terms of size (which is the truth).
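To illustrate how a 33% difference can look like roughly 10x, here is a rough sketch that solves for the axis starting point producing a given apparent ratio. The sizes 300 and 400 are hypothetical stand-ins chosen only so that one is about 33% bigger than the other; they are not the benchmark's real numbers, and `axis_min_for_apparent_ratio` is a made-up helper name.

```python
def axis_min_for_apparent_ratio(small: float, large: float, ratio: float) -> float:
    """Axis start m at which the bar for `large` is drawn `ratio` times taller
    than the bar for `small`, i.e. (large - m) / (small - m) == ratio."""
    return (ratio * small - large) / (ratio - 1.0)


# Hypothetical sizes: `large` is about 33% bigger than `small`.
small, large = 300.0, 400.0

m = axis_min_for_apparent_ratio(small, large, ratio=10.0)

print(round(100.0 * (large - small) / small))   # actual difference in percent
print(round(m, 2))                              # axis start that makes it look like 10x
print(round((large - m) / (small - m), 6))      # drawn height ratio at that axis start
```

With these stand-in values, an axis starting just below the smallest bar is enough to turn a one-third difference into an apparent tenfold one.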

nbelakovski commented 6 years ago

Until the graph is fixed, there should be a disclaimer; I added one in #22.