Closed: snim2 closed this issue 9 years ago.
Yes, I think you're right: we should be using geometric means.
OK, one of my colleagues in statistics has given me some pointers on how to do this, so I'll work up a PR.
Excellent!
OK, having read more carefully through the notes that Paul produced on geometric means, I notice that the calculations are pretty straightforward. However, the CIs are not symmetric about the mean. The example Paul gave was a sample set with a mean of 2.432 and a CI of (7.846, 16.511). This makes a difference to how multitime might report back its results.
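At first sight Paul's figures look inconsistent, since 2.432 lies outside (7.846, 16.511). One reading that fits them (an assumption, not something stated in his notes) is that 2.432 is the mean of the log-transformed samples and that the interval has already been mapped back to the original units with exp(). This short Python check tests that reading:

```python
import math

# Figures from Paul's example.
reported_mean = 2.432
ci_low, ci_high = 7.846, 16.511

# Assumption: reported_mean is the mean of ln(sample), and the CI endpoints are
# exp(reported_mean - half_width) and exp(reported_mean + half_width).
# Taking logs of the endpoints should then give an interval centred on 2.432.
log_low, log_high = math.log(ci_low), math.log(ci_high)
log_midpoint = (log_low + log_high) / 2

# Under the same assumption, exp(reported_mean) is the geometric mean in the
# original units, and the CI around it is lopsided.
geo_mean = math.exp(reported_mean)
below, above = geo_mean - ci_low, ci_high - geo_mean

print(f"midpoint of log-scale CI: {log_midpoint:.3f}")
print(f"geometric mean: {geo_mean:.2f} (-{below:.2f}, +{above:.2f})")
```

It prints a log-scale midpoint of 2.432 and a geometric mean of about 11.38, with the interval reaching roughly 3.54 below it and 5.13 above it: symmetric on the log scale, asymmetric in the original units.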
The example in the md-readme branch is:
$ multitime -n 5 -c 99 awk "function fib(n) \
> { return n <= 1 ? 1 : fib(n - 1) + fib(n - 2) } BEGIN { fib(30) }"
===> multitime results
1: awk "function fib(n) { return n <= 1 ? 1 : fib(n - 1) + fib(n - 2) } BEGIN { fib(30) }"
            Mean                Std.Dev.    Min         Median      Max
real        1.860+/-0.0013      0.021       1.837       1.856       1.895
user        1.833+/-0.0005      0.013       1.812       1.836       1.846
sys         0.002+/-0.0000      0.003       0.000       0.000       0.008
How should this look when the geometric mean calculations have been implemented? Would something like this be reasonable:
$ multitime -n 5 -c 99 awk "function fib(n) \
> { return n <= 1 ? 1 : fib(n - 1) + fib(n - 2) } BEGIN { fib(30) }"
===> multitime results
1: awk "function fib(n) { return n <= 1 ? 1 : fib(n - 1) + fib(n - 2) } BEGIN { fib(30) }"
            Mean                     Std.Dev.    Min         Median      Max
real        1.860 (0.0013, 0.0072)   0.021       1.837       1.856       1.895
user        1.833 (0.0005, 0.0007)   0.013       1.812       1.836       1.846
sys         0.002 (0.0000, 0.0001)   0.003       0.000       0.000       0.008
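To print a row like the proposed one, multitime would need the geometric mean plus the two CI endpoints. Below is a minimal Python sketch of that calculation, assuming the usual log-transform method and a Student's t critical value supplied by the caller. The function name geometric_summary and the sample times are hypothetical: they are not taken from multitime or from the runs behind the tables above.

```python
import math
import statistics


def geometric_summary(samples, t_crit):
    """Return (geometric mean, CI lower bound, CI upper bound).

    samples must be positive timings. The mean and sample standard deviation
    are computed on the logs, giving a symmetric interval there; exp() maps it
    back to seconds, where it is no longer symmetric. t_crit is the two-sided
    Student's t critical value for len(samples) - 1 degrees of freedom.
    """
    logs = [math.log(x) for x in samples]
    mean_log = statistics.mean(logs)
    half_width = t_crit * statistics.stdev(logs) / math.sqrt(len(logs))
    return (math.exp(mean_log),
            math.exp(mean_log - half_width),
            math.exp(mean_log + half_width))


# Hypothetical wall-clock times (seconds) from five runs, as with -n 5.
runs = [1.837, 1.850, 1.856, 1.862, 1.895]
# 4.604 is the two-sided 99% t critical value for 4 degrees of freedom (-c 99).
gm, lo, hi = geometric_summary(runs, t_crit=4.604)
print(f"real {gm:.3f} ({lo:.3f}, {hi:.3f})")
```

This sketch prints absolute endpoints in the brackets; printing offsets from the mean instead would only change the final format string.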
Fleming and Wallace (1986) say that geometric means should be reported.
I read that paper differently. It says that the geomean should be used to compute the average over several speedups, but here we have times. I.e., if you had three benchmarks and got speedups of 1.2, 1.5 and 0.9 over some baseline, you would take the geomean of those numbers to get an "average" speedup. But in the multitime case you have something like 2s, 2.2s, 1.8s, and for that the arithmetic mean is the right one, IMO.
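The distinction can be made concrete with the numbers from the comment above, using Python's standard statistics module. The speedups are ratios, so the geometric mean is used; one property that motivates this is that inverting every ratio inverts the geometric mean exactly. The repeated times of one benchmark are plain durations, so they get an ordinary arithmetic mean.

```python
import statistics

# Speedups of three benchmarks relative to a baseline (ratios).
speedups = [1.2, 1.5, 0.9]
# Repeated wall-clock times (seconds) of a single benchmark.
times = [2.0, 2.2, 1.8]

avg_speedup = statistics.geometric_mean(speedups)  # cube root of 1.2 * 1.5 * 0.9
avg_time = statistics.mean(times)                  # ordinary arithmetic mean

# Measuring the baseline against the new version instead inverts every ratio;
# the geometric mean of the inverted ratios is exactly 1 / avg_speedup.
inverse = statistics.geometric_mean([1 / s for s in speedups])

print(f"geometric mean speedup: {avg_speedup:.3f}")
print(f"geometric mean of inverted speedups: {inverse:.3f}")
print(f"arithmetic mean time: {avg_time:.3f} s")
```

It prints about 1.174, 0.851 and 2.000 s. An arithmetic mean of the ratios (1.2) would not have this property: the arithmetic mean of the inverted ratios is about 0.870, not 1/1.2.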
I plead ignorance.
The means reported by multitime are arithmetic means: https://github.com/ltratt/multitime/blob/master/format.c#L224-L241
Fleming and Wallace (1986) say that geometric means should be reported.
Is this an improvement worth making? If so, how does it affect other measures which rely on the mean values (such as std. dev. and CIs)?
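One standard answer to the second question is sketched here; it is not necessarily what Paul's notes or an eventual PR would use. Compute the usual statistics on the logarithms of the n timing samples x_1, ..., x_n and exponentiate the results, with t the two-sided Student's t critical value t_{1-α/2, n-1}:

$$
y_i = \ln x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
$$

$$
G = e^{\bar{y}} = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}, \qquad \mathrm{GSD} = e^{s_y}, \qquad \mathrm{CI} = \Bigl[\, G\, e^{-t\, s_y/\sqrt{n}},\; G\, e^{+t\, s_y/\sqrt{n}} \,\Bigr]
$$

The interval is symmetric on the log scale, which is why it is lopsided in seconds. The geometric standard deviation is a dimensionless multiplicative factor rather than a number of seconds, so the Std.Dev. column would change meaning as well. Min and Max are unaffected, because taking logs preserves order.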