ltratt / multitime

Time command execution over multiple executions
http://tratt.net/laurie/src/multitime/
MIT License

Should the means reported by multitime be geometric means? #5

Closed (snim2 closed this issue 9 years ago)

snim2 commented 9 years ago

The means reported by multitime are arithmetic means:

https://github.com/ltratt/multitime/blob/master/format.c#L224-L241

Fleming and Wallace (1986) say that geometric means should be reported.

Is this an improvement worth making? If so, how does it affect other measures which rely on the mean values (such as std. dev. and CIs)?

ltratt commented 9 years ago

Yes, I think you're right: we should be using geometric means.

snim2 commented 9 years ago

OK, one of my colleagues in statistics has given me some pointers on how to do this, so I'll work up a PR.

ltratt commented 9 years ago

Excellent!

snim2 commented 9 years ago

OK, having read more carefully through the notes that Paul produced on geometric means, I notice that the calculations are pretty straightforward. However, the CIs are not symmetric about the mean. The example Paul gave was a sample set with a mean of 2.432 and a CI of (7.846, 16.511). This affects how multitime should report its results, since a single +/- value can no longer describe the interval.
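
To illustrate where the asymmetry comes from, here is a minimal Python sketch. It assumes the common approach of computing the interval on the log scale and exponentiating the bounds; Paul's notes are not reproduced in this thread, so his method may differ. The function name `geometric_mean_ci` and the sample times are hypothetical, and the normal critical value is a simplification: with only 5 runs, a Student's t critical value would be more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean_ci(samples, confidence=0.99):
    """Return (geometric mean, lower bound, upper bound).

    Assumption: the interval is built on log-transformed samples and
    mapped back with exp(). All samples must be strictly positive.
    """
    logs = [math.log(x) for x in samples]
    log_mean = mean(logs)
    log_sem = stdev(logs) / math.sqrt(len(logs))  # standard error on the log scale

    # Normal approximation (assumption); a t critical value suits small n better.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    gm = math.exp(log_mean)
    lower = math.exp(log_mean - z * log_sem)
    upper = math.exp(log_mean + z * log_sem)
    return gm, lower, upper


# Hypothetical wall-clock times in seconds, not taken from a real run.
times = [1.837, 1.850, 1.856, 1.862, 1.895]
gm, lower, upper = geometric_mean_ci(times)
print(f"geomean {gm:.4f}  CI ({lower:.4f}, {upper:.4f})")
print(f"distance below: {gm - lower:.5f}  distance above: {upper - gm:.5f}")
```

The interval is symmetric around the mean of the logs, but exponentiation stretches the upper side more than the lower side, so the upper bound sits further from the geometric mean than the lower bound does. Note also that the geometric mean needs strictly positive samples: a 0.000 reading, such as the minimum in the sys row below, would make `math.log` raise a `ValueError`.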

The example in the md-readme branch is:

$ multitime -n 5 -c 99 awk "function fib(n) \
>   { return n <= 1 ? 1 : fib(n - 1) + fib(n - 2) } BEGIN { fib(30) }"
===> multitime results
1: awk "function fib(n)   { return n <= 1 ? 1 : fib(n - 1) + fib(n - 2) } BEGIN { fib(30) }"
            Mean                Std.Dev.    Min         Median      Max
real        1.860+/-0.0013      0.021       1.837       1.856       1.895
user        1.833+/-0.0005      0.013       1.812       1.836       1.846
sys         0.002+/-0.0000      0.003       0.000       0.000       0.008

How should this look when the geometric mean calculations have been implemented? Would something like this be reasonable:

$ multitime -n 5 -c 99 awk "function fib(n) \
>   { return n <= 1 ? 1 : fib(n - 1) + fib(n - 2) } BEGIN { fib(30) }"
===> multitime results
1: awk "function fib(n)   { return n <= 1 ? 1 : fib(n - 1) + fib(n - 2) } BEGIN { fib(30) }"
            Mean                        Std.Dev.    Min         Median      Max
real        1.860 (0.0013, 0.0072)      0.021       1.837       1.856       1.895
user        1.833 (0.0005, 0.0007)      0.013       1.812       1.836       1.846
sys         0.002 (0.0000, 0.0001)      0.003       0.000       0.000       0.008

cfbolz commented 9 years ago

> Fleming and Wallace (1986) say that geometric means should be reported.

I read that paper differently. It says that the geometric mean should be used to average several speedups, but here we have times. I.e., if you had three benchmarks and got speedups of 1.2, 1.5, and 0.9 over some baseline, you would take the geomean of those numbers to get an "average" speedup. But in the multitime case you have something like 2s, 2.2s, 1.8s, and for that the arithmetic mean is the right one, imo.
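
The distinction can be sketched in Python with the numbers from this comment. One reason usually given for averaging ratios with the geometric mean is that the result does not depend on which side is the baseline: the geomean of the inverted ratios is the reciprocal of the geomean. The arithmetic mean lacks that property. The variable names below are illustrative only.

```python
from statistics import geometric_mean, mean

# Speedups over some baseline, as in the comment above.
speedups = [1.2, 1.5, 0.9]
# The same results expressed the other way round (baseline relative to new).
slowdowns = [1 / s for s in speedups]

geo = geometric_mean(speedups)
geo_inv = geometric_mean(slowdowns)
arith = mean(speedups)
arith_inv = mean(slowdowns)

# Consistent: the two geometric means multiply to 1.
print(f"geomean:    {geo:.4f} * {geo_inv:.4f} = {geo * geo_inv:.4f}")
# Inconsistent: the two arithmetic means do not multiply to 1.
print(f"arithmetic: {arith:.4f} * {arith_inv:.4f} = {arith * arith_inv:.4f}")

# Repeated timings of one command are plain measurements in seconds,
# not ratios, so the arithmetic mean applies directly.
times = [2.0, 2.2, 1.8]
mean_time = mean(times)
print(f"mean time: {mean_time:.3f}s")
```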

ltratt commented 9 years ago

I plead ignorance.