jiahao / paper-benchmark

A short conference paper on benchmarking

Initial comments #6

Open · johnmyleswhite opened this issue 8 years ago

johnmyleswhite commented 8 years ago

Still working through the paper's details (which I really like), but here are some quick comments.

jrevels commented 8 years ago

Thanks for reading, and for the comments!

Your point about the minimum is well-taken. It's something we here at Julia Central have gone back and forth on quite a bit. I agree that the minimum is a problematic test/summary statistic, but I would argue that it is actually the correct estimand for the specific case we use it for.

Recall the motivation driving our estimator choice: we wish to estimate n, the number of benchmark executions per measurement required to overcome timer inaccuracy. For this purpose, the minimum is a better estimand (and estimator) than the mean/median/etc., because it ensures our choice of n will be high enough to thwart timer error even when a benchmark runs "faster than average" in a given experiment. If we instead chose the mean, our resulting choice of n might be too low for some benchmark runs.
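
To make that concrete, here's a minimal sketch of how such an estimate could work. This is illustrative only, not the paper's actual tuning procedure; the function and parameter names (`executions_per_sample`, `timer_resolution_ns`, `target_rel_error`) are invented for the example:

```julia
# Hypothetical sketch (not the paper's procedure): pick the number of executions
# per measurement, n, so that timer error stays below a target fraction of the
# total measured time, even for the fastest observed execution.
function executions_per_sample(times_ns::Vector{Float64};
                               timer_resolution_ns::Float64 = 10.0,  # assumed timer granularity
                               target_rel_error::Float64 = 0.01)     # assumed error budget
    t_min = minimum(times_ns)  # fastest observed single-execution time
    # Require n * t_min >= timer_resolution_ns / target_rel_error, so that even a
    # "faster than average" run keeps timer error below the target fraction.
    return max(1, ceil(Int, timer_resolution_ns / (target_rel_error * t_min)))
end
```

Sizing n off `minimum(times_ns)` is what makes the choice conservative: had we divided by the mean instead, a run faster than the mean could make n * t too small relative to the timer's granularity.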

Once again, I don't think the minimum is particularly suitable for summarization or hypothesis testing, and for those purposes I think your comment about tractability vs. value rings very true. I'm hoping that my exploration of non-i.i.d. resampling methods will prove fruitful, and will enable the use of more reasonable estimators for hypothesis testing/confidence interval calculation on these wacky benchmark samples. Of course, I'm not sure yet whether that will pan out, but there's always hope.
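
In case it helps make the direction concrete, here is a minimal sketch of one non-i.i.d. resampling scheme (a moving-block bootstrap percentile interval for the median). It's an illustration of the general idea, not the specific method under exploration, and every name in it is invented for the example:

```julia
using Statistics  # median, quantile

# Illustrative moving-block bootstrap: resample contiguous blocks of timings so
# that local (non-i.i.d.) correlation structure is preserved in each replicate.
function block_bootstrap_ci(times::Vector{Float64}; blocklen::Int = 50,
                            nboot::Int = 1000, alpha::Float64 = 0.05)
    n = length(times)
    @assert n >= blocklen "need at least one full block of observations"
    nblocks = cld(n, blocklen)  # enough blocks to cover the original sample size
    stats = Vector{Float64}(undef, nboot)
    for b in 1:nboot
        replicate = Float64[]
        for _ in 1:nblocks
            start = rand(1:(n - blocklen + 1))  # random block start
            append!(replicate, @view times[start:start + blocklen - 1])
        end
        stats[b] = median(replicate[1:n])  # statistic of interest on the replicate
    end
    return quantile(stats, alpha / 2), quantile(stats, 1 - alpha / 2)
end
```

The percentile interval is only one way to turn the replicates into a confidence interval; the point is just that resampling blocks, rather than individual timings, respects the serial dependence in benchmark samples.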