Benchmark test might be unreliable

mozilla / pluotsorbet

[ARCHIVED] PluotSorbet is a J2ME-compatible virtual machine written in JavaScript.

GNU General Public License v2.0


Benchmark test might be unreliable #1294

Open marco-c opened 9 years ago

marco-c commented 9 years ago

The benchmark samples might not be normally distributed, which makes the Student's t-test results unreliable.

This is a Q-Q plot of one run of the benchmark (30 rounds): [image: Q-Q plot]. The data is clearly not normally distributed in this case.
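For readers who want to reproduce this kind of check, here is a minimal sketch of how the points of a normal Q-Q plot can be computed with the Python standard library only. The function name `qq_points` and the sample timings are hypothetical, not taken from the benchmark. The plotting positions `(i + 0.5) / n` are one common convention among several.

```python
from statistics import NormalDist, mean, stdev


def qq_points(samples):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(samples)
    n = len(ordered)
    # Fit a normal distribution with the sample mean and standard deviation.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1),
    # so inv_cdf is always defined.
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


# Hypothetical startup times in ms, with one slow outlier round.
times = [712, 705, 698, 720, 731, 709, 701, 715, 1150, 707]

for theoretical, observed in qq_points(times):
    print(f"{theoretical:8.1f} {observed:8.1f}")
```

If the samples were normally distributed, the pairs would lie close to the line `observed == theoretical`. A heavy right tail shows up as the largest observed values sitting well above their theoretical quantiles.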

brendandahl commented 9 years ago

[image: screenshot 2015-03-23 10 34 39]

^ 100-round run

Without the tails, it's not far off from a normal distribution. There are a few statisticians in metrics near my desk. If I get some time a little later, I'll discuss it with them.

marco-c commented 9 years ago

Yeah, the more rounds we run, the better the central limit theorem applies (the magic number is often 30). Maybe we should draw the quantile-quantile plot after running the benchmarks, so that we avoid drawing wrong conclusions.
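One point worth spelling out: the central limit theorem concerns the distribution of the sample mean, not of the individual samples, so more rounds do not make the raw timings themselves look more normal. The sketch below illustrates this under stated assumptions: it uses an exponential distribution as a stand-in for skewed timings (an assumption, not the benchmark's real distribution), and the helper names `skewness` and `skew_of_means` are hypothetical.

```python
import random


def skewness(values):
    """Population skewness: third central moment over sigma cubed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skew_of_means(rounds, trials=2000, seed=0):
    """Skewness of the mean of `rounds` skewed samples, over many trials."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    means = [
        sum(rng.expovariate(1.0) for _ in range(rounds)) / rounds
        for _ in range(trials)
    ]
    return skewness(means)


for rounds in (5, 30, 100):
    print(rounds, round(skew_of_means(rounds), 3))
```

A normal distribution has skewness 0, so the printed values shrinking toward 0 as `rounds` grows shows the mean becoming closer to normal even though each individual sample stays heavily skewed.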

marco-c commented 9 years ago

(I don't have enough knowledge about statistics to tell if the tails are a problem)

brendandahl commented 9 years ago

[image: norm]

Memory spikes seem to correlate with startup time spikes.
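To put a number on that impression, one option is a Pearson correlation coefficient between per-round memory usage and startup time. This is only a sketch: the `pearson` helper is hypothetical and the data below is made up for illustration, not taken from the benchmark runs.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-round measurements (not real benchmark data).
memory_mb = [41, 40, 42, 41, 58, 40, 41, 57, 42, 40]
startup_ms = [705, 698, 712, 709, 1010, 701, 707, 980, 715, 700]

print(f"r = {pearson(memory_mb, startup_ms):.3f}")
```

A value near 1 would support the observation that the spikes coincide. Note that Pearson's r is itself sensitive to outliers, so a rank-based coefficient might be a more cautious choice for data with heavy tails.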