johnmyleswhite / Benchmarks.jl

A new benchmarking library for Julia

Confidence Intervals occasionally wonky #29

Closed staticfloat closed 9 years ago

staticfloat commented 9 years ago

I've noticed that every now and then the confidence intervals give a strange result. I print the confidence intervals of every test as long as neither bound is null, which I thought would be enough to guard against strangeness, but apparently it isn't; occasionally I get NaN CIs when the test time is around 4 seconds (and we're still only allotting 10 s for the entire benchmark). Note that the tests that run for 6 seconds don't seem to have this problem.
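A hedged guess at how a NaN CI can arise here (this is an illustration, not Benchmarks.jl's actual code): if the ~4 s test only fits a single sample into the 10 s budget, a normal-theory interval built on the sample standard deviation degenerates, because `std` divides by `n - 1`:

```julia
using Statistics

# Hypothetical sketch: a z-interval from the sample standard deviation.
# With a single sample, std() divides by n - 1 = 0 and returns NaN,
# which propagates into both CI bounds.
function naive_ci(times::Vector{Float64}; z = 1.96)
    m = mean(times)
    se = std(times) / sqrt(length(times))   # NaN when length(times) == 1
    return (m - z * se, m + z * se)
end

naive_ci([4.0e9])         # one ~4 s sample (in ns) -> (NaN, NaN)
naive_ci([4.0e9, 4.1e9])  # a second sample makes the interval finite
```

A 6 s test would show the same symptom under this explanation, so the real trigger may be subtler, but it matches the "NaN only when samples are scarce" pattern.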

Here's an example log demonstrating what I'm seeing.

mbauman commented 9 years ago

Caught one:

julia> @benchmark _dense2sparsevec(1:10000,100)
================ Benchmark Results ========================
     Time per evaluation: 321.93 μs [-245188.36 ns, 889.05 μs]
Proportion of time in GC: 1.95% [0.00%, 10.20%]
        Memory allocated: 399.09 kb
   Number of allocations: 17 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 0.05 s

The elapsed times distribution is rather skewed:

[Screenshot (2015-10-07): histogram of the elapsed-times distribution, heavily right-skewed]

I can send you the JLD or csv of the results object, if you'd like to look at it closer.

johnmyleswhite commented 9 years ago

A CSV file would be great.

staticfloat commented 9 years ago

Matt... how did you plot in your terminal?

mbauman commented 9 years ago

TerminalExtensions.jl + iTerm2 beta

johnmyleswhite commented 9 years ago

Ah, I finally realized what's been confusing me about this: the units of measurement are different for the lower and upper bound, so the CIs are symmetric, they just don't look like they are. The only easy fix for this is to cap the confidence intervals at 0 as a lower bound. This kind of mixture-model data with intermittent GC isn't really tractable without some heuristic for identifying outliers and then doing a conditional analysis. We could potentially make some progress with winsorization at something like the 90th percentile, but I'd want to make sure winsorizing doesn't decrease our ability to measure real changes.
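A minimal sketch of the two ideas above, clamping the lower bound at zero and winsorizing at the 90th percentile (names and the plain z-interval are illustrative, not the package's API):

```julia
using Statistics

# Winsorize: replace everything above the p-th quantile with that quantile,
# blunting GC-inflated outliers without discarding samples.
winsorize(x, p = 0.9) = min.(x, quantile(x, p))

# z-interval with the lower bound clamped at zero, since a negative
# time-per-evaluation is meaningless.
function capped_ci(times; z = 1.96)
    m = mean(times)
    se = std(times) / sqrt(length(times))
    return (max(m - z * se, 0.0), m + z * se)
end

times = [300.0, 310.0, 305.0, 2_000.0]  # μs, one GC-inflated outlier
capped_ci(times)              # outlier blows up se; lower bound clamps to 0.0
capped_ci(winsorize(times))   # tighter interval, strictly positive bounds
```

The worry in the text applies directly: winsorizing the GC spikes away also shrinks the interval, so a real regression that manifests as extra GC could be partly masked.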

johnmyleswhite commented 9 years ago

Will be closed by #31.