amigalemming opened this issue 10 years ago
I have done a little investigating. This bug happens when the var reports = {{json}}; statement that is rendered into the HTML ends up containing JavaScript "null" values rather than proper numbers:
var reports = [{"reportAnalysis":{"anMean":{"estUpperBound":3.8590750, ... null, ...
As the browser loads the page, a JavaScript error occurs when it tries to run toFixed(3) on (the first of) these null values.
Where do these nulls come from? They appear to be coming from the JSON rendering of Statistics.Resampling.Bootstrap.Estimate values, which have all fields as strict Doubles. From playing around with Aeson, I noticed that it will render Double NaN and Infinity values as JavaScript "null":
Prelude Data.Aeson> encode (1/0 :: Double)
"null"
Prelude Data.Aeson> encode (0/0 :: Double)
"null"
Prelude Data.Aeson>
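The same thing happens when the non-finite value sits inside an object. A tiny self-contained demo (Est here is a hypothetical stand-in for Bootstrap.Estimate, not the real type):

{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson (ToJSON, encode)
import GHC.Generics (Generic)

-- Hypothetical stand-in for Bootstrap.Estimate, just to show the effect:
data Est = Est { estLowerBound :: !Double, estUpperBound :: !Double }
  deriving (Generic, Show)

instance ToJSON Est

main :: IO ()
main = print (encode (Est (0/0) 3.8590750))
-- prints "{\"estLowerBound\":null,\"estUpperBound\":3.859075}"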
I am seeing "null" values end up in this field (among others):
{Report} -> {reportAnalysis :: SampleAnalysis} -> {anRegress :: [Regression]} !! 0 -> {regRSquare :: Bootstrap.Estimate} -> {estLowerBound :: !Double}
Somehow criterion is calculating this estLowerBound to be NaN or Infinity. (At least, according to the evidence laid out in this comment; I haven't directly observed the Haskell value.)
This is as far as I have come with my investigation.
More evidence: I've spotted the NaNs in the command-line output (use your browser to search for "NaN"):
benchmarking fib/1
time 38.35 ns (38.28 ns .. 38.46 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 38.38 ns (38.31 ns .. 38.59 ns)
std dev 372.5 ps (176.5 ps .. 702.6 ps)
benchmarking fib/5
time 899.5 ns (897.6 ns .. 902.0 ns)
1.000 R² (1.000 R² .. 1.000 R²)
mean 900.1 ns (898.5 ns .. 902.0 ns)
std dev 5.975 ns (4.724 ns .. 7.946 ns)
benchmarking fib/9
time 6.145 μs (6.137 μs .. 6.157 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 6.160 μs (6.138 μs .. 6.258 μs)
std dev 135.1 ns (23.39 ns .. 306.6 ns)
variance introduced by outliers: 24% (moderately inflated)
benchmarking fib/11
time 16.11 μs (16.07 μs .. 16.18 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 16.13 μs (16.09 μs .. 16.29 μs)
std dev 242.6 ns (96.99 ns .. 482.7 ns)
variance introduced by outliers: 11% (moderately inflated)
benchmarking fib/13
time 42.28 μs (42.02 μs .. 42.66 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 42.41 μs (42.17 μs .. 42.91 μs)
std dev 1.089 μs (573.8 ns .. 2.028 μs)
variance introduced by outliers: 25% (moderately inflated)
benchmarking fib/15
time 110.7 μs (110.3 μs .. 111.1 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 110.5 μs (110.3 μs .. 110.9 μs)
std dev 903.4 ns (634.4 ns .. 1.270 μs)
benchmarking fib/20
time 1.215 ms (1.210 ms .. 1.223 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.217 ms (1.214 ms .. 1.222 ms)
std dev 13.26 μs (7.718 μs .. 20.12 μs)
benchmarking fib/25
time 13.61 ms (13.53 ms .. 13.69 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 13.55 ms (13.52 ms .. 13.58 ms)
std dev 83.90 μs (58.05 μs .. 108.5 μs)
benchmarking fib/30
time 149.7 ms (149.1 ms .. 150.6 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 149.9 ms (149.7 ms .. 150.2 ms)
std dev 337.9 μs (213.8 μs .. 503.5 μs)
variance introduced by outliers: 12% (moderately inflated)
benchmarking fib/31
time 242.2 ms (241.7 ms .. 242.7 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 242.4 ms (242.2 ms .. 242.6 ms)
std dev 274.6 μs (118.4 μs .. 342.2 μs)
variance introduced by outliers: 16% (moderately inflated)
benchmarking fib/32
time 392.0 ms (388.7 ms .. 394.7 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 392.3 ms (392.1 ms .. 392.6 ms)
std dev 396.8 μs (0.0 s .. 419.0 μs)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fib/33
time 635.0 ms (632.2 ms .. 639.1 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 634.7 ms (633.6 ms .. 635.5 ms)
std dev 1.098 ms (0.0 s .. 1.268 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fib/34
time 1.028 s (1.022 s .. NaN s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.028 s (1.027 s .. 1.029 s)
std dev 1.023 ms (0.0 s .. 1.083 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fib/35
time 1.665 s (1.651 s .. 1.675 s)
1.000 R² (NaN R² .. 1.000 R²)
mean 1.665 s (1.664 s .. 1.667 s)
std dev 2.290 ms (0.0 s .. 2.466 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fib/36
time 2.697 s (2.618 s .. 2.770 s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 2.701 s (2.692 s .. 2.710 s)
std dev 14.82 ms (0.0 s .. 15.10 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fib/37
time 4.426 s (4.355 s .. 4.581 s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 4.375 s (4.354 s .. 4.395 s)
std dev 33.17 ms (0.0 s .. 33.97 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking fib/38
time 7.314 s (7.022 s .. 7.496 s)
1.000 R² (0.999 R² .. 1.000 R²)
mean 7.340 s (7.297 s .. 7.370 s)
std dev 46.02 ms (0.0 s .. 52.85 ms)
variance introduced by outliers: 19% (moderately inflated)
I currently am unable to dig further into this bug.
Hopefully the info above can point someone who is more familiar with the criterion codebase in the right direction.
@RyanGlScott, we had a problem with this as well, but now I can't remember what it was. Do you?
I think the issue might have been that the FromJSON Measured instance was doing something dodgy with null values behind the scenes, which we fixed with this commit.
Therefore, I'm tempted to claim this bug as fixed.
I'm also getting this with criterion-1.1.1.0.
Sadly, 1.1.1.0 doesn't include the linked commit. It sounds like it is time to do a 1.1.1.1 release for bugfixes. @RyanGlScott and @bos - any opinions about where to cut that?
There's currently a "1.2" branch corresponding to the statistics 0.14 release. The master branch also already has a number of commits relative to 1.1.1.0, which you can see here. Additions like --json actually went in before the bugfix Ryan linked. We could cut a 1.1.1.1 that backports the fix for this issue, or we could plow ahead and release a 1.1.3.1, which is the version master is already tagged with.
I'm not completely sure whether that should be 1.1.X or 1.2, however. We also changed the default report format to JSON, so this is a breaking change that requires a major version bump. If we do the major version bump and push out 1.2 now, then the statistics-0.14 version can slide to 1.3.
Could you, @tswilkinson and @bitc, see if you can reproduce the error on master?
Ack, I spoke too soon. I can still reproduce this error with criterion-1.1.4.0.
I did some digging into this recently, and I think I've narrowed the issue down to a function in statistics. Notice these lines in criterion:
(coeffs,r2) <- liftIO $
bootstrapRegress gen resamples confInterval olsRegress ps r
I've managed to get coeffs and r2 values that contain NaN (non-deterministically, though, since it depends on the PRNG gen). Here is how I reproduced it in GHCi (using actual values for ps and r that I recorded during a criterion session in which this bug happened):
$ ghci
GHCi, version 8.0.1: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /home/ryanglscott/.ghci
λ> :m + System.Random.MWC Statistics.Regression
λ> gen <- createSystemRandom
λ> :set -XOverloadedLists
λ> bootstrapRegress gen 1000 0.95 olsRegress [[1.0,2.0,3.0,4.0]] [0.4834418371319771,0.9643802028149366,1.4471413176506758,1.9452479053288698]
([Estimate {estPoint = 0.48681793194264117, estLowerBound = 0.4809383656829592, estUpperBound = 0.4981065876781946, estConfidenceLevel = 0.95},Estimate {estPoint = -6.992014124988053e-3, estLowerBound = NaN, estUpperBound = 2.503471449018241e-3, estConfidenceLevel = 0.95}],Estimate {estPoint = 0.999930103564569, estLowerBound = 0.9998776351901867, estUpperBound = 1.0, estConfidenceLevel = 0.95})
Notice the NaN value in the first field of the pair that bootstrapRegress returns. (You may have to re-run that last line several times before the PRNG gives you a NaN value.)
@Shimuuar, do you have any idea why bootstrapRegress might be giving NaN values? FWIW, this is with statistics-0.13.3.0.
Here's what I believe to be a more deterministic way of reproducing the bug, using initialize instead of createSystemRandom (the latter of which is what criterion actually uses internally):
$ ghci
GHCi, version 8.0.1: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /home/ryanglscott/.ghci
λ> :m + System.Random.MWC Statistics.Regression Data.Word
λ> import qualified Data.Vector.Unboxed as U
λ> :set -XOverloadedLists
λ> gen <- initialize ([1..1000] :: U.Vector Word32)
λ> bootstrapRegress gen 1000 0.95 olsRegress [[1.0,2.0,3.0,4.0]] [0.4834418371319771,0.9643802028149366,1.4471413176506758,1.9452479053288698]
([Estimate {estPoint = 0.48681793194264117, estLowerBound = 0.4809383656829594, estUpperBound = 0.4911313727498061, estConfidenceLevel = 0.95},Estimate {estPoint = -6.992014124988053e-3, estLowerBound = -4.717844538390807e-2, estUpperBound = 2.5034714490178326e-3, estConfidenceLevel = 0.95}],Estimate {estPoint = 0.999930103564569, estLowerBound = 0.9998776351901867, estUpperBound = NaN, estConfidenceLevel = 0.95})
This time, the NaN shows up in the estUpperBound of r2 (the second field of the pair).
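(If you re-run these examples yourself, a small helper like the following makes the NaNs easier to spot; it assumes only the Estimate fields visible in the output above:)

import Statistics.Resampling.Bootstrap (Estimate(..))

-- True if any field of the Estimate is NaN.
estHasNaN :: Estimate -> Bool
estHasNaN e = any isNaN
  [estPoint e, estLowerBound e, estUpperBound e, estConfidenceLevel e]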
I'll look into it.
My guess is that resampling sometimes produces a sample in which every point has the same x, so the linear fit returns NaN. After that, the NaNs propagate.
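For intuition, a minimal simple-regression sketch (not statistics' actual implementation) showing where the NaN enters: when every x in the resample is identical, the slope denominator sum of (x_i - mean x)^2 is zero, and 0/0 in Double is NaN.

-- Minimal OLS slope, just to illustrate the failure mode:
slope :: [Double] -> [Double] -> Double
slope xs ys = num / den
  where
    n   = fromIntegral (length xs)
    mx  = sum xs / n
    my  = sum ys / n
    num = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    den = sum [ (x - mx) ^ (2 :: Int) | x <- xs ]

-- slope [2,2,2] [1,2,3] evaluates to NaN (num = 0, den = 0),
-- and that NaN then propagates through the bootstrap estimates.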
I was right about this one. I think the problem appears in long-running computations because they collect fewer samples, so a resample in which all values are the same becomes more likely.
On Thu, 17 Nov 2016, Aleksey Khudyakov wrote:
I was right about this one. I think the problem appears in long-running computations because they collect fewer samples, so a resample in which all values are the same becomes more likely.
That matches my experience.
Thanks for looking into this, @Shimuuar!
(For reference, the statistics issue is being tracked at https://github.com/bos/statistics/issues/111.)
I think a proper solution will require changes to the linear regression API and, consequently, to bootstrapRegress. Simply returning NaN is a bad idea, as this issue has shown. The lack of a unique solution should be reported in a more obvious way, such as returning Nothing.
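Sketched with hypothetical names (this is not the actual statistics API), the shape would be something like:

-- Hypothetical sketch: report a degenerate (non-unique) fit as Nothing
-- instead of returning NaN.
fitMaybe :: ([Double] -> [Double] -> (Double, Double))  -- some fitting function
         -> [Double] -> [Double]
         -> Maybe (Double, Double)
fitMaybe fit xs ys
  | allSame xs = Nothing   -- no unique line through identical x values
  | otherwise  = Just (fit xs ys)
  where
    allSame []     = True
    allSame (v:vs) = all (== v) vs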
On Thu, 17 Nov 2016, Aleksey Khudyakov wrote:
I think a proper solution will require changes to the linear regression API and, consequently, to bootstrapRegress. Simply returning NaN is a bad idea, as this issue has shown. The lack of a unique solution should be reported in a more obvious way, such as returning Nothing.
An alternative would be to go the way of LAPACK and pseudo-inverses, that is, to add another criterion: if the solution of the linear regression is not unique, then choose among all solutions the one with the minimal squared value. With this additional condition, linear regression on a single data point would result in a constant function.
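To make that concrete, a hedged sketch (hypothetical helper, not part of statistics) of the minimum-norm choice for the degenerate case in this thread, where every sample has the same x value c: every line satisfying slope * c + intercept = mean ys fits equally well, and minimising slope^2 + intercept^2 picks a unique representative (penalising only the slope would instead yield the constant function mentioned above).

-- Hypothetical minimum-norm fit for the all-x-equal-c case:
minNormFit :: Double -> [Double] -> (Double, Double)
minNormFit c ys = (yBar * c / d, yBar / d)   -- (slope, intercept)
  where
    yBar = sum ys / fromIntegral (length ys)
    d    = c * c + 1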
If a computation runs quite long, say longer than 0.5 s, then the regression result in the HTML report comes out as "xxx", and all following regression results are "xxx", too. This also means that the overview diagram is not generated.