haskell / statistics

A fast, high quality library for computing with statistics in Haskell.
http://hackage.haskell.org/package/statistics
BSD 2-Clause "Simplified" License

"1-CDF" test suite failure in statistics-0.15.2.0 #168

Closed: peti closed this issue 4 years ago

peti commented 4 years ago

Citing from https://hydra.nixos.org/build/123376583:

      1-CDF is correct:                                                                      FAIL
        *** Failed! Falsified (after 86 tests and 3 shrinks):
        fDistributionReal 75.0 1.0
        56.0
        Use --quickcheck-replay=146826 to reproduce.
nh2 commented 4 years ago

Hmm, when built with stack on tag 0.15.2.0 or current master (the commit directly after that tag, 6aedd2dd7c595b308c4a005fec96029fd6df3dbe) it does not fail:

% .stack-work/dist/x86_64-linux/Cabal-2.2.0.1/build/tests/tests --quickcheck-replay=146826
...
All 383 tests passed (5.29s)
Shimuuar commented 4 years ago

I did reproduce the test failure. It takes a few tens of thousands of attempts to trigger the failure. By the look of it, the problem is in the test: the error tolerance is set too low, and increasing it a bit fixes all the problems.
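
For context, the failing property checks that a distribution's CDF and complementary CDF sum to one within a fixed error tolerance. A rough sketch of such a property is shown below; the name prop_complCDF, the helper eps, and the tolerance value are illustrative assumptions, not the actual code of the statistics test suite.

    -- Sketch of a "1-CDF is correct" style property; eps and the helper
    -- name are assumptions, not the test suite's real code.
    import Test.QuickCheck (Property, counterexample)
    import Statistics.Distribution (Distribution, cumulative, complCumulative)

    prop_complCDF :: Distribution d => d -> Double -> Property
    prop_complCDF d x =
      let p    = cumulative d x
          q    = complCumulative d x
          diff = abs (1 - (p + q))
          eps  = 1e-14  -- the error tolerance being tuned in this issue
      in counterexample ("difference = " ++ show diff) (diff <= eps)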

nh2 commented 4 years ago

It takes a few tens of thousands of attempts to trigger the failure

@Shimuuar But shouldn't the --quickcheck-replay=146826 seed make it deterministic? That's what that option was made for, determinism.

Shimuuar commented 4 years ago

It didn't. Maybe different versions of some packages are to blame. random-1.2?

nh2 commented 4 years ago

@Shimuuar Even with this workaround, I think we should still figure out where the nondeterminism comes from. Do you have a guess?

lehins commented 4 years ago

@nh2 --quickcheck-replay=146826 guarantees reproduction only for the same version of QuickCheck, compiled with all the same dependencies, using the same GHC.

nh2 commented 4 years ago

That makes sense. I think I misunderstood @Shimuuar; when he said

It takes a few tens of thousands of attempts to trigger the failure

I thought he meant that even with a fixed --quickcheck-replay seed it was nondeterministic. But probably that's not what he meant.

nh2 commented 4 years ago

@Shimuuar This issue is still not fixed though:

I think your change just made it less likely to occur. I can still reproduce it quite reliably even on your latest commit, by increasing the number of test runs:

% .stack-work/dist/x86_64-linux/Cabal-2.2.0.1/build/statistics-tests/statistics-tests  --pattern 'FDistribution.1-CDF is correct' --quickcheck-tests 1000000
statistics
  Tests for all distributions
    Tests for: FDistribution
      1-CDF is correct: FAIL (0.38s)
        *** Failed! Falsifiable (after 63796 tests and 2 shrinks):
        fDistributionReal 60.0 1.0
        1385.0
        err. tolerance = 4.0e-14
        difference     = 4.8405723873656825e-14
        Use --quickcheck-replay=476742 to reproduce.

1 out of 1 tests failed (0.38s)

So I think we should re-open this bug.

Shimuuar commented 4 years ago

Yes. I meant how many iterations QC takes to find a failing test case without specifying a seed.

Now this is strange! I ran the test for 1e7 iterations without hitting any errors.

nh2 commented 4 years ago

Now this is strange! I ran the test for 1e7 iterations without hitting any errors.

@Shimuuar Can you repro with the version built by stack test on your latest commit, with the --quickcheck-replay=476742 that my latest counterexample produced?

Shimuuar commented 4 years ago

Yep. That's legit precision loss. The plot is for d = fDistributionReal 75.0 1.0 and 1 - (cumulative d x + complCumulative d x):

[plot image attached in the original comment]
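
A small standalone program along these lines could regenerate the data behind that plot; the sampling grid and output format below are illustrative choices, not what was actually used.

    import Statistics.Distribution (cumulative, complCumulative)
    import Statistics.Distribution.FDistribution (fDistributionReal)

    -- Print the residual 1 - (CDF + complementary CDF) over a range of x
    -- for the distribution from the failing counterexample.
    main :: IO ()
    main = do
      let d          = fDistributionReal 75.0 1.0
          residual x = 1 - (cumulative d x + complCumulative d x)
      mapM_ (\x -> putStrLn (show x ++ "\t" ++ show (residual x)))
            [1, 11 .. 2001 :: Double]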

Shimuuar commented 4 years ago

This is the good old precision loss from computing 1 - (1 - x). The CDF for fDistribution n m is computed as I(x = nx/(m+nx); p = n/2, q = m/2). All is fine, except that the incomplete beta implementation uses the identity I(x; p, q) = 1 - I(1-x; q, p) to compute values for large x. So in effect we compute 1 - nx/(m+nx), which causes the precision loss.

The cure for that problem is as simple as it is ugly. We can compute 1 - nx/(m+nx) as m/(m+nx), so we should perform that switch manually. It depends on implementation details of incompleteBeta, but it works.
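
To make that switch concrete, here is a hedged sketch using incompleteBeta from math-functions (the helper names fCDF and fComplCDF are hypothetical, not the library's actual definitions): the complement passes m/(m+nx) to the incomplete beta function directly, with the shape parameters swapped, instead of evaluating 1 - nx/(m+nx).

    import Numeric.SpecFunctions (incompleteBeta)  -- from the math-functions package

    -- CDF of the F distribution with n and m degrees of freedom:
    --   P(X <= x) = I(n*x/(m + n*x); n/2, m/2)
    fCDF :: Double -> Double -> Double -> Double
    fCDF n m x = incompleteBeta (n/2) (m/2) (n*x / (m + n*x))

    -- Complementary CDF.  Naively this is 1 - fCDF n m x, which via the
    -- identity I(x; p, q) = 1 - I(1-x; q, p) ends up evaluating the argument
    -- as 1 - n*x/(m + n*x) and loses precision for large x.  Passing the
    -- algebraically equal m/(m + n*x) avoids the cancellation.
    fComplCDF :: Double -> Double -> Double -> Double
    fComplCDF n m x = incompleteBeta (m/2) (n/2) (m / (m + n*x))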

Shimuuar commented 4 years ago

I think this commit should fix the loss of precision.