tweag / monad-bayes

A library for probabilistic programming in Haskell.
MIT License
404 stars 62 forks source link

Test "Pipes: HMM pipe model is equivalent to standard model" is flaky #283

Closed turion closed 1 year ago

turion commented 1 year ago

Failures like these occur often:

Failures:

  test/Spec.hs:141:5: 
  1) Pipes: HMM pipe model is equivalent to standard model
       Falsified (after 32 tests and 4 shrinks):
         [0.0,0.0,-1.6666666666666667,-1.6666666666666667]

  To rerun use: --match "/Pipes: HMM/pipe model is equivalent to standard model/"

Randomized with seed 309526202

This might even lead to monad-bayes getting marked broken in nixpkgs, see https://github.com/NixOS/nixpkgs/pull/236431

turion commented 1 year ago

Also see #260 , we should move pipes to a separate package. Then only that separate package would fail.

turion commented 1 year ago

Might be caused by https://github.com/nick8325/quickcheck/issues/359

turion commented 1 year ago

Or similar to https://github.com/haskell/vector/issues/460 it might be that the new generator produces more values for which we experience overflows.

turion commented 1 year ago

Easily reproduceable wtih this diff:

diff --git a/monad-bayes.cabal b/monad-bayes.cabal
index e74b1fe..94b3fed 100644
--- a/monad-bayes.cabal
+++ b/monad-bayes.cabal
@@ -189,7 +189,7 @@ test-suite monad-bayes-test
     , pipes
     , pretty-simple
     , profunctors
-    , QuickCheck
+    , QuickCheck == 2.14.3
     , random
     , statistics
     , text

And a file cabal.project.local containing:

tests: True
turion commented 1 year ago

The issue is in the hard sorting function in the compact function. I added the following diff to inspect the generated values:

diff --git a/test/TestPipes.hs b/test/TestPipes.hs
index 4efe68a..fe51a12 100644
--- a/test/TestPipes.hs
+++ b/test/TestPipes.hs
@@ -8,12 +8,14 @@ import Control.Monad.Bayes.Enumerator (enumerator)
 import Data.AEq (AEq ((~==)))
 import HMM (hmm, hmmPosterior)
 import Pipes.Prelude (toListM)
+import Test.QuickCheck (counterexample, Property, property)

 urns :: Int -> Bool
 urns n = enumerator (urn n) ~== enumerator (urnP n)

-hmms :: [Double] -> Bool
+hmms :: [Double] -> Property
 hmms observations =
   let hmmWithoutPipe = hmm observations
       hmmWithPipe = reverse . init <$> toListM (hmmPosterior observations)
-   in enumerator hmmWithPipe ~== enumerator hmmWithoutPipe
+      msg = unlines $ show <$> [("with   ", enumerator hmmWithPipe), ("without", enumerator hmmWithoutPipe)]
+  in counterexample msg $ property $ (fst <$> enumerator hmmWithPipe) ~== (fst <$> enumerator hmmWithoutPipe)

This gave e.g.

  1) Pipes: HMM pipe model is equivalent to standard model
       Falsified (after 9 tests and 2 shrinks):
         [0.0,0.0,-0.5]
         ("with   ",[([1,2,0],0.1546027526337784),([2,0,1],7.073671679025113e-2),([1,1,2],6.82502090909774e-2),([1,2,1],6.184110105351136e-2),([1,1,1],5.85001792208378e-2),([2,2,0],5.305253759268838e-2),([2,0,2],5.305253759268837e-2),([1,2,2],4.638082579013355e-2),([1,1,0],3.900011948055852e-2),([0,2,0],3.8836185197532215e-2),([2,1,2],3.643161738034025e-2),([2,0,0],3.53683583951256e-2),([1,0,1],3.533777203057793e-2),([2,1,1],3.122710061172023e-2),([0,1,2],2.6669130334371047e-2),([1,0,2],2.6503329022933453e-2),([0,1,1],2.2859254572318057e-2),([2,2,1],2.1221015037075363e-2),([2,1,0],2.0818067074480157e-2),([1,0,0],1.7668886015288984e-2),([2,2,2],1.591576127780652e-2),([0,2,1],1.5534474079012886e-2),([0,1,0],1.5239503048212037e-2),([0,2,2],1.1650855559259665e-2),([0,0,1],1.0356316052675273e-2),([0,0,2],7.767237039506448e-3),([0,0,0],5.1781580263376365e-3)])
         ("without",[([1,2,0],0.15460275263377876),([2,0,1],7.073671679025137e-2),([1,1,2],6.82502090909775e-2),([1,2,1],6.184110105351154e-2),([1,1,1],5.850017922083786e-2),([2,0,2],5.3052537592688506e-2),([2,2,0],5.3052537592688506e-2),([1,2,2],4.6380825790133653e-2),([1,1,0],3.900011948055858e-2),([0,2,0],3.883618519753228e-2),([2,1,2],3.643161738034034e-2),([2,0,0],3.536835839512568e-2),([1,0,1],3.533777203057802e-2),([2,1,1],3.122710061172028e-2),([0,1,2],2.666913033437113e-2),([1,0,2],2.650332902293353e-2),([0,1,1],2.2859254572318098e-2),([2,2,1],2.122101503707541e-2),([2,1,0],2.081806707448021e-2),([1,0,0],1.7668886015289015e-2),([2,2,2],1.5915761277806563e-2),([0,2,1],1.5534474079012931e-2),([0,1,0],1.5239503048212075e-2),([0,2,2],1.1650855559259689e-2),([0,0,1],1.0356316052675282e-2),([0,0,2],7.767237039506465e-3),([0,0,0],5.178158026337637e-3)])

Look closely at the 6th and 7th elements.

([2,2,0],5.305253759268838e-2),([2,0,2],5.305253759268837e-2)

vs.

([2,0,2],5.3052537592688506e-2),([2,2,0],5.3052537592688506e-2)

The floating point numbers are so close that they are ordered in two different ways. The equality check at the end is fine with small differences because it uses ~==, but the sorting algorithm distinguishes them because it uses hard comparison. A simple solution is to sort again before testing. PR upcoming.

reubenharry commented 1 year ago

The stuff with pipes was added by me, partly to make MCMC smoother, partly to explore the stuff we're now exploring with dunai/Rhine. It's really not important, and if it's a pain point, can happily be removed. Sorry for the hassle there.

turion commented 1 year ago

No worries. It's actually helpful to have two different implementations of something for comparison. So I think there is value in that test. But it would be great if we can tackle #260.