tweag / monad-bayes

A library for probabilistic programming in Haskell.

Weird sampling outputs on a toy example #121

Closed rihardsk closed 2 years ago

rihardsk commented 2 years ago

To try out the library, I implemented the Monty Hall problem. My intention is to infer the possible states of the unobserved rooms given some observations: e.g., given that the guest chooses room A and Monty chooses room B, what are the probabilities for the prize location? Using monad-bayes for this (maybe naively) somehow gives wrong results.

Here's the code:

module Monty where

import Control.Monad.Bayes.Class (MonadInfer, uniformD, condition)
import Control.Monad (forM_)
import Control.Monad.Bayes.Sampler (sampleIO)
import Control.Monad.Bayes.Weighted (prior)
import Control.Monad.Bayes.Traced (mh)

data Room = A | B | C
  deriving (Eq, Show)
data Observation = Guest Room | Prize Room | Monty Room

-- Monty opens a door chosen uniformly from the doors that are neither
-- the guest's pick nor the prize door.
condMonty :: MonadInfer m => Room -> Room -> m Room
condMonty guest prize = let options = [A, B, C]
                            validOptions = filter (\x -> x /= guest && x /= prize) options
                        in uniformD validOptions

data GameState = GameState
  { s_guest :: Room
  , s_prize :: Room
  , s_monty :: Room
  }
stateToTuple :: GameState -> (Room, Room, Room)
stateToTuple s = (s_guest s, s_prize s, s_monty s)

-- Joint model over (guest, prize, monty); each observation is scored
-- with a hard condition.
montyModel :: MonadInfer m => [Observation] -> m GameState
montyModel observations = do
  guest <- uniformD [A, B, C]
  prize <- uniformD [A, B, C]
  monty <- condMonty guest prize

  forM_ observations (scoreObs guest prize monty)
  return $ GameState guest prize monty
  where
    scoreObs guest _ _ (Guest room) = condition $ guest == room
    scoreObs _ prize _ (Prize room) = condition $ prize == room
    scoreObs _ _ monty (Monty room) = condition $ monty == room

data RoomFreq = RoomFreq
  { freqA :: Double
  , freqB :: Double
  , freqC :: Double
  }
  deriving Show

getRates :: [Room] -> RoomFreq
getRates rs =
  let
    (a, b, c) = foldl addOne (0, 0, 0) rs
    len = fromIntegral $ length rs
  in RoomFreq (a / len) (b / len) (c / len)
  where
    addOne (a, b, c) A = (a + 1, b, c)
    addOne (a, b, c) B = (a, b + 1, c)
    addOne (a, b, c) C = (a, b, c + 1)

runInference :: IO ()
runInference = do
  let nsamples = 5000
      burnin = 500

  -- Given the guest's choice and Monty's choice, I want to infer where the prize is located.
  -- mh returns the chain newest-first, so taking the first nsamples discards the burn-in.
  s <- take nsamples <$> (sampleIO $ prior $ mh (nsamples + burnin) $ montyModel [Guest A, Monty B])
  putStrLn "Last 20 samples:"
  print $ map stateToTuple $ take 20 s
  putStrLn ""

  let guestRates = getRates $ s_guest <$> s
      prizeRates = getRates $ s_prize <$> s
      montyRates = getRates $ s_monty <$> s
  putStrLn "Guest room choice frequency:" >> putStr "  " >> print guestRates
  putStrLn "Prize location frequency:" >> putStr "  " >> print prizeRates
  putStrLn "Monty room choice frequency:" >> putStr "  " >> print montyRates

And here are some typical outputs from running it three times:

Run 1

Last 20 samples:
[(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B),(A,C,B)]

Guest room choice frequency:
  RoomFreq {freqA = 1.0, freqB = 0.0, freqC = 0.0}
Prize location frequency:
  RoomFreq {freqA = 0.0, freqB = 0.0, freqC = 1.0}
Monty room choice frequency:
  RoomFreq {freqA = 0.0, freqB = 1.0, freqC = 0.0}

Run 2

Last 20 samples:
[(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C),(B,A,C)]

Guest room choice frequency:
  RoomFreq {freqA = 0.0, freqB = 1.0, freqC = 0.0}
Prize location frequency:
  RoomFreq {freqA = 1.0, freqB = 0.0, freqC = 0.0}
Monty room choice frequency:
  RoomFreq {freqA = 0.0, freqB = 0.0, freqC = 1.0}

Run 3

Last 20 samples:
[(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C),(B,B,C)]

Guest room choice frequency:
  RoomFreq {freqA = 0.0, freqB = 1.0, freqC = 0.0}
Prize location frequency:
  RoomFreq {freqA = 0.0, freqB = 1.0, freqC = 0.0}
Monty room choice frequency:
  RoomFreq {freqA = 0.0, freqB = 0.0, freqC = 1.0}

All of these are wrong, and runs 2 and 3 should have been impossible given the model specification, I think.

To Reproduce

Run the code above and observe the outputs.

Expected behavior

Given the inputs hardcoded in the code (the guest chooses A and Monty chooses B), I'd expect the following output:

Guest room choice frequency:
  RoomFreq {freqA = 1.0, freqB = 0.0, freqC = 0.0}
Prize location frequency:
  RoomFreq {freqA = 0.333, freqB = 0.0, freqC = 0.666}
Monty room choice frequency:
  RoomFreq {freqA = 0.0, freqB = 1.0, freqC = 0.0}

Additional context

I don't know whether this is a valid approach to implementing the problem. Originally I wanted to see how easy it would be to convert this Bayesian network example to monad-bayes. My guess is that the MCMC sampler might be diverging, or that condMonty shouldn't be used like that to define the conditional probability distribution, or both, but I've no idea how to debug this.

rihardsk commented 2 years ago

Bump. Any ideas on what might be going wrong here would be greatly appreciated.

idontgetoutmuch commented 2 years ago

@rihardsk I won't be able to look at this until later this week. Apologies, but I am not clear on what you are trying to do. My recollection of Monty Hall is something like: the guest chooses A, the host chooses B, and then the prior distribution of 1/3, 1/3, 1/3 should become 1/3, 0, 2/3. Are you trying to do this via simulation? In which case, I don't know what Metropolis-Hastings (mh) is doing here.
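
For reference, the 1/3, 0, 2/3 is a single application of Bayes' rule. With Guest = A:

P(Monty=B | Prize=A) = 1/2   (Monty picks uniformly between B and C)
P(Monty=B | Prize=B) = 0     (Monty never opens the prize door)
P(Monty=B | Prize=C) = 1     (B is the only remaining door)

so the posterior over the prize location is proportional to (1/3 · 1/2, 0, 1/3 · 1), which normalizes to (1/3, 0, 2/3).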

idontgetoutmuch commented 2 years ago

Perhaps you are trying to do this?

-- Assumes `import qualified Data.Set as Set` and `import Data.Set ((\\))`,
-- and that Room also derives Ord (needed for Set).
iMonty :: MonadInfer m => m Bool
iMonty = do
  -- The guest chooses at random from 3 doors.
  g <- uniformD [A, B, C]
  -- The prize is put at random behind one of 3 doors.
  p <- uniformD [A, B, C]
  -- The host opens a door neither chosen by the guest nor hiding the prize.
  let r = Set.fromList [A, B, C] \\ Set.fromList [g, p]
  h <- uniformD (Set.toList r)
  -- The guest now switches to the door that is neither their original
  -- choice nor the host's door.
  g' <- uniformD $ Set.toList (Set.fromList [A, B, C] \\ Set.fromList [g, h])
  return (g' == p)
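
For what it's worth, a quick sanity check of this simulation (a sketch; it assumes the imports from the original post plus replicateM from Control.Monad):

winRate :: IO Double
winRate = do
  -- iMonty never calls `condition`, so `prior` just strips a constant weight
  wins <- sampleIO $ prior $ replicateM 10000 iMonty
  return $ fromIntegral (length (filter id wins)) / 10000

Running winRate should give roughly 0.67, i.e. switching wins 2/3 of the time.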
rihardsk commented 2 years ago

@idontgetoutmuch thank you for the example. I think your approach corresponds to a simulation of the game, but I wanted it to be more general. I want to model the joint posterior distribution of the game state given some arbitrary observations: P(GuestInitialDoor, PrizeDoor, HostDoor | observations). The observations could be "Guest initially chooses A; Host chooses B", and I want to arrive at a joint posterior distribution which, when marginalized, gives the frequencies listed under "Expected behavior" above.

So I think MH makes sense here, because I want to draw samples from the joint posterior distribution corresponding to the above probabilities.

The reason for modeling the problem in such a non-straightforward way is that I wanted to test how easy it would be to model a Bayesian network with monad-bayes. I drew inspiration from this Python example for a dedicated Bayesian network library (pomegranate):

from pomegranate import *

guest = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})
prize = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})
monty = ConditionalProbabilityTable(
        [['A', 'A', 'A', 0.0],
         ['A', 'A', 'B', 0.5],
         ['A', 'A', 'C', 0.5],
         ['A', 'B', 'A', 0.0],
         ['A', 'B', 'B', 0.0],
         ['A', 'B', 'C', 1.0],
         ['A', 'C', 'A', 0.0],
         ['A', 'C', 'B', 1.0],
         ['A', 'C', 'C', 0.0],
         ['B', 'A', 'A', 0.0],
         ['B', 'A', 'B', 0.0],
         ['B', 'A', 'C', 1.0],
         ['B', 'B', 'A', 0.5],
         ['B', 'B', 'B', 0.0],
         ['B', 'B', 'C', 0.5],
         ['B', 'C', 'A', 1.0],
         ['B', 'C', 'B', 0.0],
         ['B', 'C', 'C', 0.0],
         ['C', 'A', 'A', 0.0],
         ['C', 'A', 'B', 1.0],
         ['C', 'A', 'C', 0.0],
         ['C', 'B', 'A', 1.0],
         ['C', 'B', 'B', 0.0],
         ['C', 'B', 'C', 0.0],
         ['C', 'C', 'A', 0.5],
         ['C', 'C', 'B', 0.5],
         ['C', 'C', 'C', 0.0]], [guest, prize])

s1 = Node(guest, name="guest")
s2 = Node(prize, name="prize")
s3 = Node(monty, name="monty")

model = BayesianNetwork("Monty Hall Problem")
model.add_states(s1, s2, s3)
model.add_edge(s1, s3)
model.add_edge(s2, s3)
model.bake()

# >>> print(model.probability([['A', 'A', 'A'],
#                              ['A', 'A', 'B'],
#                              ['C', 'C', 'B']]))
# [ 0.          0.05555556  0.05555556]
# 
# >>> print(model.predict([['A', 'B', None],
#                  ['A', None, 'C'],
#                  [None, 'B', 'A']]))
# [['A' 'B' 'C']
#  ['A' 'B' 'C']
#  ['C' 'B' 'A']]

Here the model also specifies the joint probability distribution P(guest, prize, host) but, unlike in my example, the distribution isn't conditioned on any observations. (Maybe that's the issue with my example: I shouldn't use observations, but instead infer the same thing from samples of the unconditioned joint distribution? See the sketch below.)
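
Something like this is what I mean by the alternative (a sketch reusing the definitions from my original post plus replicateM from Control.Monad; this is plain rejection sampling rather than MH):

rejectionRates :: IO RoomFreq
rejectionRates = do
  -- sample the unconditioned joint, keep only the samples that agree with
  -- the observations, and read the prize marginal off the survivors
  ss <- sampleIO $ prior $ replicateM 50000 (montyModel [])
  let matching = [s | s <- ss, s_guest s == A, s_monty s == B]
  return $ getRates (s_prize <$> matching)

I'd expect this to give roughly RoomFreq {freqA = 0.33, freqB = 0.0, freqC = 0.67} for the prize.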

I'm new to probabilistic programming and somewhat new to Bayesian inference, so I might have messed up the terminology somewhere.

reubenharry commented 2 years ago

I'll try to look at this more closely this week, but my guess is that the model has a bug, not the MH. My general advice for debugging probabilistic programs is to write a small enough version that you can use enumerate on, and convince yourself that it gives the correct answer. Re Bayes nets, I think they should be straightforward to implement, since probabilistic programs are really just an extension of Bayes nets in the first place.

reubenharry commented 2 years ago

OK, so if you run with enumerate (from Control.Monad.Bayes.Enumerator), as in:

-- (Room needs an Ord instance here, for enumerate to aggregate duplicates)
exactPosterior :: [((Room, Room, Room), Double)]
exactPosterior = enumerate $ fmap stateToTuple $ montyModel [Guest A, Monty B]

you get:

[((A,C,B),0.6666666666666666),((A,A,B),0.3333333333333334),((A,A,C),0.0),((A,B,C),0.0),((B,A,C),0.0),((B,B,A),0.0),((B,B,C),0.0),((B,C,A),0.0),((C,A,B),0.0),((C,B,A),0.0),((C,C,A),0.0),((C,C,B),0.0)]

This suggests that the model is fine. So I think the issue is with the MH inference. In the cases where you get wrong results, what is happening is that the chain is initialized at a zero-probability state and then rejects every subsequent proposal. We should probably handle this case with an exception rather than a silent failure.

The more interesting question is why the proposal never successfully transitions to (A,C,B) or (A,A,B).

The reason is this: the default MH proposal is single-site. It tries modifying one of the entries in (x,y,z) at a time. But transitioning from (A,C,B) to (A,A,B) requires two jumps, so it never gets there. In other words, ergodicity is violated.

I'm planning to extend monad-bayes' MCMC capabilities later this year to make it more practical. But in the meantime, to confirm this hypothesis, I changed the MH code to the following:

-- | A Metropolis-corrected transition of Trace MCMC, modified so that the
-- proposal resamples every random variable in the trace rather than a
-- single site. (`trace` below is Debug.Trace.trace, used to log the
-- acceptance decisions.)
mhTrans :: MonadSample m => Weighted (FreeSampler m) a -> Trace a -> m (Trace a)
mhTrans m t@Trace {variables = us, density = p} = do
  let n = length us
  us' <- mapM (const random) us   -- fresh randomness for every site
  ((b, q), vs) <- runWriterT $ runWeighted $ Weighted.hoist (WriterT . withPartialRandomness us') m
  let ratio = (exp . ln) $ min 1 (q * fromIntegral n / (p * fromIntegral (length vs)))
  accept <- bernoulli ratio
  return $ trace (show (q, accept)) $ if accept then Trace vs b q else t

This gave:

Guest room choice frequency:
  RoomFreq {freqA = 1.0, freqB = 0.0, freqC = 0.0}
Prize location frequency:
  RoomFreq {freqA = 0.4856, freqB = 0.0, freqC = 0.5144}
Monty room choice frequency:
  RoomFreq {freqA = 0.0, freqB = 1.0, freqC = 0.0}

Thanks for posting this; it's a great example of what needs to improve in the MCMC code! (I'll leave the issue open to remind myself that this needs fixing.)

reubenharry commented 2 years ago

Re Bayes nets, they are definitely straightforward to express in monad-bayes, but the question is what inference method you would want. My understanding is that the nice thing about graphical models is that you can have inference methods which take advantage of the conditional independence structure given by the graph, and monad-bayes doesn't yet have the capabilities to do this.

rihardsk commented 2 years ago

Thank you for looking into this!

This gave:

Guest room choice frequency:
  RoomFreq {freqA = 1.0, freqB = 0.0, freqC = 0.0}
Prize location frequency:
  RoomFreq {freqA = 0.4856, freqB = 0.0, freqC = 0.5144}
Monty room choice frequency:
  RoomFreq {freqA = 0.0, freqB = 1.0, freqC = 0.0}

I expected a different output for "Prize location frequency":

Guest room choice frequency:
  RoomFreq {freqA = 1.0, freqB = 0.0, freqC = 0.0}
Prize location frequency:
  RoomFreq {freqA = 0.333, freqB = 0.0, freqC = 0.666}
Monty room choice frequency:
  RoomFreq {freqA = 0.0, freqB = 1.0, freqC = 0.0}

which would correspond to what you got with enumerate. Or is your implementation of mhTrans meant to do something different?

the nice thing about graphical models is that you can have inference methods which take advantage of the conditional independence structure given by the graph

Is this something that's on the roadmap for monad-bayes? I've seen that the Pyro framework in Python uses "plate" definitions for conditional independence, and I was wondering whether something similar would be possible in monad-bayes.

reubenharry commented 2 years ago

Yeah, this is because the proposal I used just resamples the entire trace randomly at each step in the chain. This makes the MCMC totally inefficient, and 5000 steps isn't enough to converge. The broader issue is that you want to be able to specify a sensible proposal: something like, with some probability, mutate each of the three variables to one of the others (see the sketch below). If you look at a PPL like Gen, that's straightforward, so getting that capability here would be good and is high on my to-do list.
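
For concreteness, here's a minimal hand-rolled sketch of that kind of proposal for this particular model (standalone, not monad-bayes API; density, step and chain are made-up names, and Room is assumed to also derive Enum): pick one coordinate of (guest, prize, monty) uniformly and resample it uniformly over all three rooms. The proposal is symmetric, so the Metropolis acceptance probability is just min 1 (density s' / density s).

import System.Random (randomRIO)

-- Unnormalized posterior density of (guest, prize, monty) given the
-- observations Guest A and Monty B; the uniform priors over guest and
-- prize contribute a constant factor and are dropped.
density :: (Room, Room, Room) -> Double
density (g, p, m)
  | g /= A || m /= B = 0            -- observations fail
  | m == g || m == p = 0            -- Monty never opens those doors
  | otherwise = 1 / fromIntegral (length valid)
  where valid = [r | r <- [A, B, C], r /= g, r /= p]

-- One MH step: mutate a uniformly chosen coordinate to a uniformly
-- chosen room, then accept or reject.
step :: (Room, Room, Room) -> IO (Room, Room, Room)
step s@(g, p, m) = do
  site <- randomRIO (0, 2 :: Int)
  r <- toEnum <$> randomRIO (0, 2 :: Int)   -- assumes Room derives Enum
  let s' = case site of
        0 -> (r, p, m)
        1 -> (g, r, m)
        _ -> (g, p, r)
  u <- randomRIO (0, 1 :: Double)
  return $ if u * density s < density s' then s' else s

chain :: Int -> (Room, Room, Room) -> IO [(Room, Room, Room)]
chain 0 _ = return []
chain n s = do
  s' <- step s
  (s' :) <$> chain (n - 1) s'

Because this proposal's support doesn't depend on the current state, the chain can move between (A,C,B) and (A,A,B) in one step, so the ergodicity problem above goes away.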

And yeah, I'm not super familiar with Pyro, but something along those lines would be nice.

reubenharry commented 2 years ago

Closing this. Gen-style customizability is available in the code fragment accompanying https://dl.acm.org/doi/10.1145/3371087. Building it out to be practical is a large-scale effort that I, at least, am not going to be doing.