jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License
3.35k stars 589 forks source link

BayesianNetwork fails to predict (Wikipedia example) #438

Closed 1pakch closed 6 years ago

1pakch commented 6 years ago

I implemented the Bayesian network example presented on Wikipedia for Bayesian networks witn pomegranate 0.8.1. Inference using predict_proba fails to produce the right answer. The state probabilities seem to be correct. Am I missing something?

# Bayesian network for wikipedia example at
# https://en.wikipedia.org/wiki/Bayesian_network

from pomegranate import *

from pomegranate import ConditionalProbabilityTable as CPT
from numpy import allclose
from itertools import product

sprinkler_table = [
    [0, 0, 0.6 ],
    [0, 1, 0.4 ],
    [1, 0, 0.99],
    [1, 1, 0.01],
]

grass_wet_table = [
    [0, 0, 0, 1   ],
    [0, 0, 1, 0   ],
    [0, 1, 0, 0.2 ],
    [0, 1, 1, 0.8 ],
    [1, 0, 0, 0.1 ],
    [1, 0, 1, 0.9 ],
    [1, 1, 0, 0.01],
    [1, 1, 1, 0.99],
]

rain_dist = DiscreteDistribution({0: 0.8, 1: 0.2})
sprinkler_dist = CPT(sprinkler_table, [rain_dist])
grass_wet_dist = CPT(grass_wet_table, [sprinkler_dist, rain_dist])

rain = Node(rain_dist, 'rain')
sprinkler = Node(sprinkler_dist, 'sprinkler')
grass_wet = Node(grass_wet_dist, 'grass_wet')

m = BayesianNetwork('test')
m.add_states(rain, sprinkler, grass_wet)
m.add_edge(rain, sprinkler)
m.add_edge(sprinkler, grass_wet)
m.add_edge(rain, grass_wet)
m.bake()

def assert_p_rains_if_wet(p, label):
    expected = 0.36
    try:
        assert allclose(p, expected, atol=0.01)
    except AssertionError as e:
        print('Failure in assert_p_rains_if_wet(label=%s)' % label)
        print('expected =', expected)
        print('observed =', p)
        raise e

assert [s.name for s in m.states] == ['rain', 'sprinkler', 'grass_wet']

# calculate p_rains_if_wet from state probabilities
p = dict()
for r, s, g in product([0, 1], [0, 1], [0, 1]):
    state = (r, s, g)
    p[state] = m.probability(state)

p_rains_if_wet = (
        (p[1, 0, 1] + p[1, 1, 1])
        / (p[0, 0, 1] + p[0, 1, 1] + p[1, 0, 1] + p[1, 1, 1]))
assert_p_rains_if_wet(p_rains_if_wet, 'from_states') # OK

# calculate p_rains_if_wet using predict_proba
dists = m.predict_proba({'grass_wet': 1})

p_grass_wet = dists[2].parameters[0][1]
assert allclose(p_grass_wet, 1) # OK

p_rains_if_wet = dists[0].parameters[0][1] # yields ~0.06
assert_p_rains_if_wet(p_rains_if_wet, 'predict_proba') # FAILS
jmschrei commented 6 years ago

Howdy

Thanks for pointing out the issue. After some initial digging, it looks like the issue is because I use loopy belief propogation on a factor graph, which is a fast and sometimes-exact approximate algorithm. If you have a tree-like network at the unobserved variables, or you're only doing a forward pass of inference, this algorithm will be exact. Otherwise, it is not guaranteed to be. It looks like in this particular case it converges to being more confident that the sprinkler is the culprit.

If you're looking for results more closely matching the exact algorithm on simple Bayesian networks, you can set max_iterations=1 in the predict_proba method and the result you get is a 33.2% chance that it rained.

The reason that I use the approximate algorithm versus the exact algorithm is two fold. The first is that it's much easier to implement the approximate algorithm. The second is that it scales well on larger Bayesian networks, and is even typically much faster on smaller ones, while still preserving the argmax over the keys.

Let me know if you have any other questions!

1pakch commented 6 years ago

Thanks a lot for the answer! I might create a PR with updated docs in the unlikely case I will find a bit of free time:)

There is another problem which occurs when I try to use pomegranate for simple genetics. I submitted it as a separate issue #446.