jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License

Poor migration guide from pre 1.0.0 release #1031

Open DrOncogene opened 1 year ago

DrOncogene commented 1 year ago

I was taking the CS50AI course, and its chapter on Bayesian networks uses this library. However, the course code dates back to around v0.8.1, when classes like Node were still in use. I struggled to find a migration guide, or at least the docs for v0.14.8 or v0.8.1.

No such documentation seems to exist, not even on the official docs website. Kindly point me to it if it exists, or help me convert the code below to the latest API. I should be able to pick it up from there.

from pomegranate import *

# Rain node has no parents
rain = Node(DiscreteDistribution({
    "none": 0.7,
    "light": 0.2,
    "heavy": 0.1
}), name="rain")

# Track maintenance node is conditional on rain
maintenance = Node(ConditionalProbabilityTable([
    ["none", "yes", 0.4],
    ["none", "no", 0.6],
    ["light", "yes", 0.2],
    ["light", "no", 0.8],
    ["heavy", "yes", 0.1],
    ["heavy", "no", 0.9]
], [rain.distribution]), name="maintenance")

# Train node is conditional on rain and maintenance
train = Node(ConditionalProbabilityTable([
    ["none", "yes", "on time", 0.8],
    ["none", "yes", "delayed", 0.2],
    ["none", "no", "on time", 0.9],
    ["none", "no", "delayed", 0.1],
    ["light", "yes", "on time", 0.6],
    ["light", "yes", "delayed", 0.4],
    ["light", "no", "on time", 0.7],
    ["light", "no", "delayed", 0.3],
    ["heavy", "yes", "on time", 0.4],
    ["heavy", "yes", "delayed", 0.6],
    ["heavy", "no", "on time", 0.5],
    ["heavy", "no", "delayed", 0.5],
], [rain.distribution, maintenance.distribution]), name="train")

# Appointment node is conditional on train
appointment = Node(ConditionalProbabilityTable([
    ["on time", "attend", 0.9],
    ["on time", "miss", 0.1],
    ["delayed", "attend", 0.6],
    ["delayed", "miss", 0.4]
], [train.distribution]), name="appointment")

# Create a Bayesian Network and add states
model = BayesianNetwork()
model.add_states(rain, maintenance, train, appointment)

# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)

# Finalize model
model.bake()

jmschrei commented 1 year ago

Hi. Sorry that you're having trouble. I've included tutorials on how to write Bayesian networks here: https://pomegranate.readthedocs.io/en/latest/tutorials/B_Model_Tutorial_6_Bayesian_Networks.html

The README links to this documentation in two forms, as a page on the documentation site and as a link to the tutorials folder, in its first line after the note describing the differences between versions. The examples folder also contains another Bayesian network example that might be helpful.

I have not written an explicit guide for how to rewrite models, but it should be fairly straightforward to convert that code to the new format. The biggest differences are that you no longer need Node or State objects, that probability distributions are d-dimensional tensors instead of lists of lists, and that you no longer need to call bake at the end. These changes are written out in a section of the README: https://github.com/jmschrei/pomegranate#high-level-changes
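
To make the "d-dimensional tensor instead of a list of lists" point concrete, here is a minimal pure-Python sketch that rearranges the old-style rows into a nested list indexed by integer labels. The helper name rows_to_tensor and the label orderings are hypothetical choices for illustration, not part of pomegranate, and the sketch does not show how the result has to be wrapped before it is handed to the library.

```python
def rows_to_tensor(rows, label_orders):
    """Convert old-style CPT rows into a nested list indexed by integer labels.

    rows: lists like [parent_label, ..., child_label, probability]
    label_orders: one list of labels per column (parents first, child last);
                  a label's position in its list becomes its integer index.
    """
    def zeros(sizes):
        # Build a nested list of zeros with the given size per axis.
        if len(sizes) == 1:
            return [0.0] * sizes[0]
        return [zeros(sizes[1:]) for _ in range(sizes[0])]

    tensor = zeros([len(labels) for labels in label_orders])
    for *labels, prob in rows:
        indices = [order.index(label) for order, label in zip(label_orders, labels)]
        cell = tensor
        for i in indices[:-1]:   # walk down the parent axes
            cell = cell[i]
        cell[indices[-1]] = prob  # last axis is the child variable
    return tensor


# The maintenance table from the original post (hypothetical label orderings).
maintenance_rows = [
    ["none", "yes", 0.4],
    ["none", "no", 0.6],
    ["light", "yes", 0.2],
    ["light", "no", 0.8],
    ["heavy", "yes", 0.1],
    ["heavy", "no", 0.9],
]
rain_labels = ["none", "light", "heavy"]
maintenance_labels = ["yes", "no"]

maintenance_probs = rows_to_tensor(maintenance_rows, [rain_labels, maintenance_labels])
print(maintenance_probs)  # [[0.4, 0.6], [0.2, 0.8], [0.1, 0.9]]
```

Each parent gets one axis and the child variable is the last axis, so every innermost list sums to 1.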

DrOncogene commented 1 year ago

Thanks for the prompt response. How can I add labels to a Categorical distribution, for example?

jmschrei commented 1 year ago

A design choice that I made with the latest version is to only accept integer labels in the modeling step, similar to scikit-learn and other ML repositories. You can keep lists of labels on your end and index into them at the end.
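
As a small illustration of keeping the labels on your side, the sketch below encodes string labels to integers before modeling and decodes integer results afterwards. The variable names and the sample values are made up for this example, and nothing here calls pomegranate itself.

```python
# Label order is our own bookkeeping; position in the list is the integer code.
rain_labels = ["none", "light", "heavy"]
rain_index = {label: i for i, label in enumerate(rain_labels)}

# Encode string observations as integers before passing them to a model.
observed = ["light", "none", "heavy", "none"]
encoded = [rain_index[label] for label in observed]
print(encoded)  # [1, 0, 2, 0]

# Decode integer outputs (for example, predicted categories) back to labels.
predicted = [2, 0]
decoded = [rain_labels[i] for i in predicted]
print(decoded)  # ['heavy', 'none']
```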

kronenpj commented 10 months ago

Unfortunately this isn't enough guidance for me to port the class example to pomegranate-1.0+. My lack of familiarity with the topic makes the tutorials and examples useless for converting to the new representation. I've gotten this far:

from pomegranate.bayesian_network import BayesianNetwork
from pomegranate.distributions import Categorical, JointCategorical

# Rain node has no parents
orig_rain_probs = {"none": 0.7, "light": 0.2, "heavy": 0.1}
rain_probs = [[0.7, 0.2, 0.1]]
rain = Categorical(rain_probs)

# Track maintenance node is conditional on rain
orig_maint_probs = probs = [
    ["none", "yes", 0.4],
    ["none", "no", 0.6],
    ["light", "yes", 0.2],
    ["light", "no", 0.8],
    ["heavy", "yes", 0.1],
    ["heavy", "no", 0.9],
]
maint_probs = {0: 0.4, 1: 0.6, 2: 0.2, 3: 0.8, 4: 0.1, 5: 0.9}
maintenance = JointCategorical(maint_probs, [rain.distribution])

# Train node is conditional on rain and maintenance
orig_train_probs = [
    ["none", "yes", "on time", 0.8],
    ["none", "yes", "delayed", 0.2],
    ["none", "no", "on time", 0.9],
    ["none", "no", "delayed", 0.1],
    ["light", "yes", "on time", 0.6],
    ["light", "yes", "delayed", 0.4],
    ["light", "no", "on time", 0.7],
    ["light", "no", "delayed", 0.3],
    ["heavy", "yes", "on time", 0.4],
    ["heavy", "yes", "delayed", 0.6],
    ["heavy", "no", "on time", 0.5],
    ["heavy", "no", "delayed", 0.5],
]
train_probs = {
    0: 0.8,
    1: 0.2,
    2: 0.9,
    3: 0.1,
    4: 0.6,
    5: 0.4,
    6: 0.7,
    7: 0.3,
    8: 0.4,
    9: 0.6,
    10: 0.5,
    11: 0.5,
}
train = JointCategorical(
    train_probs, [rain.distribution, maintenance.distribution]
)

# Appointment node is conditional on train
original_appointment_probs = [
    ["on time", "attend", 0.9],
    ["on time", "miss", 0.1],
    ["delayed", "attend", 0.6],
    ["delayed", "miss", 0.4],
]
appointment_probs = {0: 0.9, 1: 0.1, 2: 0.6, 3: 0.4}
appointment = JointCategorical(appointment_probs, [train.distribution])

# Create a Bayesian Network and add states
model = BayesianNetwork()
# model.add_states(rain, maintenance, train, appointment)
model.add_distributions([rain, maintenance, train, appointment])

# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)

But I receive this error:

Traceback (most recent call last):
  File "/home/cs50ai/src/harvard_cs50_ai/week2/bayesnet/likelihood.py", line 1, in <module>
    from model import model
  File "/home/cs50ai/src/harvard_cs50_ai/week2/bayesnet/model.py", line 19, in <module>
    maintenance = JointCategorical(maint_probs, [rain.distribution])
  File "/home/cs50ai/src/harvard_cs50_ai/week2/.venv/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Categorical' object has no attribute 'distribution'

I don't see an applicable distribution type for <0.7, 0.2, 0.1> because that is the distribution for that variable. I also don't see how to set it within Categorical.

jmschrei commented 10 months ago

Hi @kronenpj

Sorry you're encountering issues. I think there are a few issues with your code.

(1) JointCategorical doesn't accept dictionaries; it accepts a tensor with d dimensions. If you had five examples with four features each, you'd normally store that in a matrix with shape=(5,4). Here, you'd represent that data in a tensor of shape=(2,2,2,2) -- 2 for the number of possibilities in each feature, 4 dimensions for the 4 features. See the documentation: https://github.com/jmschrei/pomegranate/blob/master/pomegranate/distributions/joint_categorical.py#L34

(2) BayesianNetwork doesn't accept JointCategorical, only Categorical and ConditionalCategorical. Remember that when you're defining the network you're defining the source nodes, Categorical distributions, and the internal nodes, ConditionalCategorical ones. See https://github.com/jmschrei/pomegranate/blob/master/examples/Bayesian_Network_Monty_Hall.ipynb for how to format your Categorical and ConditionalCategorical distributions.

(3) When making these distributions you no longer need to pass in the parent distributions into the child distributions directly. This is handled by the BayesianNetwork object. You just need to pass in the probabilities.
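
Putting the three points together, the course example might look roughly like the sketch below. Treat it as an untested sketch rather than a reference port. The constant names and build_model are hypothetical, and the integer codings are my own choice. It assumes that ConditionalCategorical takes a list with one tensor per modeled feature, with the parent axes first and the child axis last, and that the parent axes follow the order in which edges into the child are added. The pomegranate imports live inside the function so the tables can be inspected without the library installed.

```python
# Integer label conventions (our own bookkeeping, kept outside the model):
#   rain: 0=none, 1=light, 2=heavy      maintenance: 0=yes, 1=no
#   train: 0=on time, 1=delayed         appointment: 0=attend, 1=miss

# Source node: one row of category probabilities.
RAIN = [[0.7, 0.2, 0.1]]

# Axes: [rain][maintenance]
MAINTENANCE = [[[0.4, 0.6], [0.2, 0.8], [0.1, 0.9]]]

# Axes: [rain][maintenance][train]
TRAIN = [[
    [[0.8, 0.2], [0.9, 0.1]],
    [[0.6, 0.4], [0.7, 0.3]],
    [[0.4, 0.6], [0.5, 0.5]],
]]

# Axes: [train][appointment]
APPOINTMENT = [[[0.9, 0.1], [0.6, 0.4]]]


def build_model():
    """Assemble the network; requires pomegranate 1.0+ (assumed API usage)."""
    from pomegranate.bayesian_network import BayesianNetwork
    from pomegranate.distributions import Categorical, ConditionalCategorical

    rain = Categorical(RAIN)
    maintenance = ConditionalCategorical(MAINTENANCE)
    train = ConditionalCategorical(TRAIN)
    appointment = ConditionalCategorical(APPOINTMENT)

    model = BayesianNetwork()
    model.add_distributions([rain, maintenance, train, appointment])

    # Parents are declared only through edges, not inside the distributions.
    model.add_edge(rain, maintenance)
    model.add_edge(rain, train)
    model.add_edge(maintenance, train)
    model.add_edge(train, appointment)
    return model  # no bake() step
```

Calling build_model() needs pomegranate and torch installed. Reading TRAIN[0][2][1], for instance, gives the train distribution for heavy rain with no maintenance, [0.5, 0.5], which matches the last two rows of the original table.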

Let me know if you have any other questions.

mksit commented 10 months ago

Hi @jmschrei

ConditionalCategorical is poorly documented. Can you please explain more about how to use it in a Bayesian network? How does each entry in the input distribution of ConditionalCategorical correspond to the other nodes?