Open Fortune-codebox opened 9 months ago
Have you read the tutorial on Bayesian networks in pomegranate >= 1.0.0? https://github.com/jmschrei/pomegranate/blob/master/docs/tutorials/B_Model_Tutorial_6_Bayesian_Networks.ipynb
Let me know if that's still not helpful
Yes, I was able to come up with a solution for the model using the link you shared, but I am unable to fit the data successfully. Could you please help me figure out the right X data to fit this model using random.randint?
import numpy as np
from pomegranate.distributions import *
from pomegranate.bayesian_network import BayesianNetwork
rain = Categorical([[0.7, 0.2, 0.1]])
maintenance = ConditionalCategorical([[[0.4, 0.6], [0.2, 0.8], [0.1, 0.9]]])
train = ConditionalCategorical([[
[0.8, 0.2],
[0.9, 0.1],
[0.6, 0.4],
[0.7, 0.3],
[0.4, 0.6],
[0.5, 0.5]]])
# Create a Bayesian Network and add states
model = BayesianNetwork()
model.add_distributions([rain, maintenance, train, appointment])
# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)
I would also like to know if there is anything wrong with this solution. Thanks.
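As a rough sketch of what the X data could look like: each row holds one integer category index per node, in the order the nodes were added (rain, maintenance, train, appointment). The helper name `random_dataset` and the constant `N_CATEGORIES` are made up for this illustration, and the final fitting call is an assumption about pomegranate >= 1.0.0 rather than something confirmed in this thread.

```python
import random

# Number of categories per column, in the order the nodes were added:
# rain (3), maintenance (2), train (2), appointment (2).
N_CATEGORIES = [3, 2, 2, 2]


def random_dataset(n_rows, n_categories=N_CATEGORIES, seed=0):
    """Return n_rows rows of integer category indices, one column per node."""
    rng = random.Random(seed)
    # random.randint is inclusive on both ends, hence k - 1.
    return [[rng.randint(0, k - 1) for k in n_categories] for _ in range(n_rows)]


X = random_dataset(1000)
print(X[:3])

# Assumption: the rows would then be wrapped in an integer tensor before
# fitting, e.g. model.fit(torch.tensor(X)).
```

Note that uniformly random rows carry no dependence between the columns, so a model fitted on them would learn roughly uniform tables; this is only useful for checking that the shapes and value ranges are accepted.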
I've successfully managed to run the code:
from pomegranate import *
import numpy as np
from pomegranate.distributions import *
from pomegranate.bayesian_network import BayesianNetwork
rain = Categorical(
[
[0.7, 0.2, 0.1],
]
)
maintenance = ConditionalCategorical(
[
[
[0.4, 0.6],
[0.2, 0.8],
[0.1, 0.9],
],
]
)
train = ConditionalCategorical(
[
[
[
[0.8, 0.2],
[0.9, 0.1],
],
[
[0.6, 0.4],
[0.7, 0.3],
],
[
[0.4, 0.6],
[0.5, 0.5],
],
]
]
)
appointment = ConditionalCategorical(
[
[
[0.9, 0.1],
[0.6, 0.4],
],
]
)
# Create a Bayesian Network and add states
model = BayesianNetwork()
model.add_distributions([rain, maintenance, train, appointment])
# Add edges connecting nodes
model.add_edge(rain, maintenance)
model.add_edge(rain, train)
model.add_edge(maintenance, train)
model.add_edge(train, appointment)
For likelihood.py, use this:
import numpy
import torch
from model import model
rain_values = ["none", "light", "heavy"]
maintenance_values = ["yes", "no"]
train_values = ["on time", "delayed"]
appointment_values = ["attend", "miss"]
probability = model.probability(
torch.as_tensor(
[
[
rain_values.index("none"),
maintenance_values.index("no"),
train_values.index("on time"),
appointment_values.index("attend"),
]
]
)
)
print(probability)
This code is from cs50ai. I'm currently taking the course :)
sample.py:
from pomegranate.distributions import ConditionalCategorical
from collections import Counter
from model import model
# Rejection sampling
# Compute distribution of Appointment given that train is delayed
N = 10000
data = []
for i in range(N):
    sample = model.sample(1)[0]
    # Keep only samples where train is "delayed" (index 1)
    if sample[2] == 1.0:
        data.append("attend" if sample[3] == 0 else "miss")
print(Counter(data))
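For anyone who wants to see what the rejection sampling above does without pomegranate, here is a minimal standard-library sketch. The tables are copied from the model in this thread; the helper names `draw` and `sample_network` are invented for the example and are not part of any library.

```python
import random
from collections import Counter

# Tables copied from the model above, indexed by integer category.
RAIN = [0.7, 0.2, 0.1]                              # none, light, heavy
MAINTENANCE = [[0.4, 0.6], [0.2, 0.8], [0.1, 0.9]]  # [rain] -> yes, no
TRAIN = [                                           # [rain][maintenance] -> on time, delayed
    [[0.8, 0.2], [0.9, 0.1]],
    [[0.6, 0.4], [0.7, 0.3]],
    [[0.4, 0.6], [0.5, 0.5]],
]
APPOINTMENT = [[0.9, 0.1], [0.6, 0.4]]              # [train] -> attend, miss


def draw(rng, probs):
    """Draw one category index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def sample_network(rng):
    """Sample every node in topological order (parents before children)."""
    rain = draw(rng, RAIN)
    maintenance = draw(rng, MAINTENANCE[rain])
    train = draw(rng, TRAIN[rain][maintenance])
    appointment = draw(rng, APPOINTMENT[train])
    return rain, maintenance, train, appointment


rng = random.Random(0)
counts = Counter()
for _ in range(20000):
    _, _, train, appointment = sample_network(rng)
    if train == 1:  # reject every sample where the train is not delayed
        counts["attend" if appointment == 0 else "miss"] += 1

accepted = sum(counts.values())
estimate = counts["attend"] / accepted
print(counts, round(estimate, 3))
```

Since appointment depends only on train, the estimate should land close to the 0.6 entry in the appointment table for a delayed train.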
inference.py:
import torch
from model import model
X = torch.tensor(
[
[
-1,
-1,
1, # delayed
-1,
]
]
)
X_masked = torch.masked.MaskedTensor(X, mask=(X != -1))
states = (
("rain", ["none", "light", "heavy"]),
("maintenance", ["yes", "no"]),
("train", ["on time", "delayed"]),
("appointment", ["attend", "miss"]),
)
# Calculate predictions
predictions = model.predict_proba(X_masked)
# Print predictions for each node
for (node_name, values), prediction in zip(states, predictions):
    if isinstance(prediction, str):
        print(f"{node_name}: {prediction}")
    else:
        print(f"{node_name}")
        for value, probability in zip(values, prediction[0]):
            print(f" {value}: {probability:.4f}")
@jmschrei, is there any way to get the joint probability of a Bayesian network using model.probability(X) where X has some missing facts (for example, with some entries set to -1)?
I know I can do this by marginalization, but it would be less expensive to just calculate the product of the probabilities up to the current node.
Example: if my model has nodes A, B, C, D and I want to compute P(A,B,C), I could do P(A,B,C) = P(A|B,C)P(B|C)P(C) and ignore D.
PS: I'm currently learning this, I could be completely wrong on what I'm doing
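For the network in this thread (rain → maintenance, rain → train, maintenance → train, train → appointment), the shortcut can be justified when the unobserved node is a leaf, because its conditional distribution sums to one. A short derivation, writing R, M, T, A for the four nodes (the letters are shorthand introduced here, not names from the thread):

```latex
\begin{aligned}
P(R, M, T) &= \sum_{a} P(R, M, T, A = a) \\
           &= \sum_{a} P(R)\, P(M \mid R)\, P(T \mid R, M)\, P(A = a \mid T) \\
           &= P(R)\, P(M \mid R)\, P(T \mid R, M) \sum_{a} P(A = a \mid T) \\
           &= P(R)\, P(M \mid R)\, P(T \mid R, M).
\end{aligned}
```

This only works when the missing node has no observed descendants. If a missing node sits above an observed one, the sum no longer collapses to 1 and has to be carried out explicitly.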
Thanks @itolosa for your help! Where is cs50ai being taught?
Yes, you should be able to use torch.masked.MaskedTensor to indicate missingness. Let me know if you run into any issues.
@jmschrei CS50AI Harvard, but I'm taking the online version through edx: link
I've tried using a masked tensor but it fails:
# assume the same model as the previous examples
X = torch.as_tensor(
[
[
rain_values.index("none"),
maintenance_values.index("no"),
train_values.index("on time"),
-1,
]
]
)
X_masked = torch.masked.MaskedTensor(X, mask=(X != -1))
probability = model.probability(X_masked) # <--- throws an error
Error:
~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/torch/masked/maskedtensor/core.py:156: UserWarning: The PyTorch API of MaskedTensors is in prototype stage and will change in the near future. Please open a Github issue for features requests and see our documentation on the torch.masked module for further information about the project.
warnings.warn(("The PyTorch API of MaskedTensors is in prototype stage "
~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/torch/masked/maskedtensor/core.py:299: UserWarning: unbind is not implemented in __torch_dispatch__ for MaskedTensor.
If you would like this operator to be supported, please file an issue for a feature request at https://github.com/pytorch/maskedtensor/issues with a minimal reproducible code snippet.
In the case that the semantics for the operator are not trivial, it would be appreciated to also include a proposal for the semantics.
warnings.warn(msg)
Traceback (most recent call last):
File "likelihood.py", line 25, in <module>
probability = model.probability(X_masked)
File "~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/pomegranate/distributions/_distribution.py", line 61, in probability
return torch.exp(self.log_probability(X))
File "~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/pomegranate/bayesian_network.py", line 352, in log_probability
logps += distribution.log_probability(X_)
File "~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/pomegranate/distributions/conditional_categorical.py", line 134, in log_probability
logps[i] += self._log_probs[j][tuple(X[i, :, j])]
File "~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/torch/_tensor.py", line 940, in __iter__
return iter(self.unbind(0))
File "~/.pyenv/versions/cs50-ai/lib/python3.8/site-packages/torch/masked/maskedtensor/core.py", line 274, in __torch_function__
ret = func(*args, **kwargs)
TypeError: no implementation found for 'torch._ops.aten.unbind.int' on types that implement __torch_dispatch__: [<class 'torch.masked.maskedtensor.core.MaskedTensor'>]
I guess pomegranate/distributions/conditional_categorical.py, line 134, is failing because it performs an operation that requires unbind, which is not implemented for masked tensors.
In fact, I've isolated the error: tuple(X_masked) throws the same exception as before.
Thanks a lot @itolosa and @jmschrei, you guys are the best.
I think a challenge with model.probability is that there are two ways one could interpret it when given incomplete data. The first is that one should marginalize out the unseen variables. The second is that one should infer the missing variables and then calculate the probabilities given the complete, but partially inferred, example. The second can be done by first calling predict and then passing in the completed example.
Although I understand the procedure of the second option, I can't imagine the consequences in terms of the probability -- I'm not an expert on this, so I don't know if they're equivalent or not with the first option.
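To make the difference between the two interpretations concrete, here is a small standard-library sketch on the incomplete example from earlier in the thread (rain = none, maintenance = no, train = on time, appointment missing). It assumes that "inferring" a missing value means picking the most probable completion; the helper names `joint` and `completions` are invented for this illustration and this is not how pomegranate computes anything internally.

```python
from itertools import product

# Tables copied from the model in this thread, indexed by integer category.
RAIN = [0.7, 0.2, 0.1]
MAINTENANCE = [[0.4, 0.6], [0.2, 0.8], [0.1, 0.9]]
TRAIN = [
    [[0.8, 0.2], [0.9, 0.1]],
    [[0.6, 0.4], [0.7, 0.3]],
    [[0.4, 0.6], [0.5, 0.5]],
]
APPOINTMENT = [[0.9, 0.1], [0.6, 0.4]]
SIZES = [3, 2, 2, 2]  # categories per node


def joint(rain, maintenance, train, appointment):
    """Joint probability of one complete assignment (product of the tables)."""
    return (RAIN[rain]
            * MAINTENANCE[rain][maintenance]
            * TRAIN[rain][maintenance][train]
            * APPOINTMENT[train][appointment])


def completions(row):
    """All complete assignments consistent with row; -1 marks a missing value."""
    options = [range(SIZES[i]) if v == -1 else [v] for i, v in enumerate(row)]
    return product(*options)


row = [0, 1, 0, -1]  # none, no, on time, missing
probs = {c: joint(*c) for c in completions(row)}

# Interpretation 1: marginalize, i.e. sum over every possible completion.
marginal = sum(probs.values())

# Interpretation 2: fill in the most probable completion, then score it.
best = max(probs, key=probs.get)
imputed = probs[best]

print(marginal, best, imputed)
```

The two numbers differ: the marginal adds up every completion, while the imputed score keeps only the largest term of that sum, so it can never exceed the marginal.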
In any case, for consistency, as a developer I would expect the method to perform semantically the same kind of calculation for any given input, and any other, non-equivalent procedure to live in a separate method.
Older versions of pomegranate were able to receive an incomplete example and return the probability, so that was my initial expectation when I tried to use model.probability.
I've finally decided to create my own version of a Bayesian network, just as a learning exercise, so this issue is no longer a concern for me.
In any case if you still want to implement this, and need help to upgrade some code, I'd be glad to be part of that. @jmschrei
I agree with you that having the model not accept masked tensors is a problem that I need to fix.
I would love to see your solution.
As you've probably inferred, I'm super time-constrained right now. I'm going on the faculty job market and it's taking more time than I was hoping for. I should have more time starting next year and begin to work through the backlog.
I completely understand.
My solution is neither efficient in terms of time complexity nor tensor-based, so I could open a PR that adds a new method to the model, undocumented for now, just as a proposal for implementing the probability with missing facts using tensors (I hope). I can't promise when, but I hope soon.
Thank you for taking the time to give us a response. 🤝
Of course -- thanks for engaging with the package and raising issues/working to find solutions!
If you have time to write a draft solution to the issue, even if it's not the most efficient, that'd be hugely helpful as I can then build off it.
I have trouble with DiscreteDistribution too.
For this code, no matter what I try, I get some kind of error:
` from pomegranate.distributions import *
metal = DiscreteDistribution({'T': 0.2, 'F': 0.8}) `
Error:
`NameError Traceback (most recent call last) Cell In[5], line 7 5 from pomegranate.bayesian_network import BayesianNetwork 6 # Unconditional distribution for the metal node ----> 7 metal = DiscreteDistribution({'T': 0.2, 'F': 0.8})
NameError: name 'DiscreteDistribution' is not defined`
It's hard for me to provide feedback from only that tiny snippet, but it's worth noting that DiscreteDistribution is no longer in pomegranate as of v1.0.0. None of the distribution objects have the word Distribution in them anymore.
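One possible way to carry an old-style dictionary over, sketched with the standard library only: split the dictionary into a list of keys and a nested list of probabilities in the same shape that the Categorical calls earlier in this thread use. The helper `to_categorical_args` is hypothetical, and the last comment is an assumption about how the result would be used, not confirmed behavior.

```python
def to_categorical_args(table):
    """Split a {value: probability} dict into (keys, nested probability list)."""
    keys = list(table)                  # keeps the dict's insertion order
    probs = [[table[k] for k in keys]]  # one row of probabilities
    return keys, probs


keys, probs = to_categorical_args({'T': 0.2, 'F': 0.8})
print(keys, probs)

# Assumption: the replacement would then be Categorical(probs), with each
# observation encoded as an integer via keys.index(value) instead of 'T'/'F'.
```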
I'm new to pomegranate in general, and I came across the snippet below, but I can't run the script with pomegranate>=1.0.0 because some of the variables and classes obviously don't exist anymore. They include
I need help upgrading the code to work with pomegranate>=1.0.0. Thanks.