Open Jamesfox1 opened 3 years ago
@Jamesfox1 The problem, in this case, is that the `state_names` for `cpd_j` aren't specified. Currently, state names aren't shared between CPDs, so `cpd_a` uses the state names `[a, b]` for `A`, but since they aren't specified in `cpd_j`, it automatically assigns the state names for `A` to be `[0, 1]`. Hence, when we try to reduce `cpd_j` on `A: a`, it can't identify the state name. A simple fix for now would be to also specify the state names for `cpd_j`. Something like:
cpd_j = TabularCPD('B', 2, [[1, 0], [0, 1]], ['A'], [2], state_names={'A': ['a', 'b'], 'B': ['c', 'd']})
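To make the failure mode concrete, here is a minimal pure-Python sketch of the lookup described above. It is not pgmpy's code: `resolve_state_names` and `state_index` are hypothetical helpers that only mimic the behaviour explained in this comment, namely that unspecified state names default to `0..card-1`, so a name such as `'a'` can't be found unless it was declared on this CPD.

```python
def resolve_state_names(variables, cardinalities, state_names=None):
    """Return {variable: [names]}, defaulting to 0..card-1 where unspecified."""
    state_names = state_names or {}
    return {
        var: list(state_names.get(var, range(card)))
        for var, card in zip(variables, cardinalities)
    }


def state_index(names, variable, state):
    """Translate a state name into its positional index for `variable`."""
    try:
        return names[variable].index(state)
    except ValueError:
        raise KeyError(
            f"{state!r} is not a known state of {variable!r}: {names[variable]}"
        ) from None


# cpd_j without state_names: A silently gets the default names [0, 1].
defaulted = resolve_state_names(["B", "A"], [2, 2])

# cpd_j with state_names given for both the variable and its evidence.
explicit = resolve_state_names(
    ["B", "A"], [2, 2], {"A": ["a", "b"], "B": ["c", "d"]}
)

print(defaulted["A"])                  # default names for A
print(state_index(explicit, "A", "a"))  # index of 'a' once names are declared
```

Calling `state_index(defaulted, "A", "a")` raises a `KeyError` in this sketch, which mirrors why reducing on `A: a` fails when `cpd_j` was built without state names.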
Would it make more sense for:
(We're following this approach in pycid, the influence diagram library, built on pgmpy.)
I am a bit divided on this issue. Currently, `state_names` requires a dictionary with the state names of both the variable and its evidence variables. The problem is that if we relax this and only ask for the variable's own state names, the CPD becomes dependent on other CPDs (it will require the parent CPDs to be defined as well), which results in a loss of modularity. At the same time, it would make things easier for users by requiring less input from them.
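The trade-off can be sketched in plain Python. The snippet below is only an illustration under assumed names: the dictionary-based "specs" and the helper `fill_evidence_state_names` are hypothetical and are not part of pgmpy. It shows that inheriting evidence state names from the parents saves the user some typing, but the child can no longer be completed until its parents' CPDs exist.

```python
def fill_evidence_state_names(child_spec, specs_by_variable):
    """Complete a child's state names using names declared by its parents' CPDs."""
    names = dict(child_spec["state_names"])
    for parent in child_spec["evidence"]:
        if parent in names:
            continue  # the user supplied these names explicitly
        if parent not in specs_by_variable:
            # This is the loss of modularity: the child depends on the parent.
            raise LookupError(f"CPD for parent {parent!r} must be defined first")
        names[parent] = specs_by_variable[parent]["state_names"][parent]
    return names


# Hypothetical, simplified stand-ins for cpd_a and cpd_j.
cpd_a_spec = {"variable": "A", "evidence": [], "state_names": {"A": ["a", "b"]}}
cpd_j_spec = {"variable": "B", "evidence": ["A"], "state_names": {"B": ["c", "d"]}}

shared = fill_evidence_state_names(cpd_j_spec, {"A": cpd_a_spec})
print(shared)
```

With the parent spec available, the child ends up with names for both `B` and `A`. Without it, the helper raises `LookupError`, which is the dependency on other CPDs described above.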
My idea behind keeping things modular was to have separate objects for model structure and different parameterizations so that users can combine these in any way they want. So, it allows users to easily add/remove/replace CPDs, or modify network structure as they wish. Or if someone wants to add a new type of parameterization, it would be straightforward. It also helps users if they just want to do operations on a single / a bunch of CPDs without considering the model altogether.
Subject of the issue
We can't use Belief Propagation's query method with state names that differ from the state numbers (i.e. strings, or numbers that aren't the state numbers).
Your environment
Steps to reproduce
This example raises the error because the state_names don't correspond to the indexes of the variable's domain (which you refer to as its "state number").
Therefore, belief_propagation.query(variables=['J'], evidence={'A': 'a'}) raises an error. (It also raises an error if we give TabularCPD the final argument state_names={'A': [2, 1]} and ask a query of the form belief_propagation.query(variables=['J'], evidence={'A': 2}), because again the state name ("2") doesn't correspond to the state number "0".)
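The integer case can be illustrated with a small pure-Python sketch. The values and the helper `evidence_to_index` are made up for illustration and do not come from pgmpy. It shows why a state name of 2 has to be translated to state number 0 before it is used as an index.

```python
def evidence_to_index(names, state):
    """Map a state name to its state number (its position in `names`)."""
    return names.index(state)


names_for_a = [2, 1]        # state names for A, as in state_names={'A': [2, 1]}
values_for_a = [0.3, 0.7]   # made-up values, indexed by state number

idx = evidence_to_index(names_for_a, 2)
print(idx, values_for_a[idx])
```

Using the name 2 directly as an index, as in `values_for_a[2]`, would raise an `IndexError` here, because A only has state numbers 0 and 1.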
Expected behaviour
The version below works as expected because the state names [0,1] correspond to the indexes of variable A's domain:
Actual behaviour
The issue is arising from the reduce method of the DiscreteFactor class within DiscreteFactor.py:
In lines 510-515 it says that this is where you're converting the "state names to state number". However, simply printing out the state_names just before line 510 shows that they have already been converted to state numbers, so this conversion doesn't do anything.