@ankurankan, could you please help?
It's happening even when predicting on the same training corpus, if I save the model with BIFWriter and then load it with BIFReader.
@SrGrace You can pass an additional `state_names` argument to the `fit` method that specifies all the possible states for each variable. Any listed state that doesn't occur in the data will still be created. Have a look at the documentation here: https://github.com/pgmpy/pgmpy/blob/dev/pgmpy/models/BayesianModel.py#L489.
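As a minimal sketch of the idea, the hypothetical helper `build_state_names` below builds a `state_names` dictionary from the observed values of each column and adds any extra states you already know about. It assumes the data is a plain mapping from column name to a list of values; the result has the variable-to-list-of-states shape that the `state_names` argument takes.

```python
def build_state_names(columns, extra_states=None):
    """Hypothetical helper: map each variable to a sorted list of its states.

    `columns` maps a variable name to the values observed in the data;
    `extra_states` optionally lists states that never appear in the data.
    """
    extra_states = extra_states or {}
    state_names = {}
    for var, observed in columns.items():
        # Union of the states seen in the data and the declared extra states.
        states = set(observed) | set(extra_states.get(var, ()))
        state_names[var] = sorted(states)
    return state_names


data = {"A": [0, 1, 1, 0], "B": [1, 0, 1, 1], "C": [0, 0, 1, 1]}
names = build_state_names(data, extra_states={"A": [-1]})
print(names)  # {'A': [-1, 0, 1], 'B': [0, 1], 'C': [0, 1]}
```

With pgmpy, a dictionary like `names` would then be passed as `model.fit(values, state_names=names)`, as in the session further down.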
About BIFReader and BIFWriter: are the state names in your data `int` values? When reading back from a file, BIFReader has no way to tell whether a state name was an `int` or a `str`, so by default it always assumes state names are `str`. So if you try to predict using `int` state names after reading, it will throw an error. If you are getting the error for some other reason, could you share your code so that I can reproduce it?
@ankurankan thanks for replying!
Okay, I'm now typecasting the test `df` to `str`, and then it works fine with BIFReader and BIFWriter. But is there any other way for BIFReader to return the state names in their training data types? Even after specifying `state_names` as follows (I'm taking the above example):
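```python
from pgmpy.models import BayesianModel

# `values` is the training DataFrame from the earlier example (all columns int).
model = BayesianModel([('A', 'B'), ('C', 'B')])
model.fit(values, state_names={'A': list(set(values['A'])),  # [0, 1]
                               'B': list(set(values['B'])),
                               'C': list(set(values['C']))})
```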
In the training data, all these columns are of type `int`, so I still have to typecast while predicting.
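A minimal sketch of that typecasting step, assuming the test data is a list of row dictionaries (variable name to state). The helper `stringify_rows` is hypothetical; for a pandas DataFrame, `test_df.astype(str)` performs the same conversion.

```python
def stringify_rows(rows):
    # Convert every state value to str so it matches state names read back from a BIF file.
    return [{var: str(value) for var, value in row.items()} for row in rows]


test_rows = [{"A": 0, "C": 1}, {"A": 1, "C": 0}]
print(stringify_rows(test_rows))  # [{'A': '0', 'C': '1'}, {'A': '1', 'C': '0'}]
```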
About the new data: `-1` does not occur in the training corpus for column `'A'`, and I can't declare it in `state_names` as `{'A': [-1, 0, 1]}`, because then it throws `ValueError: Data contains unexpected states for variable 'A'`.
So I'm not sure how to handle completely new data.
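Judging only from its wording, the `ValueError` suggests the opposite situation: a value that occurs in the data but is missing from the declared `state_names` (for example, `int` values in the data checked against `str` state names). The hypothetical check below is written in that spirit; it is not pgmpy's code, and the exact condition pgmpy tests is an assumption here. It reports which variable and which values are undeclared.

```python
def unexpected_states(columns, state_names):
    """Hypothetical check: values in the data that are not declared in state_names."""
    problems = {}
    for var, observed in columns.items():
        declared = set(state_names.get(var, ()))
        # Keep first-seen order and skip duplicates.
        extra = [v for v in dict.fromkeys(observed) if v not in declared]
        if extra:
            problems[var] = extra
    return problems


data = {"A": [0, 1, -1], "B": [0, 1, 1]}
print(unexpected_states(data, {"A": [0, 1], "B": [0, 1]}))      # {'A': [-1]}
print(unexpected_states(data, {"A": [-1, 0, 1], "B": [0, 1]}))  # {}
print(unexpected_states({"A": [0, 1]}, {"A": ["0", "1"]}))      # {'A': [0, 1]}
```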
@SrGrace I pushed an update to pgmpy yesterday (https://github.com/pgmpy/pgmpy/pull/1285) that lets you specify `state_name_type` for `BIFReader.get_model`. You can specify the type you want the read state names to have, and they will be converted automatically.
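With that update, the call would look something like `BIFReader("model.bif").get_model(state_name_type=int)`, where the file name is a placeholder. The conversion itself amounts to applying the requested type to every state name. The hypothetical function below sketches that step on a plain dictionary; it is not pgmpy's implementation.

```python
def convert_state_names(state_names, state_name_type=str):
    # Apply the requested type to every state name of every variable.
    return {var: [state_name_type(s) for s in states]
            for var, states in state_names.items()}


read_back = {"A": ["-1", "0", "1"], "B": ["0", "1"]}
typed = convert_state_names(read_back, state_name_type=int)
print(typed)  # {'A': [-1, 0, 1], 'B': [0, 1]}
```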
About specifying the extra state: which version of pgmpy are you using? It works on my machine:
```python
In [1]: import numpy as np
   ...: import pandas as pd
   ...: from pgmpy.models import BayesianModel
   ...: values = pd.DataFrame(np.random.randint(low=0, high=2, size=(100, 3)),
   ...:                       columns=['A', 'B', 'C'])
   ...:
   ...: model = BayesianModel([('A', 'B'), ('C', 'B')])
   ...: model.fit(values, state_names={'A': [-1, 0, 1], 'B': [0, 1], 'C': [0, 1]})

In [2]: model.get_cpds()
Out[2]:
[<TabularCPD representing P(A:3) at 0x7fbfefbfc198>,
 <TabularCPD representing P(B:2 | A:3, C:2) at 0x7fbff0542a20>,
 <TabularCPD representing P(C:2) at 0x7fbf7893a400>]

In [3]: model.cpds[0]
Out[3]: <TabularCPD representing P(A:3) at 0x7fbfefbfc198>

In [4]: print(model.cpds[0])
+-------+------+
| A(-1) | 0    |
+-------+------+
| A(0)  | 0.55 |
+-------+------+
| A(1)  | 0.45 |
+-------+------+
```
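The zero in the first row is what maximum-likelihood estimation gives (assuming `fit` used its default maximum-likelihood estimator here): each probability is the state's relative frequency, so a declared state that never occurs gets a count of 0 and hence probability 0. A minimal stdlib sketch of that marginal estimate, using counts that match the printed table (55 zeros and 45 ones out of 100 rows); `mle_marginal` is a hypothetical name.

```python
from collections import Counter


def mle_marginal(observed, states):
    # Relative frequency of each declared state; unseen states get probability 0.
    counts = Counter(observed)
    total = len(observed)
    return {s: counts[s] / total for s in states}


column_a = [0] * 55 + [1] * 45  # counts chosen to match the table, not the random draw
result = mle_marginal(column_a, states=[-1, 0, 1])
print(result)  # {-1: 0.0, 0: 0.55, 1: 0.45}
```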
How do I predict on completely new data that the `BayesianModel` hasn't been trained on?
Example:
```
KeyError: -1
```
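A `KeyError: -1` is consistent with the model having no state named `-1` for some variable. One option, shown in the reply above, is to declare the extra state at fit time. Another is to check the test data before predicting. The hypothetical helper below does the latter, assuming rows are dictionaries and `state_names` maps each variable to its known states; rows with unknown values could then be dropped, remapped, or handled separately.

```python
def split_known_rows(rows, state_names):
    """Hypothetical pre-check: separate rows whose values are all known states."""
    known, unknown = [], []
    for row in rows:
        # A variable missing from state_names also counts as unknown.
        ok = all(value in state_names.get(var, ()) for var, value in row.items())
        (known if ok else unknown).append(row)
    return known, unknown


state_names = {"A": [0, 1], "C": [0, 1]}
test_rows = [{"A": 0, "C": 1}, {"A": -1, "C": 0}]
known, unknown = split_known_rows(test_rows, state_names)
print(known)    # [{'A': 0, 'C': 1}]
print(unknown)  # [{'A': -1, 'C': 0}]
```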