Right now, I believe the PC algorithm requires discrete variables to be encoded as integers rather than strings.
I tried running PC on this data:

    A    S    T    L    B    E    X    D
    no   yes  no   no   yes  no   no   yes
    no   yes  no   no   no   no   no   no
    no   no   yes  no   no   yes  yes  yes
    no   no   no   no   yes  no   no   yes
    no   no   no   no   no   no   no   yes

But it threw an error. To get it to work I had to convert the values to ints.
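
    def convert_to_int(df):
        # Map "yes" -> 1 and anything else -> 0 in every column,
        # working on a copy so the original DataFrame is left unchanged.
        df = df.copy()
        for var in df.columns:
            df[var] = [1 if x == "yes" else 0 for x in df[var]]
        return df

    data_mod = convert_to_int(data)
    pc.fit(data_mod, context)  # pc and context were constructed earlier (not shown)
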
I'm calling this a bug: pc.fit(data, context) should work directly on the string-valued data.
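The helper above only handles "yes"/"no" labels. A minimal sketch of a more general user-side workaround, assuming pandas and that every non-numeric column is categorical, maps each column's labels to integer codes via pandas.Categorical. The name encode_categorical_columns is hypothetical and not part of any library. Codes are assigned per column in sorted label order, so a column that only ever contained "yes" would encode it as 0; keep that in mind if codes must agree across columns.

```python
import pandas as pd


def encode_categorical_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with non-numeric columns as int codes."""
    encoded = df.copy()
    for col in encoded.columns:
        if not pd.api.types.is_numeric_dtype(encoded[col]):
            # Categorical sorts its string categories, so "no" -> 0 and "yes" -> 1
            # whenever both labels occur in the column.
            encoded[col] = pd.Categorical(encoded[col]).codes.astype("int64")
    return encoded


# The five rows from the issue, columns A, S, T, L, B, E, X, D.
rows = [
    ["no", "yes", "no", "no", "yes", "no", "no", "yes"],
    ["no", "yes", "no", "no", "no", "no", "no", "no"],
    ["no", "no", "yes", "no", "no", "yes", "yes", "yes"],
    ["no", "no", "no", "no", "yes", "no", "no", "yes"],
    ["no", "no", "no", "no", "no", "no", "no", "yes"],
]
data = pd.DataFrame(rows, columns=list("ASTLBEXD"))
encoded = encode_categorical_columns(data)
print(encoded)
```

Because the helper returns a copy, the original string-valued data stays available for reporting results with their labels.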
Could the user just apply an encoder from scikit-learn (e.g. OrdinalEncoder) as a preprocessing step, or should we add that step for them? Either way, good catch; we should document this for all categorical/discrete tests.
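For the scikit-learn route, a minimal user-side sketch with OrdinalEncoder, assuming scikit-learn and pandas are installed. It uses a small illustrative subset of the issue's data (first three rows of columns S and D); whether the library should run this step inside fit is still the open question.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# First three rows of columns S and D from the data in the issue.
data = pd.DataFrame({"S": ["yes", "yes", "no"], "D": ["yes", "no", "yes"]})

encoder = OrdinalEncoder()  # learns sorted categories separately for each column
# fit_transform returns a float NumPy array; wrap it back into a DataFrame
# with the original column names and index, then cast to int.
encoded = pd.DataFrame(
    encoder.fit_transform(data), columns=data.columns, index=data.index
).astype(int)

print(encoder.categories_)  # learned label order for each column
print(encoded)
# encoded could then be passed to pc.fit(encoded, context) as in the issue.
```

Unlike one-hot encoding, OrdinalEncoder keeps one column per variable, so the variable names the context refers to are preserved; encoder.inverse_transform maps codes back to the original labels.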