py-why / dodiscover

[Experimental] Global causal discovery algorithms
https://www.pywhy.org/dodiscover/
MIT License

PC algo only working with int data inputs #69

Open robertness opened 1 year ago

robertness commented 1 year ago

Right now, I believe the PC algorithm requires discrete variables to be encoded as integers rather than strings. I tried running PC on this data:

A S T L B E X D
no yes no no yes no no yes
no yes no no no no no no
no no yes no no yes yes yes
no no no no yes no no yes
no no no no no no no yes

But it threw an error. To get it to work, I had to convert the values to integers:

def convert_to_int(df):
    # Map "yes" -> 1 and any other value -> 0, without mutating the input
    df = df.copy()
    for var in df.columns:
        df[var] = [1 if x == "yes" else 0 for x in df[var]]
    return df
data_mod = convert_to_int(data)

pc.fit(data_mod, context)
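The workaround above only handles yes/no columns. A slightly more general sketch, assuming pandas is available, replaces every non-numeric column with integer category codes and leaves numeric columns alone. The helper name `encode_categoricals` is hypothetical (not part of dodiscover), and it is an untested assumption that its output is accepted by `pc.fit` in the same way as `data_mod`.

```python
import pandas as pd


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with each non-numeric column
    replaced by integer codes.

    Codes follow the sorted order of each column's unique labels, so for
    yes/no data "no" -> 0 and "yes" -> 1 (the same mapping as convert_to_int).
    Missing values would be coded as -1 by pandas.
    """
    out = df.copy()
    for col in out.select_dtypes(exclude="number").columns:
        out[col] = out[col].astype("category").cat.codes.astype(int)
    return out


# The sample rows from the issue
data = pd.DataFrame(
    [
        ["no", "yes", "no", "no", "yes", "no", "no", "yes"],
        ["no", "yes", "no", "no", "no", "no", "no", "no"],
        ["no", "no", "yes", "no", "no", "yes", "yes", "yes"],
        ["no", "no", "no", "no", "yes", "no", "no", "yes"],
        ["no", "no", "no", "no", "no", "no", "no", "yes"],
    ],
    columns=list("ASTLBEXD"),
)

data_mod = encode_categoricals(data)
print(data_mod)
```

Unlike the yes/no comparison, this works for columns with any number of string labels, at the cost of the code order being alphabetical rather than chosen by the user.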

I'm calling this a bug: pc.fit(data, context) should work without the manual conversion.

adam2392 commented 1 year ago

Could the user just apply an encoder from scikit-learn's preprocessing module? Or should we add that step for them? Either way, good catch; we should document this for any categorical/discrete tests.
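For the scikit-learn route, one option is `sklearn.preprocessing.OrdinalEncoder`, which maps each column's labels to integers in sorted order. This is a minimal sketch, assuming scikit-learn and pandas are installed; it uses the first three columns of the sample data for brevity, and whether dodiscover's discrete tests accept the result unchanged is an assumption based on the int workaround earlier in the thread.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Columns A, S, T from the sample data in the issue
data = pd.DataFrame(
    {
        "A": ["no", "no", "no", "no", "no"],
        "S": ["yes", "yes", "no", "no", "no"],
        "T": ["no", "no", "yes", "no", "no"],
    }
)

# dtype=int asks the encoder for integer output instead of the default float64
encoder = OrdinalEncoder(dtype=int)
encoded = pd.DataFrame(
    encoder.fit_transform(data), columns=data.columns, index=data.index
)

# encoder.categories_ records the label -> code order for each column,
# e.g. ["no", "yes"] means "no" -> 0 and "yes" -> 1
print(encoder.categories_)
print(encoded)
```

Keeping the fitted encoder around also lets you map codes back to the original labels with `encoder.inverse_transform`, which could help when reading the learned graph's variables back in their original terms.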