py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/
Other
3.85k stars 718 forks

ufunc 'isnan' not supported for the input types in DML "effect()" function #745

Open jaydeepchakraborty opened 1 year ago

jaydeepchakraborty commented 1 year ago

Thank you for the package and the huge effort behind it. I am trying to run the estimation below.

The variables are:

features: ['X1', 'X2', 'X3', 'X4', 'X5']
output: ['Y']
treatment: ['T_1', 'T_2']

Here, Type is categorical, with values (0, 1, 2).

    test_seg = dml_test_X.iloc[[2, 4]]  # third and fifth rows
    print(test_seg)
    dml_est.effect(test_seg, T0=0, T1=1)

        X1  X2     X3     X4          X5
    6   27   1  77.99  4.193  131.126667
    10  60   1  76.65  3.717  223.173417

X1: continuous
X2: categorical
X3: continuous
X4: continuous
X5: continuous
treatment: categorical
Y: continuous

ERROR: TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Possible reason: X2 and the treatment are categorical and stored as pandas object dtype (dtype('O')), and np.isnan() raises this error for non-numeric data types such as object.

Possible Solution: replace np.isnan with pd.isna, which supports category dtypes?
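To illustrate the suspected cause outside of EconML, here is a minimal sketch that assumes only NumPy and pandas. The column `x2` and the helper `isnan_fails` are hypothetical names made up for this example; they are not taken from the notebook or from EconML's internals. It only shows how the two functions behave on an object-dtype array, not where the call happens inside `effect()`.

```python
import numpy as np
import pandas as pd

# Hypothetical column mimicking X2: integer codes stored with object dtype.
x2 = pd.Series([1, 1, 0, 2], dtype="object")


def isnan_fails(values):
    """Return True if np.isnan raises TypeError for the given array."""
    try:
        np.isnan(values)
    except TypeError:
        return True
    return False


# np.isnan has no loop for object arrays, so this is expected to fail.
object_fails = isnan_fails(x2.to_numpy())

# pd.isna accepts object-dtype input and returns a boolean mask.
pd_mask = pd.isna(x2.to_numpy())

print(object_fails, pd_mask.tolist())
```

If this matches what happens in the notebook, the error would come from the dtype of the input rather than from any missing values in the data.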

kbattocchi commented 1 year ago

If you can provide a fully self-contained repro, that would help. However, internally we're using sklearn's OneHotEncoder to transform the treatment when it is discrete, and I suspect that this failure is a known issue with how that class interacts with pandas.

jaydeepchakraborty commented 1 year ago

@kbattocchi Thank you for replying.

I have this notebook; I hope it helps. Please let me know if you need any more information.

https://github.com/jaydeepchakraborty/NLP/blob/36dc367c2d84d39830a253e8c0e9629ca997e882/CI_test.ipynb

If we convert the object-type column to an int type, the code runs.
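A rough sketch of that workaround follows, assuming the object-typed columns hold values that can be cast to integers. The frame `X` is a hypothetical stand-in built from three columns of the two rows printed earlier in this issue, not the notebook's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: X2 is deliberately stored with object dtype.
X = pd.DataFrame({
    "X1": [27, 60],
    "X2": pd.Series([1, 1], dtype="object"),
    "X3": [77.99, 76.65],
})

# Find the object-dtype columns and cast them to int in one step.
object_cols = X.select_dtypes(include="object").columns
X_fixed = X.astype({col: int for col in object_cols})

# After the cast every column is numeric, so np.isnan works on the array.
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in X_fixed.dtypes)
nan_mask = np.isnan(X_fixed.to_numpy())

print(X_fixed.dtypes)
print(all_numeric, nan_mask.any())
```

The same cast would presumably be needed on the treatment column and on any frame later passed to `effect()`, so that training and test inputs share the same dtypes.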