py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

ValueError: make_column_selector can only be applied to pandas dataframes #540

Open yfhewei opened 2 years ago

yfhewei commented 2 years ago

When calling ForestDRLearner, I set model_regression=make_pipeline(ordinal_encoder, HistGradientBoostingClassifier()), where ordinal_encoder=make_column_transformer((OrdinalEncoder(), make_column_selector()), remainder='passthrough').

P.S. I omitted some unimportant parameters.

It then reports an error. The output begins with a deprecation message: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.

ValueError                                Traceback (most recent call last)

in
      2 est = ForestDRLearner(model_regression=clfY,
      3                       model_propensity=clf)
----> 4 est.fit(y, T=T, X=X)

/conda/envs/notebook/lib/python3.6/site-packages/econml/dr/_drlearner.py in fit(self, Y, T, X, W, sample_weight, groups, cache_values, inference)
...... ......

/conda/envs/notebook/lib/python3.6/site-packages/sklearn/compose/_column_transformer.py in __call__(self, df)
    825         """
    826         if not hasattr(df, 'iloc'):
--> 827             raise ValueError("make_column_selector can only be applied to "
    828                              "pandas dataframes")
    829         df_row = df.iloc[:1]

ValueError: make_column_selector can only be applied to pandas dataframes

I guess EconML calls fit on something that is not a DataFrame. Am I right? Does this need a fix?

yfhewei commented 2 years ago

In one step there is a call to np.hstack on W and X, which turns the DataFrames into an array. The later make_column_selector requires a DataFrame, not an array, and that causes the bug.
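
The mechanism described above can be reproduced without econml. This is a minimal sketch with made-up data (the frames X and W and their column names are assumptions for illustration, not taken from the report): np.hstack returns a plain ndarray, and calling a make_column_selector on it raises the same ValueError.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical features X and controls W, both as DataFrames.
X = pd.DataFrame({"x0": [0, 1, 2], "x1": [10, 11, 12]})
W = pd.DataFrame({"w0": [0.5, 1.5, 2.5]})

# np.hstack converts its inputs to arrays, so the column labels are dropped.
stacked = np.hstack([X, W])
print(type(stacked).__name__)

# The selector works on a DataFrame and returns the selected column labels.
selector = make_column_selector()
print(selector(X))

# On the stacked ndarray the selector has no DataFrame to inspect and raises.
try:
    selector(stacked)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```
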

yfhewei commented 2 years ago

I suggest combining X and W with pd.concat rather than np.hstack.
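
As a rough illustration of this suggestion (user-level code with made-up frames, not a patch to econml's internals), pd.concat along the columns keeps the result a DataFrame, so a column selector can still be applied to it:

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical features X and controls W, sharing the same default index.
X = pd.DataFrame({"x0": [0, 1, 2], "x1": [10, 11, 12]})
W = pd.DataFrame({"w0": [0.5, 1.5, 2.5]})

# Column-wise concatenation keeps the labels and the per-column dtypes.
combined = pd.concat([X, W], axis=1)
print(type(combined).__name__)

# The selector can inspect the combined frame without raising.
selector = make_column_selector()
print(selector(combined))
```

Note that pd.concat with axis=1 aligns rows on the index, so this only matches np.hstack row for row when X and W share the same index.
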

yfhewei commented 2 years ago

If you pass a bare estimator directly to the econml models, fit() is OK. But if you pass a pipeline containing a transformer, fit() fails.

dawkrish commented 3 weeks ago

In one step there is a call to np.hstack on W and X, which turns the DataFrames into an array. The later make_column_selector requires a DataFrame, not an array, and that causes the bug.

But what is the fix for this, then? Re-convert the ndarray to a DataFrame?