py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Multidimensional treatment with XGBRegressor support #536

Open JakimPL opened 2 years ago

JakimPL commented 2 years ago

Is it possible to use CausalForestDML with a multidimensional treatment (5 continuous columns) with XGBRegressor? I see that model_y accepts XGBRegressor, but model_t complains in the following way:

def _validate_meta_shape(data):
    if hasattr(data, 'shape'):
        assert len(data.shape) == 1 or (len(data.shape) == 2 and (data.shape[1] == 0 or data.shape[1] == 1))

during the fit method. I'm using causal_model = CausalForestDML(model_t=XGBRegressor(), model_y=XGBRegressor(), random_state=0).

Do you know what raises the issue?

Greetings

kbattocchi commented 2 years ago

I don't believe that XGBoost natively supports multidimensional targets, so you won't be able to use XGBRegressor by itself as the T model if T has more than one column. However, you ought to be able to wrap it in a MultiOutputRegressor, which will just create an independent regressor for each column.
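
A minimal sketch of the wrapping suggested above, using synthetic data (the data, the sample sizes, and the helper name `make_causal_forest` are illustrative assumptions, not part of the original report). It fits one regressor per treatment column through scikit-learn's `MultiOutputRegressor`. If `xgboost` is not installed, the sketch falls back to scikit-learn's `GradientBoostingRegressor`, which is likewise single-output, purely so the example still runs. More recent XGBoost releases may accept multi-column targets directly, in which case the wrapper might not be needed.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor

try:
    from xgboost import XGBRegressor as BaseRegressor
except ImportError:
    # Assumption: stand-in single-output regressor so the sketch runs without xgboost.
    from sklearn.ensemble import GradientBoostingRegressor as BaseRegressor

# Synthetic data (illustrative only): 3 features, 5 continuous treatment columns.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
T = X @ rng.normal(size=(3, 5)) + rng.normal(size=(n, 5))
Y = T.sum(axis=1) + X[:, 0] + rng.normal(size=n)

# MultiOutputRegressor clones the base regressor once per column of T
# and fits each clone independently.
model_t = MultiOutputRegressor(BaseRegressor(n_estimators=20, random_state=0))
model_t.fit(X, T)
t_pred = model_t.predict(X)  # one predicted column per treatment


def make_causal_forest():
    """Hypothetical helper: build the estimator from the question with a wrapped model_t."""
    from econml.dml import CausalForestDML  # imported lazily; requires econml

    return CausalForestDML(
        model_t=MultiOutputRegressor(BaseRegressor(random_state=0)),
        model_y=BaseRegressor(random_state=0),  # Y has a single column, so no wrapper
        random_state=0,
    )
```

With econml installed, `make_causal_forest().fit(Y, T, X=X)` would be the corresponding call. Because the wrapped regressors are fitted independently, any correlation between treatment columns is not shared across the per-column models.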

JakimPL commented 2 years ago

Thanks for your help.