py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

ForestDRLearner: binary outcome and discrete treatment (3 values) #908

Open Xela06-mjt opened 3 months ago

Xela06-mjt commented 3 months ago

I'm building a model with ForestDRLearner. For each client, I would like to find the treatment that minimizes the outcome, and end up with a table like this:

client, best_treatment
1, 0
2, 1
3, 2
4, 0
etc.

How can I build this final dataset from the code below? What is the best approach? This code is not quite what I need.

X = sampling.drop(columns=['T', 'Y'])
Y = sampling['Y']
T = sampling['T']

X_train, X_test, T_train, T_test, Y_train, Y_test = train_test_split(X, T, Y, test_size=0.2, random_state=123)

model = ForestDRLearner(
    model_propensity=XGBClassifier(learning_rate=0.1, max_depth=3, objective="multi:softprob"),
    model_regression=XGBClassifier(learning_rate=0.1, max_depth=3, objective="binary:logistic"),
    discrete_outcome=True,
    random_state=1,
)

model.fit(Y=Y_train, T=T_train, X=X_train, inference="auto")

cate_estimates = model.effect(X_test)
cate_estimates

best_treatment = np.argmin(cate_estimates, axis=1)

results = pd.DataFrame({
    'best_treatment': best_treatment
})

kbattocchi commented 3 months ago

It's not clear from your description what's not working for you.

One thing to note is that all treatment effects are relative to the 'control' treatment, so really you should append a column of zeros to the effects before taking the argmin (because if each other treatment is negative relative to the control, then you should pick the control even though its relative effect compared to itself is 0).
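The suggestion above can be sketched with plain NumPy. This is only an illustration under stated assumptions: `effects` is taken to be an array of shape (n_samples, n_treatments - 1) whose columns hold the estimated effect of each non-control treatment relative to the control (treatment 0); in econml such an array would typically come from a call like `model.const_marginal_effect(X_test)`, but that usage is an assumption here and the toy numbers are made up. The helper name `best_treatment_from_effects` is hypothetical.

```python
import numpy as np


def best_treatment_from_effects(effects):
    """Pick the outcome-minimizing treatment per row.

    `effects` is assumed to have shape (n_samples, n_treatments - 1):
    one column per non-control treatment, each relative to the control.
    """
    effects = np.asarray(effects, dtype=float)
    if effects.ndim == 1:
        # A single non-control treatment: make it a column.
        effects = effects.reshape(-1, 1)
    # The control's effect relative to itself is 0, so prepend a zero column.
    with_control = np.hstack([np.zeros((effects.shape[0], 1)), effects])
    # Column index now equals the treatment label (0 = control).
    return np.argmin(with_control, axis=1)


# Made-up effects for three clients and two non-control treatments.
toy_effects = np.array([
    [0.10, 0.05],    # both treatments raise the outcome: keep the control
    [-0.20, 0.03],   # treatment 1 lowers the outcome the most
    [-0.10, -0.30],  # treatment 2 lowers the outcome the most
])
best = best_treatment_from_effects(toy_effects)
print(best)  # [0 1 2]
```

Without the zero column, the first client would be assigned treatment 2 even though both treatments are worse than the control.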

Xela06-mjt commented 3 months ago

Thank you for your answer. I discovered econml only recently and I am not yet very experienced with it. My problem is that I am not sure what code I need to write, and I am open to other proposals. In the meantime, ForestDRLearner seemed like a good fit for my use case: my outcome is binary and my treatment is discrete (it takes 3 values). Using the CATE, I would like to find the best treatment for each client. I started with this code; maybe this is not the right way to do it? My question is: how do I determine the best treatment for each client?

client | best_treatment
client1 | 2
client2 | 1
client3 | 0
client4 | 2
etc.

I added this code:

cate_estimates_with_control = np.hstack([np.zeros((cate_estimates.shape[0], 1)), cate_estimates])

I don't know if this matches your suggestion.
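For the per-client table described in this comment, one possible sketch is shown below. It assumes the client identifiers are available in the same row order as the effect estimates (for example, the index of `X_test`, which is an assumption about the poster's data), and that the effects array has one column per non-control treatment. The function name `build_results` and the toy values are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd


def build_results(client_ids, effects):
    """Return a client / best_treatment table.

    `effects` is assumed to have shape (n_clients, n_treatments - 1),
    with each column measured relative to the control (treatment 0).
    """
    effects = np.asarray(effects, dtype=float)
    # Add the control as an explicit zero-effect option in column 0.
    with_control = np.hstack([np.zeros((effects.shape[0], 1)), effects])
    return pd.DataFrame({
        "client": list(client_ids),
        # Lowest relative effect = treatment that minimizes the outcome.
        "best_treatment": np.argmin(with_control, axis=1),
    })


# Made-up identifiers and effects, in matching row order.
toy_ids = ["client1", "client2", "client3", "client4"]
toy_effects = np.array([
    [-0.10, -0.30],
    [-0.20, 0.00],
    [0.10, 0.20],
    [0.00, -0.05],
])
results = build_results(toy_ids, toy_effects)
print(results)
```

Keeping the identifiers and the effects in the same order matters here: if the effects were computed on `X_test`, the identifiers should be taken from `X_test` as well rather than from the full dataset.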