py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Why Shape of Y in Causal Forest notebook is 1000*1000 #888

Open silulyu opened 3 months ago

silulyu commented 3 months ago

I was running "Example Usage with Binary Treatment Synthetic Data" in the Causal Forest notebook (https://github.com/py-why/EconML/blob/main/notebooks/Causal%20Forest%20and%20Orthogonal%20Random%20Forest%20Examples.ipynb). After running the following code, I found that the shape of Y is odd: it is a 1000*1000 matrix in which the 2nd to 999th columns are all the same. I believe we should reshape Y and use only the first column for modeling; however, that makes the ATE results totally different (0.97 vs 3.1). How should I understand the shape of Y? Should it be a 1000*1000 matrix or a 1000*1 matrix? Thank you!

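import numpy as np
from itertools import product

# NOTE: exp_te must already be defined before this cell runs
# DGP constants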
np.random.seed(1234)
n = 1000
n_w = 30
support_size = 5
n_x = 1
# Outcome support
support_Y = np.random.choice(range(n_w), size=support_size, replace=False)
coefs_Y = np.random.uniform(0, 1, size=support_size)
epsilon_sample = lambda n: np.random.uniform(-1, 1, size=n)
# Treatment support
support_T = support_Y
coefs_T = np.random.uniform(0, 1, size=support_size)
eta_sample = lambda n: np.random.uniform(-1, 1, size=n) 

# Generate controls, covariates, treatments and outcomes
W = np.random.normal(0, 1, size=(n, n_w))
X = np.random.uniform(0, 1, size=(n, n_x))
# Heterogeneous treatment effects
TE = np.array([exp_te(x_i) for x_i in X])
# Define treatment
log_odds = np.dot(W[:, support_T], coefs_T) + eta_sample(n)
T_sigmoid = 1/(1 + np.exp(-log_odds))
T = np.array([np.random.binomial(1, p) for p in T_sigmoid])
# Define the outcome
Y = TE * T + np.dot(W[:, support_Y], coefs_Y) + epsilon_sample(n)

# ORF parameters and test data
subsample_ratio = 0.4
X_test = np.array(list(product(np.arange(0, 1, 0.01), repeat=n_x)))
[Screenshot: 2024-06-05 at 12:46:03 PM]

kbattocchi commented 3 months ago

I'm unable to reproduce this - I see (1000,) as the shape of Y. Is it possible that you've redefined exp_te to return something other than a scalar? What is the shape of TE?
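
For illustration, here is a minimal sketch of how a non-scalar exp_te could produce a 1000*1000 Y through NumPy broadcasting. Both `exp_te_vec` and `exp_te_scalar` are hypothetical stand-ins written for this example, not necessarily the notebook's definitions, and the data is simplified (no controls W).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(0, 1, size=(n, 1))   # one feature, so each row x_i has shape (1,)
T = rng.binomial(1, 0.5, size=n)     # shape (n,)
noise = rng.uniform(-1, 1, size=n)   # shape (n,)

# Hypothetical variant: returns a length-1 array rather than a scalar
def exp_te_vec(x):
    return np.exp(2 * x)

# Hypothetical variant: indexes the single feature, so it returns a scalar
def exp_te_scalar(x):
    return np.exp(2 * x[0])

TE_vec = np.array([exp_te_vec(x_i) for x_i in X])        # shape (n, 1)
TE_scalar = np.array([exp_te_scalar(x_i) for x_i in X])  # shape (n,)

# (n, 1) * (n,) broadcasts to (n, n); (n,) * (n,) stays (n,)
Y_vec = TE_vec * T + noise
Y_scalar = TE_scalar * T + noise

print(TE_vec.shape, Y_vec.shape)
print(TE_scalar.shape, Y_scalar.shape)
```

If TE comes out as (1000, 1) rather than (1000,), the product with T is the likely source of the extra dimension.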

silulyu commented 3 months ago

Thanks so much for your reply. That is a great catch! I added the exp_te function to Section 2, using the same definition as in Section 1, and the shape is now (1000,).
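
A small shape check along these lines might catch this kind of mismatch before any model is fit. `check_1d` is a hypothetical helper written for this sketch and is not part of EconML.

```python
import numpy as np

def check_1d(name, arr, n):
    """Raise if arr is not a 1-D array of length n; otherwise return it as an ndarray."""
    arr = np.asarray(arr)
    if arr.shape != (n,):
        raise ValueError(f"{name} has shape {arr.shape}, expected ({n},)")
    return arr

n = 1000
Y_ok = check_1d("Y", np.zeros(n), n)    # passes silently

try:
    check_1d("Y", np.zeros((n, n)), n)  # the shape reported in this issue
    caught = False
except ValueError as err:
    caught = True
    print(err)
```

Running the same check on TE and T right after the data-generating cell would point to the array that first picks up the wrong shape.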