Why Shape of Y in Causal Forest notebook is 1000*1000

silulyu commented 3 months ago

I was running the "Example Usage with Binary Treatment Synthetic Data" section of the Causal Forest notebook (https://github.com/py-why/EconML/blob/main/notebooks/Causal%20Forest%20and%20Orthogonal%20Random%20Forest%20Examples.ipynb). After running the following code, I found that Y has an unexpected shape: it is a 1000*1000 matrix, and the 2nd through 999th columns are all the same. I believe we should reshape Y and use only the first column for modeling; however, that makes the ATE results completely different (0.97 vs. 3.1). How should I understand the shape of Y? Should it be a 1000*1000 matrix or a 1000*1 matrix? Thank you!

import numpy as np
from itertools import product

# DGP constants
np.random.seed(1234)
n = 1000
n_w = 30
support_size = 5
n_x = 1
# Outcome support
support_Y = np.random.choice(range(n_w), size=support_size, replace=False)
coefs_Y = np.random.uniform(0, 1, size=support_size)
epsilon_sample = lambda n: np.random.uniform(-1, 1, size=n)
# Treatment support
support_T = support_Y
coefs_T = np.random.uniform(0, 1, size=support_size)
eta_sample = lambda n: np.random.uniform(-1, 1, size=n) 

# Generate controls, covariates, treatments and outcomes
W = np.random.normal(0, 1, size=(n, n_w))
X = np.random.uniform(0, 1, size=(n, n_x))
# Heterogeneous treatment effects (exp_te is defined in an earlier section of the notebook)
TE = np.array([exp_te(x_i) for x_i in X])
# Define treatment
log_odds = np.dot(W[:, support_T], coefs_T) + eta_sample(n)
T_sigmoid = 1/(1 + np.exp(-log_odds))
T = np.array([np.random.binomial(1, p) for p in T_sigmoid])
# Define the outcome
Y = TE * T + np.dot(W[:, support_Y], coefs_Y) + epsilon_sample(n)

# ORF parameters and test data
subsample_ratio = 0.4
X_test = np.array(list(product(np.arange(0, 1, 0.01), repeat=n_x)))

kbattocchi commented 3 months ago

I'm unable to reproduce this - I see (1000,) as the shape of Y. Is it possible that you've redefined exp_te to return something other than a scalar? What is the shape of TE?

silulyu commented 3 months ago

Thanks so much for your reply. That is a great catch! I redefined exp_te in Section 2 to match the function used in Section 1, and the shape of Y is now (1000,).
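For anyone who hits the same symptom, here is a minimal sketch of how NumPy broadcasting can turn Y into a 1000*1000 matrix. It assumes the redefined exp_te returned a length-1 array instead of a scalar; the thread does not show that function, so `exp_te_array` below is hypothetical, as are `exp_te_scalar`, the 0.5 treatment probability, and the simplified outcome (the controls W are left out).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(0, 1, size=(n, 1))    # one covariate, shape (n, 1)
T = rng.binomial(1, 0.5, size=n)      # binary treatment, shape (n,)
noise = rng.uniform(-1, 1, size=n)    # outcome noise, shape (n,)

def exp_te_scalar(x):
    # Returns a scalar: indexes into the length-1 row before exponentiating.
    return np.exp(2 * x[0])

def exp_te_array(x):
    # Hypothetical variant: returns a length-1 array because x is never indexed.
    return np.exp(2 * x)

TE_scalar = np.array([exp_te_scalar(x_i) for x_i in X])  # shape (n,)
TE_array = np.array([exp_te_array(x_i) for x_i in X])    # shape (n, 1)

# (n,) * (n,) stays (n,), but (n, 1) * (n,) broadcasts to (n, n).
Y_good = TE_scalar * T + noise
Y_bad = TE_array * T + noise

print(TE_scalar.shape, Y_good.shape)
print(TE_array.shape, Y_bad.shape)
```

Under these assumptions the intended outcome sits on the diagonal of the broadcast matrix rather than in its first column, so slicing `Y[:, 0]` would not recover it. Checking `TE.shape` before building Y, as suggested above, is the quickest way to catch this.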

py-why / EconML

Why Shape of Y in Causal Forest notebook is 1000*1000 #888