grf-labs / grf

Generalized Random Forests
https://grf-labs.github.io/grf/
GNU General Public License v3.0

RATE with low treatment propensities --- target.sample="treated"? #1332

Open robert702 opened 10 months ago

robert702 commented 10 months ago

I am using causal_forest for an RCT where the treatment group has a very low treatment propensity: N control is 1 million, N treatment is 20,000.

When I calculate average_treatment_effect I get a warning that I should use the option target.sample = "treated". That estimate is indeed quite different from the overall average_treatment_effect (despite randomization), and it is also closer to what I get using OLS, which makes sense.
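For reference, the two calls I am comparing look like this (a quick sketch; cf, X, Y, W stand in for my actual objects):

library(grf)
cf <- causal_forest(X, Y, W)  # W is the 0/1 treatment indicator

average_treatment_effect(cf)                             # overall ATE (triggers the warning)
average_treatment_effect(cf, target.sample = "treated")  # ATT, as the warning suggests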

I now want to use RATE to evaluate the presence of heterogeneity, and I wonder whether I should make any adjustment to account for the low treatment propensities. If there is no built-in option I could go to the source code myself, but any guidance on whether something like this is needed would be greatly appreciated.

Thanks.

erikcs commented 10 months ago

Hi @robert702, that's an interesting question. Since the AUTOC can be represented as a weighted ATE (equation (8) in https://arxiv.org/pdf/2111.07966.pdf), I wonder if RATE + Crump et al. (2009)'s subsetting via estimated propensities is reasonable. What do you say @syadlowsky?

You could estimate this with, for example, the following, which computes the AUTOC for units with estimated propensities larger than 0.1:

rank_average_treatment_effect(evaluation.forest,
                              priorities,
                              subset = evaluation.forest$W.hat > 0.1)
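The 0.1 cutoff here just follows the Crump et al. (2009) rule of thumb of restricting to units with estimated propensities in [0.1, 0.9]; the threshold itself is of course adjustable.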
robert702 commented 10 months ago

Thanks Erik,

I was thinking of something like computing the TOC "manually": calculating ATEs with the function average_treatment_effect (target.sample = "treated") in the test sample, over bins built from priorities taken from the training sample.

Specifically, the RATE source code has:

ATE <- sum(DR.scores.sorted * sample.weights) / sample.weights.sum
TOC <- cumsum(DR.scores.sorted * sample.weights) / sample.weights.cumsum - ATE
RATE <- wtd.mean(TOC, sample.weights)

This is from lines 440-442 here: https://github.com/grf-labs/grf/blob/master/r-package/grf/R/rank_average_treatment.R

For the TOC, I was thinking of taking the priorities from the original forest and splitting them into 100 groups. Then, running cumulatively over the groups, instead of taking the average of the scores I can calculate the average treatment effect on the treated with the correction in the average_treatment_effect function, using the option target.sample = "treated", or the "overlap" version.

In the aggregate, treatment effects with target.sample="treated" and target.sample="overlap" indeed give very similar results.

Does this seem like a reasonable approach to you? Or is there any conceptual misunderstanding?

Here is a rough script ---

rm(list = ls())
library(grf)

# Simulate an RCT and fit a priority forest on a training half.
n <- 15000
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
event.prob <- 1 / (1 + exp(2 * (pmax(2 * X[, 1], 0) * W - X[, 2])))
Y <- rbinom(n, 1, event.prob)
train <- sample(1:n, n / 2)
cf.priority <- causal_forest(X[train, ], Y[train], W[train])

# Priorities on the held-out half.
priority.cate <- 1 * predict(cf.priority, X[-train, ])$predictions

# Split the held-out units into 100 centile groups (include.lowest = TRUE so
# the minimum priority is not dropped as NA); prioritygroup 1 = highest priority.
centile <- cut(priority.cate,
               breaks = quantile(priority.cate, probs = seq(0, 1, by = 0.01)),
               labels = FALSE, include.lowest = TRUE)
summary(centile)
prioritygroup <- 101 - centile

# Evaluation forest on the held-out half.
cf.eval <- causal_forest(X[-train, ], Y[-train], W[-train])

ATE <- as.numeric(average_treatment_effect(cf.eval, target.sample = "treated")[1])

# TOC: ATT among the i highest-priority groups, minus the overall ATT.
TOC <- numeric(100)
for (i in 1:100) {
  TOC[i] <- average_treatment_effect(cf.eval,
                                     subset = (prioritygroup <= i),
                                     target.sample = "treated")[1] - ATE
}

plot(TOC, type = "l", xlab = "Priority group",
     ylab = "ATE of priority group - ATE", main = "TOC")
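If I then wanted the RATE analogue from these bins, I suppose it would mirror the wtd.mean(TOC, sample.weights) line quoted above; with equal sample weights that is just the mean of the TOC curve (a sketch):

# With equal sample weights, the AUTOC-style RATE reduces to the plain mean
# of the TOC curve over the 100 priority groups.
RATE.manual <- mean(TOC)
RATE.manual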

erikcs commented 10 months ago

My immediate reaction would be to just do what's posted above; that's one of the reasons I added the subset argument to rank_average_treatment_effect.

robert702 commented 10 months ago

Thanks Erik. I imagine that could work when there are enough observations with propensities above 0.1. As I was saying earlier, in my setting the mass of the propensities is around 0.02, so the simple subsetting you proposed would not work.

I could just take a random sample of the control group to get a more balanced design, or use the suggestion in the documentation of the average_treatment_effect function, as described in my previous post: target.sample = "treated" or target.sample = "overlap". Any thoughts on which of the two would be more appropriate, or on alternative approaches when there are basically no observations with propensities > 0.1?
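For concreteness, the downsampling variant would be something like this (a sketch; the 5:1 control-to-treated ratio is an arbitrary placeholder):

# Keep all treated units plus a random subset of controls for a more
# balanced design, then refit the forest on the subsample.
treated.idx <- which(W == 1)
control.idx <- sample(which(W == 0), 5 * sum(W == 1))
keep <- c(treated.idx, control.idx)
cf.balanced <- causal_forest(X[keep, ], Y[keep], W[keep])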

Thanks in advance!!