Closed gcasamat closed 3 years ago
Sorry, issues are closed automatically when the PR is merged. average_partial_effect.R (continuous treatment) is a generalization of average_treatment_effect.R (binary treatment), which, as you can see, does not require the same variance estimation but still requires more than one cluster. @swager may have a better explanation.
Thanks for your reply. What you said is clear to me: with the current implementation of grf, it is not possible to compute the average partial effect for a single cluster. However, it seems that such a computation (in the binary treatment case) is done in the accompanying code of the article https://arxiv.org/abs/1902.07409: school scores are computed using formula (8) in the paper.
You mean an ATE with only one cluster is computed here? (Sorry, I am not able to find it)
Couldn't the histogram drawn below (taken from script.R) be interpreted as the distribution of ATEs per school?

```r
pdf("school_hist.pdf")
pardef = par(mar = c(5, 4, 4, 2) + 0.5, cex.lab = 1.5, cex.axis = 1.5,
             cex.main = 1.5, cex.sub = 1.5)
hist(school.score, xlab = "School Treatment Effect Estimate", main = "")
dev.off()
```
To summarize, my concern is about the way the variance of W given X is estimated in average_partial_effect.R. Currently it is:

```r
variance_forest <- regression_forest(subset.X.orig,
  (subset.W.orig - subset.W.hat)^2,
  clusters = subset.clusters,
  num.trees = num.trees.for.variance
)
```

Why couldn't we instead have:

```r
variance_forest <- regression_forest(X.orig,
  (W.orig - W.hat)^2,
  clusters = clusters,
  num.trees = num.trees.for.variance
)
```

knowing that W.hat itself is estimated on the whole sample (not on the subsetted data)?
When estimating the ATE with clusters, we imagine a sampling model where we draw random clusters and so we need at least 2 clusters to get a variance estimate; see, e.g., (8) in https://arxiv.org/pdf/1902.07409.pdf where we divide by J - 1 to estimate variance, where J is the number of clusters. (The difference with the setting of the histogram you brought up is that, there, we just need point estimates without variance estimates, and so using data from just one cluster is OK.)
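To make the divide-by-(J - 1) point concrete, here is a minimal sketch of the between-cluster variance calculation, using made-up per-cluster point estimates (these numbers are purely illustrative, not grf internals):

```r
# Hypothetical per-cluster ATE point estimates for J = 5 clusters.
tau.hat.j <- c(0.8, 1.1, 0.9, 1.3, 1.0)
J <- length(tau.hat.j)

# Point estimate: average the cluster-level estimates.
ate <- mean(tau.hat.j)

# Variance of that average, treating clusters as the sampling unit:
# the sum of squared deviations is divided by J - 1 (and by J for the mean),
# which is undefined when J = 1 -- hence the need for at least 2 clusters.
var.ate <- sum((tau.hat.j - ate)^2) / (J * (J - 1))
se <- sqrt(var.ate)
```

With a single cluster, `tau.hat.j` has length 1 and `J - 1 = 0`, so the point estimate still exists but the standard error does not.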
The point you make in your last comment is a good one -- we should provide the user more flexibility in how they get weights. Ideally, the function average_partial_effect should take an optional argument debiasing.weights (either of length n, or of the same length as subset), and if this argument is non-null then we skip training the variance.forest. @erikcs could you please create an issue for this?
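For concreteness, the proposed interface might be used as in the sketch below. This is hypothetical: debiasing.weights is the argument suggested above and is not part of the released API, and the call signature is an assumption.

```r
# Hypothetical sketch of the proposed interface -- not the actual grf API.
# forest is a fitted causal forest; my.weights are user-supplied
# debiasing weights of length n (or the same length as subset).
ape <- average_partial_effect(forest,
  subset = my.subset,
  debiasing.weights = my.weights  # if non-NULL, skip training variance.forest
)
```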
So you do not report the average partial effect for a single cluster because it is not possible to compute variance estimates (even though point estimates can be calculated). Right? More generally, the implications of clustering data when using causal forests are not entirely clear to me. Having read the various threads on the topic, it seems to be an active area of research and I believe that more insights are yet to come. Many thanks for your answer!
Closing this issue as it seemed resolved. Note that after version 1.2.0, average_partial_effect is removed and replaced by a new unified interface: #723
I have clustered data with a continuous treatment and I would like to compute the average partial effect at the cluster level. This is not possible with the current implementation of the grf software because "with clustering enabled you treat each cluster as a distinct unit, which here would be the same as asking for the average partial effect for a single observation" (see #628). In that previous issue I suggested modifying the computation of V.hat in the script average_partial_effect.R. Having received no reply to this suggestion, I presume it is not a valid way to proceed; however, I would be very interested to understand why. More generally, do you have any suggestions on how to calculate the per-cluster average partial effect?