grf-labs / grf

Generalized Random Forests
https://grf-labs.github.io/grf/
GNU General Public License v3.0

rank_average_treatment_effect (RATE) with continuous treatment #1194

Open GalAmedi opened 2 years ago

GalAmedi commented 2 years ago

Hello.

I am trying to test the fit of a causal forest I trained in a setting with a continuous treatment. I get the following error message:

"rank_average_treatment_effect only supports binary treatment."

From my understanding of the test, this restriction seems odd: both the definition of prioritization rules and the calculation of AIPW scores are possible with continuous treatments. Is there a reason for this restriction that I'm missing? If not, would it be possible to support non-binary treatments in future releases?

Thanks for developing this great method and package.

Gal

syadlowsky commented 2 years ago

Hi Gal,

The issue is not related to the definition of the score or to estimation, but to defining what a RATE should be for multiple or continuous treatments. The nuances are easier to discuss in the context of multiple discrete treatments, so I'll start there. Currently, a RATE is conceptually like a generalized correlation between the true treatment effect of one treatment versus another and the rank ordering induced by the estimated treatment effect. To extend it to multiple treatments, one could look at the RATE for all pairs of treatments, or designate one treatment as a "baseline" (i.e., control) and look at the RATE for every other treatment relative to that baseline. However, it's not clear that either of these corresponds well to the practical use of a treatment effect estimator, where the goal is to pick a specific treatment with which to intervene.
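To make the two candidate extensions above concrete, here is a minimal Python sketch (not grf's API) that computes an AUTOC-style RATE for each arm contrast, either over all pairs of arms or relative to a single baseline arm. It assumes per-arm doubly robust scores for each unit's mean outcome and per-arm outcome predictions are already available; `contrast_rates`, `arm_scores`, and `arm_preds` are hypothetical names, and ties in priority are broken arbitrarily.

```python
from itertools import combinations


def autoc(priority, scores):
    """Discrete AUTOC: mean of TOC(j/n) over j, units ranked by descending priority."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: -priority[i])
    overall = sum(scores) / n
    running, total = 0.0, 0.0
    for j, i in enumerate(order, start=1):
        running += scores[i]
        total += running / j - overall  # TOC at u = j / n
    return total / n


def contrast_rates(arm_scores, arm_preds, baseline=None):
    """AUTOC for each arm contrast (a, b), i.e. the effect of moving from arm a to arm b.

    arm_scores[k][i]: hypothetical doubly robust score for unit i's mean outcome under arm k.
    arm_preds[k][i]:  hypothetical predicted mean outcome for unit i under arm k.
    baseline=None compares all pairs; otherwise every arm is compared to `baseline`.
    """
    arms = sorted(arm_scores)
    if baseline is None:
        pairs = list(combinations(arms, 2))
    else:
        pairs = [(baseline, k) for k in arms if k != baseline]
    rates = {}
    for a, b in pairs:
        # Estimated effect of b vs a, and the priority ranking it induces.
        contrast = [sb - sa for sa, sb in zip(arm_scores[a], arm_scores[b])]
        priority = [pb - pa for pa, pb in zip(arm_preds[a], arm_preds[b])]
        rates[(a, b)] = autoc(priority, contrast)
    return rates


scores = {0: [0, 0, 0, 0], 1: [1, 2, 3, 4], 2: [4, 3, 2, 1]}
preds = scores  # pretend the predictions are perfect
print(contrast_rates(scores, preds))              # {(0, 1): 0.75, (0, 2): 0.75, (1, 2): 1.5}
print(contrast_rates(scores, preds, baseline=0))  # {(0, 1): 0.75, (0, 2): 0.75}
```

Each contrast gets its own ranking, so neither variant by itself says which single arm to assign to a given unit.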

If you have ideas for how to extend the definition of the RATE to multiple treatments (and continuous treatments) in a meaningful way, we would be very interested in discussing.

Best, Steve

GalAmedi commented 2 years ago

Hey Steve, I'll give it a shot. I'd be happy to hear your feedback; maybe I have the wrong perception of the entire issue.

In my specific context I have a single type of treatment with varying intensity. To translate this into the terms of the Criteo Uplift Benchmark, which appears in Section 6 of the RATE paper, we could think of a continuous treatment as the number of ads an individual was exposed to. The treatment effect I'm interested in is the average partial effect:

$\tau = E\left[\frac{Cov[W, Y | X]}{Var[W | X]}\right]$

That is, I am willing to assume that the marginal effect of any additional ad on the probability that the individual visits the site is constant, regardless of how many ads the individual has already seen. Put more simply, for any treatment intensity $b$ we assume:

$Y_i(b) = Y_i(0) + \tau b$

This assumption might be silly in this specific context, but it is more reasonable, and commonly used, in other contexts.
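A minimal sketch of this estimand on simulated data, assuming a discrete covariate $X$ so that $\mathrm{Cov}[W, Y \mid X]$ and $\mathrm{Var}[W \mid X]$ can be computed within strata (a stand-in for what a causal forest estimates with continuous covariates). To give the example some heterogeneity, the slope is allowed to vary with $X$, i.e. $Y_i(b) = Y_i(0) + \tau(X_i) b$; the slopes, the randomized uniform $W$, and the function name `average_partial_effect` are illustrative assumptions.

```python
import random


def average_partial_effect(x, w, y):
    """Stratified plug-in estimate of E[Cov(W, Y | X) / Var(W | X)] for discrete X."""
    strata = {}
    for xi, wi, yi in zip(x, w, y):
        strata.setdefault(xi, []).append((wi, yi))
    n = len(x)
    total = 0.0
    for rows in strata.values():
        m = len(rows)
        w_bar = sum(wi for wi, _ in rows) / m
        y_bar = sum(yi for _, yi in rows) / m
        cov = sum((wi - w_bar) * (yi - y_bar) for wi, yi in rows) / m
        var = sum((wi - w_bar) ** 2 for wi, _ in rows) / m
        total += (m / n) * cov / var  # weight each stratum's slope by its share P(X = x)
    return total


rng = random.Random(0)
tau = {0: 0.5, 1: 2.0}  # hypothetical per-stratum slopes tau(x)
x = [rng.randint(0, 1) for _ in range(20000)]
w = [rng.uniform(0, 5) for _ in x]  # intensity, randomized independently of X and Y(0)
y = [rng.gauss(0, 1) + tau[xi] * wi for xi, wi in zip(x, w)]  # Y(0) + tau(X) * W
print(round(average_partial_effect(x, w, y), 2))  # should be close to 0.5 * 0.5 + 0.5 * 2.0 = 1.25
```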

In grf I can estimate this effect using AIPW with the average_treatment_effect function. I can also estimate the corresponding idiosyncratic CATE using a standard causal forest and base my priority scoring function $S(\cdot)$ on it (with proper cross-fitting). I can then write an alternative definition of the TOC, analogous to eq. (2) in the paper:

$\mathrm{TOC}(u; S) = E\left[\frac{Cov[W, Y | X]}{Var[W | X]} \,\middle|\, F_S(S(X_i)) \geq 1 - u \right] - E\left[\frac{Cov[W, Y | X]}{Var[W | X]}\right]$
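Since the conditional partial effect $\tau(X_i)$ is unobserved, one way to estimate this TOC is to replace it with doubly robust scores $\Gamma_i$ whose conditional mean is $\tau(X_i)$, and then compare the mean score among the top-$u$ fraction by priority with the overall mean. The sketch below uses a standard debiased score for a partially linear model,
$\Gamma_i = \hat\tau(X_i) + \frac{W_i - \hat e(X_i)}{\hat V(X_i)}\left(Y_i - \hat m(X_i) - (W_i - \hat e(X_i))\hat\tau(X_i)\right)$,
where $\hat m$, $\hat e$, and $\hat V$ estimate $E[Y \mid X]$, $E[W \mid X]$, and $Var[W \mid X]$. It assumes these nuisance estimates are cross-fitted and supplied by the caller; the function names are hypothetical, and this is not a claim about grf's internal implementation.

```python
import math


def dr_partial_effect_scores(y, w, y_hat, w_hat, var_w_hat, tau_hat):
    """Doubly robust scores Gamma_i with E[Gamma_i | X_i] = tau(X_i).

    Gamma_i = tau_hat_i + (W_i - w_hat_i) / var_w_hat_i
                          * (Y_i - y_hat_i - (W_i - w_hat_i) * tau_hat_i)
    All nuisance estimates are assumed to be cross-fitted (out-of-fold).
    """
    scores = []
    for yi, wi, mi, ei, vi, ti in zip(y, w, y_hat, w_hat, var_w_hat, tau_hat):
        w_res = wi - ei  # treatment residual W - E[W | X]
        scores.append(ti + w_res / vi * (yi - mi - w_res * ti))
    return scores


def toc(u, priority, scores):
    """TOC(u): mean score among the top-u fraction by priority minus the overall mean."""
    n = len(scores)
    k = max(1, math.ceil(u * n))  # number of units in the top-u group
    order = sorted(range(n), key=lambda i: -priority[i])
    top_mean = sum(scores[i] for i in order[:k]) / k
    return top_mean - sum(scores) / n


print(dr_partial_effect_scores([5.0], [3.0], [0.0], [1.0], [2.0], [2.0]))  # [3.0]
print(toc(0.5, [4, 3, 2, 1], [1, 2, 3, 4]))  # -1.0: this priority ranks low-effect units first
```

The division by $\hat V(X_i)$ is where the continuous case needs care: units with little residual variation in $W$ receive large weights.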

I would like to construct this TOC and summarize it using the AUTOC, the Qini coefficient, or any other weighting. In my case, the motivation is to test the calibration of the causal forest I trained: I would like additional support for the claim that it detects real heterogeneity, on top of the omnibus calibration test I already apply. The interpretation of the RATE (as I understand it) is similar to the interpretation in the binary treatment case.
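Following the RATE paper's definitions, the AUTOC integrates $\mathrm{TOC}(u)$ over $u \in (0, 1]$ and the Qini coefficient integrates $u \cdot \mathrm{TOC}(u)$. Here is a minimal sketch that evaluates both on the grid $u = j/n$, taking the scores $\Gamma_i$ and the priorities as given; it breaks ties in priority arbitrarily, which a careful implementation would avoid, and `rate_curves` is a hypothetical name.

```python
def rate_curves(priority, scores):
    """Return (AUTOC, Qini) from the empirical TOC curve evaluated at u = j / n."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: -priority[i])
    overall = sum(scores) / n
    running = 0.0
    autoc = qini = 0.0
    for j, i in enumerate(order, start=1):
        running += scores[i]
        toc_j = running / j - overall  # TOC(j / n)
        autoc += toc_j / n             # Riemann sum for the integral of TOC(u) du
        qini += (j / n) * toc_j / n    # Riemann sum for the integral of u * TOC(u) du
    return autoc, qini


print(rate_curves([1, 2, 3, 4], [1, 2, 3, 4]))  # (0.75, 0.3125)
```

The AUTOC puts more weight on the very top of the ranking, while the Qini weights the curve more evenly across $u$.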

In other contexts, a policy decision-rule interpretation is also appropriate. Consider, for example, a study of the effect of taxes on labor inputs. Different individuals received tax benefits of different sizes, and we would like to understand which populations' labor inputs are highly responsive to taxes, making them candidates for additional benefits. The data are observational, and the researcher identifies some quasi-random variation in treatment. We are willing to assume the effect is constant across the range of benefit levels observed in the data. In this case, the RATEs for different decision rules would indicate the extent to which each rule we consider targets individuals whose labor input is actually more responsive to tax benefits than the sample average.

My suggestion applies trivially to multiple discrete intensities of a given treatment (e.g., tax benefits of 0, 100, or 200 dollars). Treatments that differ in other respects are a different case that requires more thought, and not the one described here.

I went over the RATE paper and I don't see where this could cause estimation or inference issues, but I'd be glad for feedback if I'm missing anything.
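On the inference side, one simple check is a bootstrap standard error for the AUTOC with the priorities held fixed (they must be built on data separate from the evaluation scores). The sketch below swaps in a plain nonparametric bootstrap on simulated scores for illustration; grf's `rank_average_treatment_effect` reports its own bootstrap standard error, and its resampling scheme may differ. The function names and data are hypothetical.

```python
import random
import statistics


def autoc(priority, scores):
    """Discrete AUTOC: mean of TOC(j/n) over j, units ranked by descending priority."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: -priority[i])
    overall = sum(scores) / n
    running, total = 0.0, 0.0
    for j, i in enumerate(order, start=1):
        running += scores[i]
        total += running / j - overall
    return total / n


def bootstrap_autoc(priority, scores, reps=200, seed=0):
    """Point estimate, bootstrap SE, and z-statistic for H0: AUTOC = 0.

    Priorities are held fixed; only the evaluation units are resampled.
    """
    rng = random.Random(seed)
    n = len(scores)
    point = autoc(priority, scores)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample units with replacement
        draws.append(autoc([priority[i] for i in idx], [scores[i] for i in idx]))
    se = statistics.stdev(draws)
    return point, se, point / se


rng = random.Random(1)
priority = [rng.gauss(0, 1) for _ in range(500)]
scores = [p + rng.gauss(0, 1) for p in priority]  # simulated scores that track the priority
print(bootstrap_autoc(priority, scores))
```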

Gal

GalAmedi commented 2 years ago

Hey @syadlowsky, would be glad to hear any thoughts you have on the issue.

Best, Gal