uber / causalml

Uplift modeling and causal inference with machine learning algorithms

Question: Propensity Score? #52

Closed BrianMiner closed 5 years ago

BrianMiner commented 5 years ago

Great library! I'm just starting to take a look, but I didn't immediately see documentation answering a question I had: do the available methods make any distinction between observational and experimental data? Specifically, are propensity scores (the propensity to be treated) leveraged for observational data, perhaps as weights, through matching, or as a covariate?

t-tte commented 5 years ago

Great question. The learners in the package assume that the treatment assignment is unconfounded conditional on the covariates, also sometimes known as the conditional ignorability assumption:

(Y(0), Y(1)) ⊥ W | X

With experimental data, this assumption is likely to be satisfied; with observational data, it's anyone's guess whether or not it is satisfied. While the models themselves don't differentiate between experimental and observational data, the required assumptions are much more difficult to satisfy in the observational case.
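To make that concrete, here's a small self-contained simulation (illustrative only, not part of the package) showing how confounded assignment biases a naive treated-vs-control comparison, while adjusting for the confounder recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                       # observed confounder
# Confounded assignment: treatment is more likely when x is high
w = rng.binomial(1, 1 / (1 + np.exp(-2 * x)))
y = 1.0 * w + 2.0 * x + rng.normal(size=n)   # true treatment effect is 1.0

naive = y[w == 1].mean() - y[w == 0].mean()  # biased: also picks up the effect of x

# Adjusting for x (here via OLS on [1, w, x]) recovers the effect,
# because assignment is ignorable conditional on x
beta = np.linalg.lstsq(np.column_stack([np.ones(n), w, x]), y, rcond=None)[0]
print(round(naive, 2), round(beta[1], 2))
```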

In particular, the package doesn't offer anything new in terms of making sure that you've blocked all relevant confounding paths between the treatment variable and the outcome. You still need to select the right covariates based on subject matter expertise.

As to the specific question about propensity scores, the X-Learner and R-Learner both use them for weighting.
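For reference, the propensity score here is just P(W = 1 | X), and in practice it is estimated with any probabilistic classifier. A minimal sklearn sketch (illustrative, not the package's internals):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 3))
w = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # assignment depends on X[:, 0]

# Propensity score: estimated probability of treatment given covariates
p = LogisticRegression().fit(X, w).predict_proba(X)[:, 1]
print(round(p.mean(), 3))  # close to the overall treatment rate
```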

BrianMiner commented 5 years ago

I am quite familiar with uplift and the various ways of modeling it, but I was (and am) unfamiliar with these meta-learners and will need to read up on them. In short, is it correct to say that there is no adjustment for non-random treatment assignment in the package except in the X- and R-Learners, and that these apply (I'm guessing) inverse propensity scores (for the ATE or the ATT)?

t-tte commented 5 years ago

All of the meta-learners "adjust for" non-random treatment assignment if and only if you've included all of the relevant confounders in the matrix of features X, which is a very difficult task. The X- and R-Learners happen to use the propensity score, but that is a separate issue.

BrianMiner commented 5 years ago

I agree with the first sentence; this is the idea behind regression adjustment in the traditional approach to causal inference with observational data.

When I think of using a propensity score to estimate the causal effect of a treatment in the presence of confounding / selection bias, I think of the traditional statistical approaches: matching (typically followed by estimating the ATT), or inverse propensity weights in a regression model for either ATE or ATT estimation. Weighting like that is what I thought might be happening in these meta-learners, but that is not the case, is it? All the meta-learners (and the uplift random forests) only use regression adjustment through pre-treatment X features. The X-Learner appears to also use a propensity score, but only as a weight for averaging the two estimates shown in the paper; I am not sure whether this actually addresses confounding at all, or is just a method of ensembling the predictions. Am I right about all this?
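The weighting described above, following the X-Learner recipe in the Künzel et al. paper, can be sketched roughly like this. The propensity score g(x) appears only as the weight blending the two stage-2 estimates; everything below is an illustrative reimplementation with sklearn, not the package's code:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
w = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))    # assignment depends on X
tau = 1 + X[:, 1]                                  # true heterogeneous effect
y = X[:, 0] + tau * w + rng.normal(size=n)

# Stage 1: separate outcome models for each arm
mu0 = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[w == 0], y[w == 0])
mu1 = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[w == 1], y[w == 1])

# Stage 2: impute individual-level effects, then model them on X
d1 = y[w == 1] - mu0.predict(X[w == 1])   # treated: observed minus imputed control
d0 = mu1.predict(X[w == 0]) - y[w == 0]   # controls: imputed treated minus observed
tau1 = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[w == 1], d1)
tau0 = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[w == 0], d0)

# Stage 3: the propensity score g(x) enters only as the blending weight
g = LogisticRegression().fit(X, w).predict_proba(X)[:, 1]
cate = g * tau0.predict(X) + (1 - g) * tau1.predict(X)
```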

BrianMiner commented 5 years ago

And as such... we are always estimating the ATE, and not the ATT, with the functions in this library?

EDIT: Assuming we don't first match controls to the treated...

t-tte commented 5 years ago

That's right, the methods don't use propensity scores in the sense of matching or inverse probability of treatment weighting.

As you said, the X-Learner uses the propensity score to combine estimates. In the R-Learner, the propensity score can be understood as removing regularization bias, as discussed by Chernozhukov et al.
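As a sketch of that orthogonalization point, here is an illustrative partialling-out estimate of a constant treatment effect in the style of Chernozhukov et al.'s double ML (not the package's code): both the outcome and the treatment are residualized on X with cross-fitted nuisance models, and subtracting the estimated propensity is what removes the first-stage bias.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n = 5000
X = rng.normal(size=(n, 3))
w = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))                # confounded assignment
y = np.sin(X[:, 0]) + X[:, 1] + 1.0 * w + rng.normal(size=n)   # true effect is 1.0

# Cross-fitted nuisance estimates: m(x) = E[Y | X], e(x) = P(W = 1 | X)
m_hat = cross_val_predict(GradientBoostingRegressor(), X, y, cv=5)
e_hat = cross_val_predict(GradientBoostingClassifier(), X, w, cv=5,
                          method="predict_proba")[:, 1]

# Residual-on-residual regression; the propensity residual (w - e_hat)
# is what makes the estimate insensitive to nuisance-model bias
y_res, w_res = y - m_hat, w - e_hat
tau_hat = (w_res * y_res).sum() / (w_res ** 2).sum()
print(round(tau_hat, 2))
```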

In general, we are interested in estimating the conditional average treatment effect:

τ(x) = E[Y(1) − Y(0) | X = x]

That's what you get with the predict() method. However, if you specifically use the estimate_ate() method, then you indeed get the ATE rather than ATT, ATC, etc.
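To illustrate that last distinction with toy numbers (purely illustrative numpy, not the package's internals): the ATE averages τ(x) over everyone, while the ATT averages it over the treated only, and the two differ when assignment is confounded.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x = rng.normal(size=n)
cate = 1 + x                                 # per-unit effect tau(x)
w = rng.binomial(1, 1 / (1 + np.exp(-x)))    # treated units tend to have higher x

ate = cate.mean()          # average effect over the whole population
att = cate[w == 1].mean()  # average effect over the treated only

# The treated subpopulation has a different covariate distribution,
# so the ATT exceeds the ATE here
print(round(ate, 2), round(att, 2))
```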

BrianMiner commented 5 years ago

Makes sense that we are always estimating the (conditional) ATE here, since we are only applying regression control through X (not weighting or matching). The CATE must become the CATT when you first use matching, though (and I see that nearest neighbor matching is implemented). Good stuff.
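A minimal sketch of that matching-then-ATT idea, using sklearn's NearestNeighbors on an estimated propensity score (illustrative only; the package's own matching utilities may differ): each treated unit is paired with its closest control on the propensity score, and the ATT is the mean outcome difference across pairs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=(n, 1))
w = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))  # confounded assignment
y = 2 * x[:, 0] + 1.0 * w + rng.normal(size=n)   # true effect is 1.0

# 1-NN matching on the estimated propensity score
p = LogisticRegression().fit(x, w).predict_proba(x)[:, 1]
nn = NearestNeighbors(n_neighbors=1).fit(p[w == 0].reshape(-1, 1))
idx = nn.kneighbors(p[w == 1].reshape(-1, 1), return_distance=False)[:, 0]

# ATT: treated outcomes minus their matched controls' outcomes
att = (y[w == 1] - y[w == 0][idx]).mean()
print(round(att, 2))
```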