r-causal / causal-inference-in-R

Causal Inference in R: A book!
https://www.r-causal.org/

Evaluating propensity scores for balancing the joint distribution #288

Open ehudkr opened 3 days ago

ehudkr commented 3 days ago

Hi everyone, thanks for the book. I only skimmed bits and pieces but it reads well and is skillfully presented. Chapter 9 on propensity score evaluation is great. The thematic progression from sample-mean to full distribution and from linear to non-linear modeling is important and often overlooked.

I have a suggestion to progress it even further, which I'll do my best to describe briefly.

We start from the observation that the SMD only evaluates covariates marginally, so there can be pathologies where two covariates are each well balanced on their own, but their product is not. This is somewhat similar to the case below (from here):

image

To account for that, we can first apply the existing solution: computing the SMD for the interaction x_1:x_2. However, examining all possible pairs (not to mention triplets, etc.) quickly becomes impractical.
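To make the pathology concrete, here is a minimal sketch in plain Python (the same arithmetic carries over to R directly). The data are a made-up toy example, and `weighted_mean`, `weighted_var`, and `smd` are hypothetical helper names, not functions from the book or from any package. The treated group has x_1 and x_2 moving together, and the control group has them moving in opposite directions, so each margin is balanced while the product is not.

```python
from math import sqrt

def weighted_mean(x, w):
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def weighted_var(x, w):
    # Population-style weighted variance (divides by the sum of weights).
    m = weighted_mean(x, w)
    return sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sum(w)

def smd(x, treated, w=None):
    """Standardized mean difference (treated minus control).

    Hypothetical helper: uses the root of the average of the two group
    variances as the denominator, one common convention among several.
    """
    if w is None:
        w = [1.0] * len(x)
    g1 = [(xi, wi) for xi, ti, wi in zip(x, treated, w) if ti == 1]
    g0 = [(xi, wi) for xi, ti, wi in zip(x, treated, w) if ti == 0]
    x1, w1 = zip(*g1)
    x0, w0 = zip(*g0)
    pooled_sd = sqrt((weighted_var(x1, w1) + weighted_var(x0, w0)) / 2)
    return (weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd

# Toy data (assumed for illustration): first four rows treated, last four control.
base = [-2.0, -1.0, 1.0, 2.0]
treated = [1] * 4 + [0] * 4
x1 = base + base                      # same x_1 values in both groups
x2 = base + [-v for v in base]        # x_2 = x_1 if treated, x_2 = -x_1 if control
x1x2 = [a * b for a, b in zip(x1, x2)]  # the interaction term

smd_x1 = smd(x1, treated)
smd_x2 = smd(x2, treated)
smd_interaction = smd(x1x2, treated)
print(smd_x1, smd_x2, round(smd_interaction, 2))
```

In this toy case the SMD is zero for each covariate on its own but well above 1 for the product, which is exactly the kind of imbalance a marginal check would miss.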

The proposed solution, then, is post-adjustment two-sample discrimination tests. Briefly, if your propensity scores do well at balancing the covariate distribution between groups, making the groups indistinguishable, then you shouldn't be able to predict the treatment assignment from the (weighted) covariates. If you used a statistical classifier to separate the two groups, its accuracy should be no better than chance. And the more flexible the discriminators that still fail to separate the groups (ones that dig into the joint distribution of the data, like random forests, additive trees, etc.), the more trustworthy your propensity scores are at balancing the joint distribution of covariates between groups.

In practice, there's no need to actually fit an additional model: it can be enough to simply calculate post-adjustment discrimination metrics, like the area under the ROC curve, weighted by the inverse propensity scores or computed on the matched sample. All very doable in R.
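As a rough illustration of the weighted-AUC idea, here is a minimal sketch in plain Python (again easy to translate to R). Everything in it is assumed for the example: the data are a toy dataset with one binary covariate and known propensity scores of 0.8 and 0.2, and `weighted_auc` is a hypothetical helper that computes the AUC as a weighted share of concordant treated/control pairs, counting ties as half.

```python
def weighted_auc(score, treated, w):
    """Weighted AUC of `score` for discriminating treated from control.

    Hypothetical helper: each treated/control pair contributes the product
    of its weights; ties in the score count as half. O(n^2), fine for a toy.
    """
    pos = [(s, wi) for s, t, wi in zip(score, treated, w) if t == 1]
    neg = [(s, wi) for s, t, wi in zip(score, treated, w) if t == 0]
    num = 0.0
    for s1, w1 in pos:
        for s0, w0 in neg:
            if s1 > s0:
                num += w1 * w0
            elif s1 == s0:
                num += 0.5 * w1 * w0
    den = sum(wi for _, wi in pos) * sum(wi for _, wi in neg)
    return num / den

# Toy data (assumed): x = 1 has 8 treated and 2 controls, x = 0 has 2 and 8.
x = [1] * 10 + [0] * 10
treated = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
ps = [0.8 if xi == 1 else 0.2 for xi in x]  # propensity scores, taken as known

# Inverse probability weights: 1/e for treated, 1/(1 - e) for controls.
ipw = [1 / p if t == 1 else 1 / (1 - p) for p, t in zip(ps, treated)]

auc_raw = weighted_auc(ps, treated, [1.0] * len(x))  # before adjustment
auc_ipw = weighted_auc(ps, treated, ipw)             # after weighting
print(round(auc_raw, 3), round(auc_ipw, 3))
```

Before weighting, the propensity score separates the groups well (AUC of 0.8 here); after weighting, the AUC falls to about 0.5, i.e. chance level, which is what good balance should look like. The same check could be run with the predictions of a more flexible classifier in place of `ps`.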

That's the gist, very briefly; I hope it's clear enough (there are more details, though still relatively high-level, here).

If you think that's within the scope of your chapter and not too far from the background of your readers (which I don't think it is, given your table of contents), then I think it could be a great addition, making the chapter more complete while continuing along the same arc as the existing story. I know Malcolm prefers issues rather than PRs here, but I can try to sketch a draft in Quarto if needed.

Best,

malcolmbarrett commented 3 days ago

@LucyMcGowan what do you think about this idea? I haven't done something like this, but I think I like the logic of it.