How is propensity score matching implemented?

py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

MIT License

Hi,

I'm not sure how propensity score matching is implemented in dowhy. I assume the following:

1. Calculate propensity scores for the whole data set (using logistic regression by default).
2. Fit an unsupervised KNN estimator (with k=1) on the treated units, based on the propensity score.
3. Get the indices of, and distances to, the nearest neighbors of each control unit.

Then, for each control unit, a 1:1 matched treated unit is found.

Is the following correct?

- If there are more treated units in the data set than controls, some treated units are discarded for this 1:1 matching.
- If there are more control units in the data set than treated units, some treated units are duplicated so that a match is found for each control.

Thanks!

Hi @krz, propensity score matching can estimate the average treatment effect on the treated, on the controls, or on the total population (ATT, ATC, or ATE, respectively). For example, if we are calculating the ATT, we look for the nearest neighbor of each treated unit and discard any extra, unmatched control units. If we are calculating the ATC, we look for the nearest neighbor of each control unit and discard any extra, unmatched treated units.

In either case, we use nearest-neighbor matching with replacement. So, when we look for the nearest neighbor of a treated unit, control units that were already matched to other treated units are still candidates. That is, the mapping is not 1:1.
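To make the direction of matching and the effect of replacement concrete, here is a minimal sketch in plain Python. It is not DoWhy's code: the function names `match_with_replacement` and `matched_effect` are hypothetical, the propensity scores are taken as already estimated (for example by a logistic regression), and the toy data is made up purely for illustration.

```python
from collections import Counter


def match_with_replacement(scores, treated, target="att"):
    """Nearest-neighbor matching on precomputed propensity scores.

    Hypothetical helper for illustration, not part of DoWhy.
    target="att": each treated unit is matched to its nearest control.
    target="atc": each control unit is matched to its nearest treated unit.
    Returns a list of (unit_index, matched_index) pairs.
    """
    treated_idx = [i for i, t in enumerate(treated) if t]
    control_idx = [i for i, t in enumerate(treated) if not t]
    if target == "att":
        focal, pool = treated_idx, control_idx
    elif target == "atc":
        focal, pool = control_idx, treated_idx
    else:
        raise ValueError("target must be 'att' or 'atc'")
    # The pool is never shrunk, so a unit can be picked repeatedly
    # (matching with replacement). Ties go to the first unit in the pool.
    return [
        (i, min(pool, key=lambda j: abs(scores[j] - scores[i])))
        for i in focal
    ]


def matched_effect(outcomes, treated, pairs):
    """Mean of (treated outcome - control outcome) over matched pairs."""
    diffs = []
    for i, j in pairs:
        t, c = (i, j) if treated[i] else (j, i)
        diffs.append(outcomes[t] - outcomes[c])
    return sum(diffs) / len(diffs)


# Made-up toy data: units 0-2 are treated, units 3-5 are controls.
scores = [0.80, 0.75, 0.35, 0.30, 0.72, 0.20]
treated = [True, True, True, False, False, False]
outcomes = [5.0, 6.0, 7.0, 2.0, 4.0, 1.0]

att_pairs = match_with_replacement(scores, treated, target="att")
atc_pairs = match_with_replacement(scores, treated, target="atc")

# How often each unit in the pool was used as a match.
att_reuse = Counter(j for _, j in att_pairs)
atc_reuse = Counter(j for _, j in atc_pairs)

att = matched_effect(outcomes, treated, att_pairs)
atc = matched_effect(outcomes, treated, atc_pairs)
```

In this toy data, the ATT matching pairs treated units 0 and 1 with the same control (unit 4) and never uses control 5. The ATC matching uses treated unit 2 twice and never uses treated unit 0. Neither mapping is 1:1, and the unused units simply drop out of the estimate.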

How is propensity score matching implemented? #1196