py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

How is propensity score matching implemented? #1196

Closed: krz closed this issue 4 weeks ago

krz commented 4 weeks ago

Hi,

I'm not sure how propensity score matching is implemented in dowhy. I assume the following:

  1. calculate propensity scores for the whole data set (using logistic regression by default)
  2. fit an unsupervised KNN estimator (with k=1) on the treated units based on the propensity score
  3. for each control unit, get the index of and the distance to its nearest treated neighbor

Then each control unit is paired 1:1 with a matched treated unit.
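The steps listed above can be sketched with NumPy alone. This is a minimal illustration of the procedure as the question describes it, not DoWhy's source: the logistic regression is a hand-rolled Newton-Raphson fit standing in for a library model, and `fit_propensity`, `match_controls_to_treated` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_propensity(X, t, n_iter=25):
    """Step 1 (assumed): estimate P(T=1 | X) with an unregularized logistic
    regression fitted by Newton-Raphson, standing in for a library model."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (t - p)                        # gradient of the log-likelihood
        hess = Xb.T @ (Xb * (p * (1 - p))[:, None])  # negative Hessian (X^T W X)
        beta += np.linalg.solve(hess + 1e-9 * np.eye(len(beta)), grad)
    return 1.0 / (1.0 + np.exp(-Xb @ beta))


def match_controls_to_treated(ps, t):
    """Steps 2-3: for every control unit, find the treated unit with the
    closest propensity score (k=1) and the distance to it."""
    treated_idx = np.flatnonzero(t == 1)
    control_idx = np.flatnonzero(t == 0)
    # pairwise |ps_control - ps_treated|, shape (n_control, n_treated)
    dist = np.abs(ps[control_idx][:, None] - ps[treated_idx][None, :])
    nearest = dist.argmin(axis=1)
    return control_idx, treated_idx[nearest], dist[np.arange(len(control_idx)), nearest]


# Hypothetical synthetic data: treatment probability depends on the first covariate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
t = (rng.random(200) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(int)

ps = fit_propensity(X, t)
controls, matched_treated, distances = match_controls_to_treated(ps, t)
```

Because each control unit is matched independently, several controls can end up paired with the same treated unit in this sketch.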

Is this correct?

Thanks!

emrekiciman commented 4 weeks ago

Hi @krz, propensity score matching can calculate the average treatment effect on the treated, on the control, or on the total population (ATT, ATC, or ATE, respectively). For example, if we are calculating the ATT, we look for the nearest neighbor of each treated unit and discard any extra, unmatched control units. If we are calculating the ATC, we look for the nearest neighbor of each control unit and discard any extra, unmatched treated units.

In either case, we use nearest-neighbor matching with replacement. So, when looking for the nearest neighbor of a treated unit, we also consider control units that have already been matched to other treated units; i.e., the mapping is not 1:1.
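The matching direction for ATT versus ATC, and the with-replacement behavior described above, can be sketched as follows. This is an illustrative NumPy version, not DoWhy's code: it takes precomputed propensity scores as input, the helpers `nn_match` and `psm_effect` and the toy numbers are hypothetical, and combining ATT and ATC weighted by group size to obtain the ATE is an assumed convention.

```python
import numpy as np


def nn_match(ps, t, source, target):
    """For every unit with t == source, return the index of the t == target unit
    with the closest propensity score. Matching is WITH replacement: the same
    target unit may be chosen for many source units; unchosen targets are dropped."""
    src = np.flatnonzero(t == source)
    tgt = np.flatnonzero(t == target)
    dist = np.abs(ps[src][:, None] - ps[tgt][None, :])
    return src, tgt[dist.argmin(axis=1)]


def psm_effect(y, t, ps, target="att"):
    """Hypothetical 1-NN propensity score matching estimator."""
    if target == "att":  # each treated unit vs. its nearest control
        src, match = nn_match(ps, t, source=1, target=0)
        return np.mean(y[src] - y[match])
    if target == "atc":  # each control unit vs. its nearest treated unit
        src, match = nn_match(ps, t, source=0, target=1)
        return np.mean(y[match] - y[src])
    if target == "ate":  # assumed: group-size-weighted mix of ATT and ATC
        n_t, n_c = (t == 1).sum(), (t == 0).sum()
        return (n_t * psm_effect(y, t, ps, "att") + n_c * psm_effect(y, t, ps, "atc")) / (n_t + n_c)
    raise ValueError(f"unknown target: {target!r}")


# Toy data: units 0-1 are treated, units 2-4 are controls.
t = np.array([1, 1, 0, 0, 0])
ps = np.array([0.70, 0.74, 0.10, 0.71, 0.90])
y = np.array([5.0, 6.0, 1.0, 2.0, 3.0])

_, att_matches = nn_match(ps, t, source=1, target=0)
print(att_matches)                    # → [3 3]
att = psm_effect(y, t, ps, "att")     # → 3.5
atc = psm_effect(y, t, ps, "atc")     # ≈ 3.333
ate = psm_effect(y, t, ps, "ate")     # ≈ 3.4
```

Both treated units are paired with the same control (index 3), while controls 2 and 4 play no part in the ATT: the mapping is not 1:1, and unmatched units are simply discarded.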

krz commented 4 weeks ago

Thanks @emrekiciman for the clarification!