kosukeimai / MatchIt

R package MatchIt

Relation between Covariates for Matching and Variables for Treatment Effect Estimation #179

Closed Maxi54321 closed 11 months ago

Maxi54321 commented 1 year ago

Hello, when reading documentation about propensity score matching, I mostly see only a single regression formula, for which the matching etc. is conducted.

In my case, I have four different hypotheses/regressions with four different dependent and independent variables, but all based on the same underlying dataset.

Do I understand correctly that the assessment of the initial balance and the balance of certain covariates (and the subsequent propensity score calculation) with the function "matchit" can be done once for all hypotheses? And can I then use these calculated scores, which are stored in the matchit object as "distance", to estimate the treatment effect, but of course with only the dependent and independent variables relevant to each hypothesis?

Here is an example. Two hypotheses/regressions are to be examined:

H1: Influence of income on health.
H2: Influence of gender on time spent with kids.

Some treatment is conducted in the background, with Treat = 0/1. Would it be possible to calculate the propensity scores like this?

Treat ~ age + educ + gender + income + health + time_spent_with_kids + .....

And then work with the resulting propensity scores for each participant as usual:

H1: health ~ income (but with the distances of the resulting match.data(m.out))
H2: time_spent_with_kids ~ gender (but with the distances of the resulting match.data(m.out))

I hope my question is clear. In the literature there is mostly just one regression to run, and the covariates for matching are the same as the ones used for treatment effect estimation. I hope that in my case it is possible to calculate the propensity scores once.

Thank you in advance! :)

ngreifer commented 1 year ago

The purpose of matching is to eliminate confounding between the treatment and the outcome. Matching doesn't make any other relationship unconfounded. It is inappropriate to test any other hypothesis besides the effect of the treatment on the outcome. Your two hypotheses don't even involve the treatment variable, so why are you matching in the first place? Interpreting the regression coefficients for any variables other than the treatment is called the Table 2 fallacy and must be avoided at all costs. You need to think of which analysis can best be used to estimate each relationship you want to estimate, and you need to be specific in which kinds of relationships you want to examine.
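The point above can be illustrated with a small simulation. This is a minimal sketch in plain Python (standard library only), not MatchIt code. All variable names and effect sizes are assumptions made up for illustration, and exact stratification on a single binary confounder `z` stands in for matching. The simulated treatment raises `health` by 2, while `income` has no causal effect on `health` at all and is only linked to it through an unobserved variable `u`.

```python
import random
from collections import defaultdict

rng = random.Random(1)

# Hypothetical data-generating process (all numbers are illustrative assumptions):
#   z : observed binary confounder of treatment and health
#   u : unobserved common cause of income and health
rows = []
for _ in range(20000):
    z = int(rng.random() < 0.5)
    u = rng.gauss(0, 1)
    treat = int(rng.random() < (0.7 if z else 0.3))
    income = u + rng.gauss(0, 1)                         # no effect on health
    health = 2.0 * treat + 3.0 * z + 2.0 * u + rng.gauss(0, 1)
    rows.append((z, treat, income, health))

def mean(values):
    return sum(values) / len(values)

# Naive treated-vs-control difference: confounded by z.
naive_effect = (mean([h for z, t, i, h in rows if t == 1])
                - mean([h for z, t, i, h in rows if t == 0]))

# Stratify on z (a stand-in for matching), then average the within-stratum differences.
stratified_effect = 0.0
for level in (0, 1):
    stratum = [r for r in rows if r[0] == level]
    diff = (mean([r[3] for r in stratum if r[1] == 1])
            - mean([r[3] for r in stratum if r[1] == 0]))
    stratified_effect += diff * len(stratum) / len(rows)

# Slope of health on income within (z, treat) cells, i.e. "adjusted" for both.
cells = defaultdict(list)
for z, t, i, h in rows:
    cells[(z, t)].append((i, h))
num = den = 0.0
for pairs in cells.values():
    mi = mean([i for i, _ in pairs])
    mh = mean([h for _, h in pairs])
    num += sum((i - mi) * (h - mh) for i, h in pairs)
    den += sum((i - mi) ** 2 for i, _ in pairs)
income_slope = num / den

print("naive treatment effect:     ", round(naive_effect, 2))
print("stratified treatment effect:", round(stratified_effect, 2))
print("income slope (true effect is 0):", round(income_slope, 2))
```

Under these assumptions, the stratified estimate should land near the true treatment effect of 2, whereas the naive difference is inflated by `z`. The income slope stays well away from 0 even though income does nothing to health in this simulation. Balancing on the treatment's confounders does not license a causal reading of any other coefficient.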

For example, for the relationship between gender and time spent with kids, are you interested in the marginal unadjusted relationship (which could be examined by a t-test), or the disparity adjusting for other variables? You can induce collider bias by adjusting for some variables, so you need to be careful about which you adjust for.
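Collider bias can be sketched with another small simulation in plain Python (standard library only). The variables are hypothetical: `exposure` and `outcome` are generated independently, and `collider` is caused by both. "Adjusting" is done here by residualizing both variables on the collider, which is equivalent to including it as a covariate in a linear regression.

```python
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def residuals(y, x):
    """Residuals from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(0)
n = 20000

# Hypothetical variables: exposure and outcome are independent by construction.
exposure = [rng.gauss(0, 1) for _ in range(n)]
outcome = [rng.gauss(0, 1) for _ in range(n)]
# The collider is a common effect of both.
collider = [e + o + rng.gauss(0, 1) for e, o in zip(exposure, outcome)]

marginal_corr = pearson(exposure, outcome)
adjusted_corr = pearson(residuals(exposure, collider),
                        residuals(outcome, collider))

print("marginal correlation:", round(marginal_corr, 3))
print("correlation after adjusting for the collider:", round(adjusted_corr, 3))
```

In this setup the marginal correlation is close to 0, as it should be, while the collider-adjusted correlation is clearly negative (about -0.5 in expectation for these variances). The adjustment itself created an association that does not exist causally.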

I urge you to find a statistical consultant or collaborator to work with; the MatchIt issues page is for helping with using MatchIt, not the general task of causal inference.