kosukeimai / MatchIt

R package MatchIt

Weights from Coarsened exact matching (CEM) appears to be wrong #155

Closed jiweihe1223 closed 1 year ago

jiweihe1223 commented 1 year ago

I examined the weights from match.data() after applying matchit cem. The control weights do not sum to 1 and some are even > 1.

    treat   weights subclass
35      0 2.2187500        1
87      0 2.2187500        1
10      1 1.0000000        1
39      0 0.6339286        2
91      0 0.6339286        2
192     0 0.6339286        2
212     0 0.6339286        2
229     0 0.6339286        2
302     0 0.6339286        2
385     0 0.6339286        2
2       1 1.0000000        2
ngreifer commented 1 year ago

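What makes you think the control weights should sum to 1? The methodology for computing weights after CEM is explained here: https://kosukeimai.github.io/MatchIt/reference/matchit.html#how-matching-weights-are-computed. See also #83 (https://github.com/kosukeimai/MatchIt/issues/83).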

jiweihe1223 commented 1 year ago

Thank you! Sorry, I forgot about the previous question. After computing the control weights from the subclass-specific p/(1-p) for ATT weights, they should be further divided by the average weight in the control group. I was confused by the bal.tab() documentation, which says: "For k:1 matching for the ATT, control units are weighted proportional to the inverse of the number of control units in their stratum." Perhaps bal.tab() only uses the weights to get a weighted mean or weighted SD within a treatment group, in which case it is fine.

ngreifer commented 1 year ago

That sentence is still true but is not relevant to CEM, only to k:1 matching.

$$w_0 = \frac{p}{1-p} = \frac{n_{1s}/n_s}{n_{0s}/n_s} = \frac{n_{1s}}{n_{0s}}$$ where $n_{1s}$ is the number of treated units in subclass $s$, $n_{0s}$ is the number of control units in subclass $s$, and $n_s$ is the total number of units in subclass $s$. In k:1 matching, the number of treated units in each subclass is 1, so $n_{1s} = 1$ for all $s$, and the control weights are equal to $1/n_{0s}$, that is, (proportional to) the inverse of the number of control units in the stratum. So the formula in the MatchIt documentation works there too.
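As a rough check of this formula against the weights printed in the original post, here is a minimal sketch in plain Python (simple arithmetic, not MatchIt code). It assumes the rows shown for subclasses 1 and 2 are the complete subclasses; the helper name `att_control_weight` is made up for illustration.

```python
from fractions import Fraction

def att_control_weight(n_treated: int, n_control: int) -> Fraction:
    """Unscaled ATT weight for each control unit in a subclass: n_1s / n_0s."""
    return Fraction(n_treated, n_control)

# Subclass counts read off the table in the original post
# (assumption: the rows shown are the complete subclasses).
w_sub1 = att_control_weight(1, 2)  # subclass 1: 1 treated, 2 controls
w_sub2 = att_control_weight(1, 7)  # subclass 2: 1 treated, 7 controls

# Ratio of the unscaled weights between the two subclasses.
raw_ratio = w_sub1 / w_sub2

# Ratio of the weights reported by match.data() for the same subclasses.
reported_ratio = 2.2187500 / 0.6339286

print(w_sub1, w_sub2, raw_ratio, round(reported_ratio, 4))
```

Under that assumption the two ratios agree (7/2 versus roughly 3.5), which is what one would expect if the reported weights are the subclass weights multiplied by a single constant for the whole control group. The constant itself depends on the full matched sample, which is not shown in the post.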

jiweihe1223 commented 1 year ago

Thanks. I guess I was also confused by IPTW, where we can use p/(1-p) directly as the ATT control weights for a 1:1 pseudopopulation. But for matching, we still need to divide p/(1-p) by the average weight in the control group to maintain the original number of subjects in each treatment group.

ngreifer commented 1 year ago

You don't need to do that. That was a convention in older versions of MatchIt that I retained for historical reasons. Simply using the subclass propensity score weights without standardizing them to sum to the control group sample size will yield the same balance statistics and difference in means, and, if the treatment is interacted with all covariates in the outcome model as recommended in the vignette on estimating effects, the covariate-adjusted effect estimate will be the same. So there is no reason to do this. In future versions of MatchIt, I may remove this step to make the weights more transparent.
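A small numeric sketch of the first claim, that rescaling the control weights by a constant leaves the weighted difference in means unchanged. This is plain Python with made-up outcomes and weights, not MatchIt output; all names and values are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical matched sample with two subclasses:
#   subclass A: 1 treated, 2 controls -> control weight 1/2 each
#   subclass B: 1 treated, 1 control  -> control weight 1
treated_y = [5.0, 7.0]
treated_w = [1.0, 1.0]
control_y = [4.0, 6.0, 3.0]
control_w = [0.5, 0.5, 1.0]

def diff_in_means(control_weights):
    """Weighted treated mean minus weighted control mean."""
    return (weighted_mean(treated_y, treated_w)
            - weighted_mean(control_y, control_weights))

# Rescale the control weights so they sum to the number of control units.
scale = len(control_y) / sum(control_w)
rescaled_w = [w * scale for w in control_w]

raw_diff = diff_in_means(control_w)
rescaled_diff = diff_in_means(rescaled_w)
print(raw_diff, rescaled_diff)
```

The constant cancels in the numerator and denominator of the weighted mean, so the two differences agree up to floating-point error. This says nothing about models where the scale of the weights enters the likelihood.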

jiweihe1223 commented 1 year ago

Thanks. I just want to mention that dividing the weights by a constant within each treatment group does not make any difference for linear regression using lm(). However, for logistic regression and perhaps the Cox model, multiplying or dividing the weights by a constant matters unless a sandwich estimator is used. But for treatment effects after matching, we always use a sandwich estimator, so perhaps it won't be a problem.

ngreifer commented 1 year ago

You must use a sandwich estimator for the variance. It is the only valid estimator. If you estimate treatment effects using the instructions in the vignette, no matter which model you use, the estimates and standard errors will not be affected by multiplying the weights in either treatment group by a constant. The estimates and standard errors will be affected if there are covariates in the model and you don't include all treatment-covariate interactions or if you use the wrong standard error.

You're right that Cox models are affected, though. That's one of many good reasons to avoid them or to only use them with matching methods that have a simple formula for the weights (e.g., fixed k:1 matching).

jiweihe1223 commented 1 year ago

I see. Thanks. In that case, perhaps it is safer to use this step (although it makes the weights look less intuitive). It seems including treatment by covariate interaction in the model after matching is not common.

jiweihe1223 commented 1 year ago

Actually, even with covariate adjustment in the outcome model (without treatment-by-covariate interactions), dividing the weights by a constant within each treatment group does not seem to make any difference to the variance estimator. But I only tried a linear model with a continuous outcome. To be consistent with IPTW, perhaps omitting the last step is more intuitive.
