Management of binary outcome and missing data using interaction terms in glm_weightit

kgkirgkiris commented 1 week ago

Hi Noah,

I want to start by expressing my sincere thanks, not only for this incredible package but also for everything you have done to make propensity score weighting and matching both accessible and easy to interpret. I have come across countless answers from you on StackExchange and GitHub, and I have learned so much from them. Your contribution has been invaluable. I am sure many others feel the same way. Thank you.

My questions concern estimating effects after weighting. I have a continuous treatment variable, several covariates, and a binary outcome. Can the suggested code for fitting the outcome model:

fit <- lm_weightit(Y_C ~ splines::ns(Ac, df = 4) * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9), data = d, weightit = W)

be modified as follows so that the binary outcome (Y_B) is modeled on a continuous scale, as predicted probabilities of the outcome?

fit <- glm_weightit(Y_B ~ splines::ns(Ac, df = 4) * (X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9), data = d, weightit = W, family = binomial)

Are the modifications above enough to continue with the rest of the analysis, or is my approach wrong?

I also run into a problem when using the default "ind" method for handling missing data: when I include the interaction term in my outcome model, I get this error:

Warning: (from glm()) glm.fit: fitted probabilities numerically 0 or 1 occurred
Error in cbind(psi_out(Bout, w, Y, Xout, SW, offset), psi_treat(Btreat, : number of rows of matrices must match (see arg 2)

This is also the case when I use a binary treatment variable. The error does not seem to occur when I remove the interaction term along with the covariates.

Thank you in advance for your time and support. I am genuinely looking forward to your response. I would also like to apologize if any of my questions come across as overly basic or elementary.

Kind regards, Kostas

ngreifer commented 1 week ago

Hi Kostas,

Thank you so much for the kind words about my packages and writing! I'm glad they have been helpful.

Your modification for the binary outcome is correct. Note that your confidence intervals might be outside of [0, 1]; there are ways to prevent this but they are a bit involved, so let me know if that's an issue for you.

Unfortunately, I have not thoroughly tested the performance of glm_weightit() with missing data. Because it calls glm(), it simply drops any rows with missing values, which causes the problems you observed. You should not include any covariate with missingness in the outcome model. Even if that covariate is not part of the interaction, it will still cause observations to be dropped, which may not be apparent in the output.
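
One way to see how many rows would be silently dropped is to count complete cases over the outcome-model variables before fitting. The following is a minimal sketch in base R on simulated data; the data frame `d` and the columns `Y_B`, `Ac`, `X1`, and `X2` are illustrative stand-ins rather than the data from this thread, and plain glm() is used here only to show the row-dropping behavior.

```r
# Simulated stand-in data (hypothetical; not the data from this thread)
set.seed(1)
n <- 200
d <- data.frame(
  Y_B = rbinom(n, 1, 0.3),
  Ac  = rnorm(n),
  X1  = rnorm(n),
  X2  = rnorm(n)
)
d$X2[sample(n, 25)] <- NA  # introduce missingness in one covariate

# Count rows with a missing value in any outcome-model variable
model_vars <- c("Y_B", "Ac", "X1", "X2")
keep <- complete.cases(d[, model_vars])
n_dropped <- sum(!keep)

# glm() with the default na.action (na.omit) drops those same rows
fit_glm <- glm(Y_B ~ splines::ns(Ac, df = 4) * (X1 + X2),
               data = d, family = binomial)
n_used <- nobs(fit_glm)

cat("Rows in data:", n, "\n")
cat("Rows with missing model variables:", n_dropped, "\n")
cat("Rows used by glm():", n_used, "\n")
```

If the number of rows used by the outcome model differs from the number of rows used to estimate the weights, that mismatch is a likely source of a "number of rows of matrices must match" error.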

Noah

kgkirgkiris commented 1 week ago

Thank you very much for your kind and prompt response, and for your helpful insights.

Regarding the confidence intervals, your guidance on how to prevent them from falling outside the [0,1] range would be really helpful, especially since my dataset contains small percentages. I would appreciate any advice or methods you could share for addressing this issue.

Kostas

ngreifer commented 1 week ago

The code will look a bit esoteric, but here is how you would do it:

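library(marginaleffects)  # provides avg_predictions()

# `values` is assumed to be a user-defined vector of treatment (Ac) values
p <- avg_predictions(fit,
                     variables = list(Ac = values),
                     byfun = function(...) qnorm(mean(...)),
                     transform = pnorm)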

This first puts the average predicted probabilities on an unbounded scale, on which the standard errors and confidence intervals are estimated, and then transforms the estimates and confidence intervals back to the probability scale. You can replace qnorm() and pnorm() with qlogis() and plogis(), respectively. This may be unfamiliar to some audiences, but it has the nice feature of ensuring the confidence intervals stay within [0, 1]. The intervals are symmetric around the estimates on the unbounded scale rather than on the probability scale. Otherwise, the estimates should be identical and the confidence intervals have the usual interpretation.
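
To illustrate why the transformation keeps the interval bounded, here is a small base R sketch of the same idea done by hand with the delta method. The estimate `p_hat` and standard error `se_p` are made-up numbers for illustration, not results from any model in this thread, and the sketch shows only the underlying arithmetic, not the internals of avg_predictions().

```r
# Hypothetical average predicted probability and its standard error
p_hat <- 0.02
se_p  <- 0.015
z     <- qnorm(0.975)

# Wald interval on the probability scale: can extend below 0
ci_prob <- p_hat + c(-1, 1) * z * se_p

# Delta method: the SE of qlogis(p) is se_p / (p * (1 - p))
est_logit <- qlogis(p_hat)
se_logit  <- se_p / (p_hat * (1 - p_hat))

# Interval on the unbounded scale, then back-transformed with plogis()
ci_logit <- plogis(est_logit + c(-1, 1) * z * se_logit)

# The point estimate is unchanged by the round trip
est_back <- plogis(est_logit)

print(rbind(probability_scale = ci_prob, logit_scale = ci_logit))
```

With a small probability like this one, the interval computed directly on the probability scale has a negative lower bound, while the back-transformed interval stays inside (0, 1) and is asymmetric around the estimate.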

ngreifer / WeightIt

Management of binary outcome and missing data using interaction terms in glm_weightit #74