ngreifer / WeightIt

WeightIt: an R package for propensity score weighting
https://ngreifer.github.io/WeightIt/

Difficulty fitting weights to glm model #61

Open kkwi5241 opened 2 months ago

kkwi5241 commented 2 months ago

Hi there, thank you for making this great package. I am an R novice keen on using IPTW to weight a logistic regression analysis, and I have made progress much more quickly than I thought I would with WeightIt.

I have had some difficulty fitting my weights into a glm. It works with just two confounding variables, but when I add a third I get the error below, which I have not been able to solve:

Warning: (from glm()) simpleWarning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Error in Xtreat %*% Btreat : non-conformable arguments

weights <- weightit(binary_treatment ~ factor_confounder1 + factor_cofounder2 + numeric_confounder,
                    data = dataframe, estimand = "ATT", method = "glm")

weightit_model <- glm_weightit(binary_outcome ~ binary_treatment * (factor_confounder1 + factor_cofounder2 + numeric_confounder),
                               data = dataframe, family = "binomial", weightit = weights)

Thanks for any help you can give. BW, Charlie

ngreifer commented 2 months ago

That's a problem!

The first warning means that you have perfect or near-perfect separation in your data, i.e., a variable or combination of variables (almost) perfectly predicts treatment. You should use a different method, such as bias-reduced logistic regression (link = "br.logit") or energy balancing (method = "energy"), which are more robust to lack of overlap. What you are seeing is a problem with logistic regression as implemented in glm(), not with WeightIt (that's why the warning says "from glm()").
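For anyone reading along, here is a minimal sketch of how that warning arises and how the bias-reduced link could be tried. The data frame `toy` and its columns are made up for illustration and are not from the original report. The WeightIt call only runs if WeightIt and brglm2 are installed.

```r
# Hypothetical toy data: x separates the two treatment groups perfectly.
toy <- data.frame(x = 1:20)
toy$treat <- as.integer(toy$x > 10)

# Fit an ordinary logistic regression and collect its warnings
# instead of letting them print to the console.
glm_warnings <- character(0)
fit <- withCallingHandlers(
  glm(treat ~ x, data = toy, family = binomial),
  warning = function(w) {
    glm_warnings <<- c(glm_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(glm_warnings)

# Same propensity score model with the bias-reduced logit link.
# method = "energy" can be tried the same way (not run here).
if (requireNamespace("WeightIt", quietly = TRUE) &&
    requireNamespace("brglm2", quietly = TRUE)) {
  w_br <- WeightIt::weightit(treat ~ x, data = toy, estimand = "ATT",
                             method = "glm", link = "br.logit")
  print(summary(w_br))
}
```

The collected messages should include the "fitted probabilities numerically 0 or 1 occurred" text quoted in the report. Even when the bias-reduced fit runs cleanly, it is still worth inspecting the weights, since lack of overlap does not go away.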

The second error, which is more problematic, is a bug in WeightIt, and I would love to have access to your dataset so I can properly diagnose it. If it's sensitive data, you can rename all the columns, recode the variables (e.g., multiplying the numeric variables by a constant and recoding the factor levels to have meaningless labels), and just include the variables used in the analysis (i.e., not the entire dataset). I was planning on submitting an update to WeightIt today so I want to make sure I squash this bug ASAP!
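A rough base R sketch of the masking steps described above, in case it helps. The function name `anonymize_for_sharing`, the scaling constant, and the toy data are all hypothetical. Numeric columns with only two distinct values (such as a 0/1 treatment or outcome) are deliberately left unscaled so that they remain valid for a binomial model.

```r
# Keep only the analysis variables, relabel factor levels, rescale
# continuous columns, and give every column a meaningless name.
anonymize_for_sharing <- function(df, vars, scale = 3.7) {
  out <- df[, vars, drop = FALSE]
  for (j in seq_along(out)) {
    col <- out[[j]]
    if (is.factor(col) || is.character(col)) {
      col <- factor(col)
      levels(col) <- paste0("level", seq_along(levels(col)))
    } else if (is.numeric(col) && length(unique(col[!is.na(col)])) > 2) {
      col <- col * scale  # 0/1 columns are skipped by the condition above
    }
    out[[j]] <- col
  }
  names(out) <- paste0("V", seq_along(out))
  out
}

# Made-up example data
set.seed(1)
toy <- data.frame(
  treated    = rbinom(30, 1, 0.4),
  site       = factor(sample(c("north", "south", "east"), 30, replace = TRUE)),
  age        = rnorm(30, 60, 10),
  died       = rbinom(30, 1, 0.2),
  patient_id = 1:30
)
shared <- anonymize_for_sharing(toy, c("treated", "site", "age", "died"))
str(shared)
```

Note that this only disguises the data. Whether it is enough to satisfy data governance rules for patient data is a separate question.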

Thanks, and sorry for the confusion it might have caused.

kkwi5241 commented 2 months ago

Thanks very much for getting back to me so quickly. The warning does make sense given the data I am looking at: I am a UK surgical academic working with a relatively small real patient dataset. Would it be possible to liaise over email? I am contactable at charles.west4@nhs.net. BW, Charlie

ngreifer commented 2 months ago

Hi Charlie,

Thank you so much for your assistance over email. Here is the problem.

There is nothing wrong with the weighting model, and I was incorrect in assuming the problem was due to perfect separation of the treatment by the covariates. That said, you have fundamental imbalance in your covariates that cannot be rectified using weighting. Your groups are too small and there is not enough overlap between them to make a valid inference on the ATT. One option is to change to a different estimand, like the ATO.
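To make the difference between the two estimands concrete, here is a small base R sketch. It uses simulated data (not the data from this issue) and the standard textbook weight formulas, which are an assumption on my part about what is meant here: ATT weights are 1 for treated units and p/(1-p) for controls, and ATO (overlap) weights are 1-p for treated units and p for controls. In WeightIt itself, the equivalent would be passing estimand = "ATO" to weightit().

```r
set.seed(42)
n <- 200
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(-1.5 + 2 * x))

# Propensity scores from an ordinary logistic regression
ps <- fitted(glm(treat ~ x, family = binomial))

# ATT weights can become very large for controls with ps close to 1
w_att <- ifelse(treat == 1, 1, ps / (1 - ps))
# ATO (overlap) weights are bounded between 0 and 1 by construction
w_ato <- ifelse(treat == 1, 1 - ps, ps)

# Kish's effective sample size, computed within each treatment group
ess <- function(w) sum(w)^2 / sum(w^2)
ess_table <- rbind(
  ATT = tapply(w_att, treat, ess),
  ATO = tapply(w_ato, treat, ess)
)
print(round(ess_table, 1))
```

A very small effective sample size in either group is a sign that the weighted sample carries little information. Keep in mind that the ATO targets a different population (units with good overlap), so switching estimands changes the question being answered.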

The issue was with your outcome model. There was indeed a bug in WeightIt, which I have fixed. However, even with the bug fixed, it is not possible to fit your outcome model, because there is only a single event in the treated group. Your sample is too small to make any valid inference on, especially after weighting. In an upcoming version of WeightIt, I have produced a cleaner error message that provides advice on how to diagnose the error. In this case, the error is subtle and can only be solved by collecting more data. These simply are not small-sample methods. I would recommend consulting with a medical statistician on how to extract some useful information from your sample. It might be that your analysis must be purely descriptive, as the sample is not big enough to appropriately adjust for confounding. I think a simpler, regression-based analysis specifically using methods designed for small samples (e.g., Firth logistic regression) could be effective. But I would advise you to stay away from propensity score methods with this sample.
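As a rough illustration of the diagnosis, a cross-tabulation of treatment by outcome shows the problem directly. The toy data below are invented to mimic the situation described (one event among the treated) and are not the real dataset. The Firth fit uses the logistf package as one possible implementation and only runs if that package is installed.

```r
# Hypothetical data: 12 treated units with 1 event, 28 controls with 9 events
toy <- data.frame(
  treat   = rep(c(1, 0), times = c(12, 28)),
  outcome = c(1, rep(0, 11), rep(c(1, 0), times = c(9, 19))),
  sex     = factor(rep(c("F", "M"), 20))
)

# Count events in each treatment group before fitting any outcome model
events <- with(toy, table(treat = treat, outcome = outcome))
print(events)

# Smallest number of events in either group
min_events <- min(events[, "1"])
if (min_events < 5) {
  message("Very few events in at least one treatment group: ", min_events)
}

# Firth (penalized likelihood) logistic regression, if logistf is available
if (requireNamespace("logistf", quietly = TRUE)) {
  firth_fit <- logistf::logistf(outcome ~ treat + sex, data = toy)
  print(coef(firth_fit))
}
```

The threshold of 5 is arbitrary and only there to trigger the message. With a single event in one group, a model that interacts treatment with several covariates has far more parameters than the data can support, whatever the fitting method.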

Noah