ngreifer / WeightIt

WeightIt: an R package for propensity score weighting
https://ngreifer.github.io/WeightIt/

Error message when using missing = "saem" #71

Closed zeynepbaskurt closed 5 days ago

zeynepbaskurt commented 1 week ago

Hi,

Thanks again for your continuing support on this amazing package.

Could you please help me with this error message? Can't we use "saem" in weightit()?

library("WeightIt")
data("lalonde", package = "cobalt")

set.seed(1)
age.NA <- sample(1:nrow(lalonde), 10, replace = FALSE)
lalonde$age[age.NA] <- NA

re74.NA <- sample(1:nrow(lalonde), 5, replace = FALSE)
lalonde$re74[re74.NA] <- NA

W1 <- weightit(treat ~ age + educ + married + nodegree + re74, data = lalonde, method = "glm", estimand = "ATE", missing = "saem")

Error in names(std_obs) <- xnames[c(1, subsets + 1)] : attempt to set an attribute on NULL

Thanks! Zeynep

ngreifer commented 1 week ago

I'll look into this, thanks!

ngreifer commented 1 week ago

Hi Zeynep,

This is a bug in misaem, the package that performs the model fitting. To get around it, you can add the argument control = list(var_cal = TRUE) to the weightit() call, which will bypass the error. Sorry about that. The new version of WeightIt on GitHub applies this fix automatically, so you can install it to avoid having to make that change yourself.

Noah
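
For anyone landing here with the same error, the workaround above can be sketched as follows. This is a minimal sketch, not a definitive fix: it reuses the lalonde setup from the original report and assumes the WeightIt, cobalt, and misaem packages are installed. Newer versions of WeightIt may not need the control argument at all.

```r
library(WeightIt)

data("lalonde", package = "cobalt")

# Introduce missing values as in the original report
set.seed(1)
age.NA <- sample(1:nrow(lalonde), 10, replace = FALSE)
lalonde$age[age.NA] <- NA
re74.NA <- sample(1:nrow(lalonde), 5, replace = FALSE)
lalonde$re74[re74.NA] <- NA

# control = list(var_cal = TRUE) is the workaround suggested in this thread
W1 <- weightit(treat ~ age + educ + married + nodegree + re74,
               data = lalonde, method = "glm", estimand = "ATE",
               missing = "saem",
               control = list(var_cal = TRUE))

summary(W1)
```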

zeynepbaskurt commented 1 week ago

Hi Noah,

Thanks for your quick reply. It is working now with your solution.

A quick question: using missing = "saem" versus missing = "ind" makes a big difference in my outcome regression (glm_weightit()) when I add one particular covariate, which has 17 missing values (out of a total sample size of 162), to my PS formula. All of the missing values are in the control cohort; the covariate is continuous and unbalanced at baseline. I have a feeling that, since all the missingness is in the control group, the "ind" method does not work at full efficiency. For example, with "saem" in weightit() the p-value for the treatment effect is ~0.001 (glm_weightit() regression output), and with "ind" it is ~0.06.

My question is, in general, which method would you suggest, "ind" or "saem"? Sorry, I could not generate a reproducible example and I cannot share my data.

Thank you, Zeynep

ngreifer commented 1 week ago

I would actually recommend neither and recommend using multiple imputation instead. If missingness is only in the control group, then if you target any estimand other than the ATT, you will be unable to balance the missingness indicator, and any observation with missingness will not contribute to the estimate appropriately (because it will always have a propensity score of 0). Using SAEM is similar to performing single imputation, but multiple imputation is usually a better option. SAEM in PS analysis hasn't been well studied, so I can't make a strong recommendation for it. Weighting with multiply imputed data is implemented in the MatchThem package, which provides a wrapper for weightit() for use with multiply imputed data.

I would say that with a sample that small, you will probably be better off using regression methods than PS methods. I encourage you not to rely too heavily on asymptotic results to justify any choices. Bootstrap if you can.
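
The multiple-imputation route mentioned above might look roughly like the sketch below. It is only an illustration under stated assumptions: the lalonde data from cobalt stands in for the real data, mice is used for the imputation step (the thread does not name an imputation package), the number of imputations and the outcome model are arbitrary choices, and the mice, MatchThem, and survey packages are assumed to be installed.

```r
library(mice)
library(MatchThem)
library(survey)

data("lalonde", package = "cobalt")

# Stand-in data with missing covariate values (illustrative only)
set.seed(1)
lalonde$age[sample(1:nrow(lalonde), 10)] <- NA
lalonde$re74[sample(1:nrow(lalonde), 5)] <- NA

# Step 1: create multiply imputed datasets (m = 5 is an arbitrary choice)
imp <- mice(lalonde, m = 5, printFlag = FALSE, seed = 1)

# Step 2: estimate propensity score weights within each imputed dataset;
# weightthem() wraps weightit() for multiply imputed data
w.imp <- weightthem(treat ~ age + educ + married + nodegree + re74,
                    datasets = imp, approach = "within",
                    method = "glm", estimand = "ATE")

# Step 3: fit the weighted outcome model in each dataset and pool the results
fits <- with(w.imp, svyglm(re78 ~ treat, family = gaussian()))
summary(pool(fits))
```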

zeynepbaskurt commented 1 week ago

Thanks Noah, very helpful answer.

Yes, my sample size is not great, but the PS approach fits the purpose of our research very well. Speaking of the bootstrap, I have also tried the following code, but got the warning messages below (several warnings in one run). I am fairly sure this is due to small cell counts in the distribution of some categorical variables between the treatment and control groups. When I tried vcov = "FWB", I did not get any warnings, and the results were very similar to those from the regular glm_weightit() with the default vcov.

Would you kindly share your thoughts on this?

fit1.bs <- glm_weightit(Y ~ treatment + offset(log(time)), data = dat1, family = poisson, weightit = W.ATO, vcov = "BS")

Warning: (from weightit()) (from misaem::miss.glm()) glm.fit: fitted probabilities numerically 0 or 1 occurred

Warning: (from weightit()) Propensity scores numerically equal to 0 or 1 were estimated, indicating perfect separation and infinite parameter estimates. These may yield problems with inference. Consider trying a different link. See help("method_glm", package = "WeightIt") for details.

ngreifer commented 1 week ago

Thanks for letting me know about this. missing = "saem" is not supposed to be compatible with vcov = "FWB", so I made that change in WeightIt. Essentially, the propensity score estimation ignores the bootstrapped weights, so the uncertainty is not correctly accounted for (and it's like you only bootstrapped the outcome model). When using vcov = "BS", the warning you got reflects a real problem, and there is nothing you can do about it. This is an issue with having too small a sample size and having perfect separation in some bootstrap resamples of the data. Using FWB is a good strategy to deal with this in normal circumstances, but it is not compatible with SAEM.
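
For reference, a minimal sketch of the FWB route without SAEM is shown below. It uses the complete lalonde data from cobalt as a stand-in, so the formula, outcome, and number of replications are illustrative assumptions rather than the model discussed in this thread. vcov = "FWB" also assumes the fwb package is installed.

```r
library(WeightIt)

data("lalonde", package = "cobalt")

# Overlap (ATO) weights from a logistic propensity score model;
# no missing data here, so no `missing` argument is needed
W.ATO <- weightit(treat ~ age + educ + married + nodegree + re74,
                  data = lalonde, method = "glm", estimand = "ATO")

# Outcome model with fractional weighted bootstrap standard errors;
# R = 200 replications is an arbitrary choice to keep the example quick
set.seed(1)
fit.fwb <- glm_weightit(re78 ~ treat, data = lalonde,
                        weightit = W.ATO, vcov = "FWB", R = 200)

summary(fit.fwb)
```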

zeynepbaskurt commented 1 week ago

Thanks Noah.

It turned out I can avoid adding the problematic variable with 17 missing values to the PS formula, as there are a few other baseline covariates that measure the same characteristics and have no missing data, so I can use those instead. I don't think I need to use SAEM or multiply imputed data anymore. With this variable excluded, I have very little missingness (i.e., I lose only 5 samples in total if I use complete case analysis). Thus, I can now use vcov = "FWB" with complete case analysis with no warning message.

And for your information: if I use weightit(PSformula, data = dat1, method = "glm", estimand = "ATO", missing = "ind") with the original data (so I have about 5 NAs in dat1, not a complete case analysis), using vcov = "FWB" in glm_weightit() still produces the following warning (rightfully). That is, even "FWB" cannot deal with perfect separation completely. There are only a few warnings (e.g., 5 or 6), so FWB still did well in most of the extreme bootstrapped samples (compared to BS).

Warning when using vcov = "FWB" on a data set with NAs, when missing="ind" is used in weightit(): Warning: (from weightit()) Propensity scores numerically equal to 0 or 1 were estimated, indicating perfect separation and infinite parameter estimates. These may yield problems with inference. Consider trying a different link. See help("method_glm", package = "WeightIt") for details.

Thanks again so much for sharing your knowledge and responding so quickly!

Zeynep