ngreifer / WeightIt

WeightIt: an R package for propensity score weighting
https://ngreifer.github.io/WeightIt/

a possible problem when offset is added? #63

Open zeynepbaskurt opened 1 month ago

zeynepbaskurt commented 1 month ago

Hi,

Firstly, I wanted to thank you for providing such an amazing package!

I am using the WeightIt package for an observational study in which baseline characteristics are unbalanced between the two treatment groups (0 and 1). My outcome is a count variable, Y, which records the number of times a specific condition/disease occurred during a certain period. My final model would be a glm() with a Poisson family and an offset term, e.g. Y ~ X1 + X2 + offset(log(time)). I wanted to use the glm_weightit() function, but I am getting surprising results; in particular, the standard errors of the coefficients in the final model are quite large.

I generated a random data set below, where you can see what happens when the offset() term is added. I wonder whether I am making a mistake in how I add the offset term, or whether this is a bug. I did not include the offset in the propensity score model, since it should not affect treatment allocation. Could the problem be that the weights are calculated without that term, but it is then added to the final model? Or could there be overdispersion in the Poisson model? (There is overdispersion in my data as well; I was planning to use quasipoisson for my final model, but I do not know how to correct for dispersion in glm_weightit().)

Also, as a related question: can I use the weights from weightit() (e.g. ATE, ATO, or other options) in the base R glm() function, as glm(..., weights = weightit$weights)? I know the variance estimates are more robust in glm_weightit(), but I wanted more flexibility in modelling my final fit, e.g. quasipoisson, negative binomial, etc.

Thank you so much in advance,

Zeynep

library(WeightIt)
library(cobalt)   # for bal.tab()
library(AER)      # for dispersiontest()

set.seed(23)
n <- 150
treatment <- rbinom(n, 1, 0.6)   # treatment = 0, 1
X1 <- X2 <- X3 <- X4 <- NULL

# Creating mildly unbalanced X1 and X2 between treatment groups
for (i in 1:n) {
  X1[i] <- ifelse(treatment[i] == 1, rnorm(1, 4, 3), rnorm(1, 6, 1))
  X2[i] <- ifelse(treatment[i] == 1, rpois(1, 7), rpois(1, 6))
}

X3 <- rbinom(n, 1, 0.5)
X4 <- rbinom(n, 1, 0.5)

Y <- rpois(n, 40 + 6 * treatment + rnorm(n))

dat <- data.frame(Y = Y, treatment = treatment, X1 = X1, X2 = X2,
                  X3 = X3, X4 = X4, time = log(rpois(n, 50)))

summary(glm(Y ~ treatment + X1 + X2 + offset(time), data = dat, family = poisson))
summary(glm(Y ~ treatment + X1 + X2, data = dat, family = poisson))

mod1 <- glm(Y ~ treatment + X1 + X2 + offset(time), data = dat, family = poisson)
dispersiontest(mod1)   ## overdispersion detected

# weightit
formula1 <- treatment ~ X1 + X2 + X3 + X4
fit1 <- glm(formula1, family = binomial(link = "logit"), data = dat)

wtest <- weightit(formula1, data = dat, method = "glm", estimand = "ATE")
bal.tab(wtest)

# with offset term
fit2 <- glm_weightit(Y ~ treatment + X1 + X2, offset = time, data = dat,
                     family = poisson, weightit = wtest)
summary(fit2)

summary(glm(Y ~ treatment + X1 + X2 + offset(time), data = dat,
            family = poisson, weights = wtest$weights))

# without offset term
fit3 <- glm_weightit(Y ~ treatment + X1 + X2, data = dat,
                     family = poisson, weightit = wtest)
summary(fit3)

ngreifer commented 1 month ago

Thanks for letting me know about this. This is indeed a bug in WeightIt related to how the offset is handled. You have a few options:

To use glm(), you have to use robust standard errors when including weights, no matter what. The standard errors reported by summary() for a weighted glm() fit are simply invalid; it's not a question of one option being more robust than another. To get robust standard errors for the model coefficients, you need to run

lmtest::coeftest(mod1, vcov = sandwich::vcovHC)
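For concreteness, here is a minimal sketch of that workflow using the objects from the example above (fit_w is a hypothetical name for the weighted outcome model; dat and wtest are reused from the original post):

library(lmtest)
library(sandwich)

# Fit the outcome model with the estimated weights; the summary() standard
# errors for this fit would not be valid
fit_w <- glm(Y ~ treatment + X1 + X2 + offset(time),
             data = dat, family = poisson, weights = wtest$weights)

# Robust (sandwich) standard errors for the coefficients
coeftest(fit_w, vcov = vcovHC)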

If you are using marginaleffects to perform g-computation as recommended in the vignette on estimating effects, you need to add vcov = "HC3" in the call to avg_comparisons(), etc.
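As a sketch of that, assuming fit_w from the snippet above (variables = "treatment" is just an illustrative choice to request the treatment comparison; other arguments, e.g. newdata or wts, may be needed depending on the estimand, as described in the vignette):

library(marginaleffects)

# G-computation for the treatment effect with HC3 robust standard errors
avg_comparisons(fit_w, variables = "treatment", vcov = "HC3")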

Note that you don't need to test for overdispersion; robust standard errors do not rely on estimates of the dispersion parameter and are fully robust to over- or under-dispersion (and so it doesn't matter if you use family = poisson or family = quasipoisson; they should give you identical estimates and robust standard errors). Similarly, you should never use negative binomial regression if you are fitting a model with a log link; just use Poisson.
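To illustrate the poisson/quasipoisson point, a quick sketch reusing dat and wtest from above (fit_p and fit_q are hypothetical names):

# Same formula fit with both families: the coefficient estimates are identical,
# and the robust standard errors from coeftest() should match because the
# sandwich estimator does not involve the dispersion parameter
fit_p <- glm(Y ~ treatment + X1 + X2 + offset(time),
             data = dat, family = poisson, weights = wtest$weights)
fit_q <- glm(Y ~ treatment + X1 + X2 + offset(time),
             data = dat, family = quasipoisson, weights = wtest$weights)

all.equal(coef(fit_p), coef(fit_q))
lmtest::coeftest(fit_p, vcov = sandwich::vcovHC)
lmtest::coeftest(fit_q, vcov = sandwich::vcovHC)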

In the meantime, I'll work on fixing the bug so that including the offset yields correct standard errors. Incidentally, the issue can be detected without estimating any weights at all: glm_weightit() and glm() with HC0 robust standard errors (which can be requested with lmtest::coeftest(mod1, vcov = sandwich::sandwich)) should yield identical standard errors with or without weights, but they don't when an offset is included in the model. A final thought: you should always use log(time), not time, as the offset in Poisson models (or any model with a log link).
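For reference, a sketch of that check with the example data, assuming glm_weightit() is called without a weightit object so that an unweighted model is fit:

# With the offset, the two sets of standard errors disagree (the bug described
# above); without the offset, they agree as expected
summary(glm_weightit(Y ~ treatment + X1 + X2, offset = time,
                     data = dat, family = poisson))
lmtest::coeftest(glm(Y ~ treatment + X1 + X2 + offset(time),
                     data = dat, family = poisson),
                 vcov = sandwich::sandwich)

summary(glm_weightit(Y ~ treatment + X1 + X2, data = dat, family = poisson))
lmtest::coeftest(glm(Y ~ treatment + X1 + X2, data = dat, family = poisson),
                 vcov = sandwich::sandwich)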

zeynepbaskurt commented 1 month ago

Thank you so much for this great answer!! I will look into your suggestions.