pat-s / oddsratio

Simplified odds ratio calculation of binomial GAM/GLM models
https://pat-s.github.io/oddsratio

Issue with or_gam #48

Open rpouillot opened 3 years ago

rpouillot commented 3 years ago

Hi,

I had to estimate some odds ratios from GAM models and found your package. After a few tries, I noticed that the confidence intervals were unexpectedly narrow. I checked the code of your or_gam function and found that the standard error (se) of the odds ratio (on the "link" scale) is effectively estimated as the difference between the se of the two predictions, since pred_gam2_ci_low <- pred_gam1[1] - (2 * pred_gam1[2]) - pred_gam2[1] - (2 * pred_gam2[2]), which is equivalent to pred_gam2_ci_low <- (pred_gam1[1] - pred_gam2[1]) - (2 * (pred_gam1[2] - pred_gam2[2])). I think this is not statistically sound: the variance of the difference of the estimates is actually equal to the sum of their variances minus twice their covariance.
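As a small numerical sketch (the covariance matrix V below is purely hypothetical, not taken from any model in this issue), subtracting the standard errors understates the uncertainty of the difference:

# Hypothetical covariance matrix of two link-scale predictions A and B
V <- matrix(c(1.0, 0.6,
              0.6, 0.9), nrow = 2)
# What the current code effectively does: difference of the standard errors
se_wrong <- sqrt(V[1, 1]) - sqrt(V[2, 2])
# Correct: Var(A - B) = Var(A) + Var(B) - 2 * Cov(A, B)
se_correct <- sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2])
c(wrong = se_wrong, correct = se_correct)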

I can illustrate the issue as follows.

library(oddsratio)
library(mgcv)
library(MASS)

#####################################################
# CHECK GLM

fit_glm <- glm(admit ~ gre + gpa + rank,
               data = data_glm,
               family = "binomial"
) # fit model

or_glm(data = data_glm, model = fit_glm, incr = list(gre = 380, gpa = 4))

#  predictor oddsratio ci_low (2.5) ci_high (97.5)          increment
#1       gre     2.364        1.054          5.396                380
#2       gpa    24.932        1.899        349.524                  4
#3     rank2     0.509        0.272          0.945 Indicator variable
#4     rank3     0.262        0.132          0.512 Indicator variable
#5     rank4     0.212        0.091          0.471 Indicator variable

# MASS equivalence
data_glm$greScaled <- data_glm$gre / 380  
data_glm$gpaScaled <- data_glm$gpa  / 4  

fit_glmScaled <- glm(admit ~ greScaled + gpaScaled + rank,
               data = data_glm,
               family = "binomial"
) # fit model Scaled

# Using the MASS confint method: we get the same results
exp(cbind(coef(fit_glmScaled), confint(fit_glmScaled)))

#Waiting for profiling to be done...
#                             2.5 %      97.5 %
#(Intercept)  0.0185001 0.001889165   0.1665354
#greScaled    2.3642995 1.053676012   5.3958610
#gpaScaled   24.9319521 1.898727216 349.5235388
#rank2        0.5089310 0.272289674   0.9448343
#rank3        0.2617923 0.131641717   0.5115181
#rank4        0.2119375 0.090715546   0.4706961

# No issue here.

#####################################################
# CHECK GAM, without smoothing: this is "similar" to a classical glm

fit_gamAsGLM <- gam(admit ~ gre + gpa + rank,
               data = data_glm,
               family = "binomial"
) 

or_gam(data = data_glm, model = fit_gamAsGLM, pred = "gre", values = c(0,380))
#  predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1       gre      0    380    2.3643      1.146592         4.87524

or_gam(data = data_glm, model = fit_gamAsGLM, pred = "gpa", values = c(0,4))
#  predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1       gpa      0      4  24.93195      4.042141        153.7804

# We should get results similar to `or_glm`. The CIs from or_gam are (sometimes largely) underestimated.

Here is a proposal to solve the issue. It is based on the example in help('predict.gam') that estimates the variance of a sum of predictions using the lpmatrix; here we want the variance of a difference, but the adaptation is straightforward.

# Evaluate "other parameters"
meanGre <- mean(data_glm$gre)
meanGpa <- mean(data_glm$gpa)
# Any rank is valid, because it is not smoothed (additive effect)
AnyRank <- data_glm$rank[1]

# Build a test data with values
testdatagre <- data.frame(gre = c(0, 380),
                          gpa = meanGpa, rank = AnyRank)

testdatagpa <- data.frame(gre = meanGre,
                          gpa = c(0, 4), rank = AnyRank)

# This function returns the OR and CI for the odds ratio.
# Note: it could easily be vectorised for your 'slice' option
# (just add rows to `testdata`; the results are relative to the first row;
#  a sketch is shown further below).

or_gam2 <- function(model, testdata) {
  n <- nrow(testdata)
  # Linear predictor matrix: one row per row of testdata, one column per coefficient
  Xp <- predict(model, testdata, type = "lpmatrix")
  # Contrast matrix: each row compares one row of testdata with the first row
  a <- cbind(-1, diag(n - 1))
  Xs <- a %*% Xp
  # log odds ratios
  LnOR <- Xs %*% coef(model)
  OR <- exp(LnOR)
  # variance of the log odds ratios, from the coefficient covariance matrix
  var.LnOR <- Xs %*% model$Vp %*% t(Xs)
  sd.LnOR <- sqrt(diag(var.LnOR))
  c(OR = OR,
    lower = exp(LnOR - 1.96 * sd.LnOR),
    upper = exp(LnOR + 1.96 * sd.LnOR))
}

or_gam2(fit_gamAsGLM, testdatagre)

#      OR    lower    upper 
# 2.364300 1.046731 5.340350 

or_gam2(fit_gamAsGLM, testdatagpa)
#        OR      lower      upper 
#24.931952   1.849078 336.168826 

We get much closer CIs. Note that they are not strictly the same, because the standard errors are estimated differently in gam() than in glm(). Here is an example with a real GAM (with smoothers):

fit_gam <- gam(admit ~ s(gre) + s(gpa) + rank,
               data = data_glm,
               family = "binomial"
) # fit model

or_gam(data = data_glm, model = fit_gam, pred = "gpa", values=c(3.4,3.5))

# predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1       gpa    3.4    3.5  1.219184      1.211133        1.227289

# The CIs from or_gam are extremely tight.

testdatagpa <- data.frame(gre = meanGre,
                          gpa = c(3.4,3.5), rank = AnyRank)

or_gam2(fit_gam, testdatagpa)
#      OR    lower    upper 
#1.219184 1.026426 1.448141 
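
As a final illustration of the vectorised ('slice'-like) use mentioned above, here is a sketch (the name testdatagpa_slice is hypothetical; output not shown). With more than two rows in testdata, every OR/CI returned by or_gam2 is relative to the first row:

# Several gpa values in one testdata; results are relative to the first row (gpa = 2.5)
testdatagpa_slice <- data.frame(gre = meanGre,
                                gpa = c(2.5, 3.0, 3.5, 4.0), rank = AnyRank)
or_gam2(fit_gam, testdatagpa_slice)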

This function could easily be adapted to your framework. I hope it helps; let me know what you think.

pat-s commented 1 year ago

Hi @rpouillot

Thanks for the detailed description and sorry for my late answer (unfortunately I wasn't watching the repo). I don't think I ever aimed at fitting a GAM without smoothing - to me, smoothing is the essential part of a GAM.

That being said, I also cannot fully judge whether the CI values of the GAM are unsound when not using smoothers. Does this only apply to this case, or also when using smoothers?

I'm open to approaches that outline the issue and propose an alternative solution (like yours). Is this still of interest to you?