jacob-long / jtools

Tools for summarizing/visualizing regressions and other helpful stuff
https://jtools.jacob-long.com
GNU General Public License v3.0
164 stars 22 forks source link

predict_merMod discrepency in predictions with type = "response" #144

Closed AlpDYel closed 9 months ago

AlpDYel commented 11 months ago

predict_merMod gives different answers from predict.merMod and itself when on response scale

I am using predict_merMod to get the prediction standard errors as described in http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#predictions-andor-confidence-or-prediction-intervals-on-predictions ; however, in one model I got negative predicted probabilities and decided to dig further.

Based on my findings there is a major discrepancy when using predict_merMod(...,type="response") simply applying the inverse link function. In a reproducible example, I compared predict_merMod with type="link" and type="response" with predict.merMod with same spesifications. The results indicate that predict_merMod is off by some amount when it comes to predicting on the response scale. Am I mis-specifying something or is there an issue with response scale prediction?

suppressPackageStartupMessages(library(tidyverse)) # A way of life
suppressPackageStartupMessages(library(jtools)) #predict_mer mod
suppressPackageStartupMessages(library(lme4)) # glmer
suppressPackageStartupMessages(library(wooldridge)) # Data

#Get data
wooldridge::big9salary -> data
data <- data %>% dplyr::select(salary,year,id,top20phd,exper,pubindx) %>%
  mutate(year = year-92)
#Fit some model
mod_glmer <- glmer(salary~year+ns(pubindx,3)+ns(exper,df=3) + (1|id),
                   family = Gamma(link = "log"),
                   data=data,nAGQ = 0)
#Create data for new prediction
new_data <- data %>%
  filter(id == 120) %>%
  mutate(exper=10,year=95) %>% 
  select(-pubindx) %>%
  distinct() 
pubindx_new <- seq(0,10,by=1) %>% as.data.frame()
pubindx_new <-  rename(pubindx_new,pubindx= `.`)
pred_data <-cross_join(new_data,pubindx_new)
 #Predict with predict_merMod
mer_mod_resp <- predict_merMod(mod_glmer,newdata = pred_data,se.fit = T,use.re.var = T,type = "response")
mer_mod_link <- predict_merMod(mod_glmer,newdata = pred_data,se.fit = T,use.re.var = T)
#Compare type = "response" against link scale with inverse link (exp = inv of log link)
mer_mod_resp$fit-exp(mer_mod_link$fit) # Major discrepency
        1         2         3         4         5         6         7         8         9        10        11        12        13        14        15        16        17 
-340088.9 -341477.0 -342873.5 -344281.1 -345702.8 -347141.4 -348599.9 -350081.2 -351588.3 -353124.5 -354692.7 -340088.9 -341477.0 -342873.5 -344281.1 -345702.8 -347141.4 
       18        19        20        21        22        23        24        25        26        27        28        29        30        31        32        33 
-348599.9 -350081.2 -351588.3 -353124.5 -354692.7 -340088.9 -341477.0 -342873.5 -344281.1 -345702.8 -347141.4 -348599.9 -350081.2 -351588.3 -353124.5 -354692.7 
#Predict with regular predict
predict_resp <- predict(mod_glmer,newdata = pred_data,type="response")
predict_link <- predict(mod_glmer,newdata = pred_data)
#Compare type = "response" against link scale with inverse link (exp = inv of log link)
predict_resp- exp(predict_link) #Gives 0 for all
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
 #Compare Predict on link scale
predict_link - mer_mod_link$fit # Gives 0 for all
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
#Compare predict on response scale

 predict_resp - mer_mod_resp$fit # Major discrepency
       1        2        3        4        5        6        7        8        9       10       11       12       13       14       15       16       17       18       19 
340088.9 341477.0 342873.5 344281.1 345702.8 347141.4 348599.9 350081.2 351588.3 353124.5 354692.7 340088.9 341477.0 342873.5 344281.1 345702.8 347141.4 348599.9 350081.2 
      20       21       22       23       24       25       26       27       28       29       30       31       32       33 
351588.3 353124.5 354692.7 340088.9 341477.0 342873.5 344281.1 345702.8 347141.4 348599.9 350081.2 351588.3 353124.5 354692.7 

As can be seen from the example, predict_merMod gives different predictions when on the response scale compared to not only predict.merMod but also predict_merMod in link scale with the inverse link applied.

AlpDYel commented 11 months ago

I might have figured out the issue (at least partially): If the random effects are discarded, I get the desired output (see example continued below)

#Predict with no SE
predict_merMod(mod_glmer,newdata=pred_data,re.form=~0,type="response",se.fit = T)->merMod_noRE
predict(mod_glmer,newdata=pred_data,re.form=~0,type="response")->predict_noRE
merMod_noRE$fit-predict_noRE # All 0!
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 

#Predict with no SE
predict_merMod(mod_glmer,newdata=pred_data,re.form=~0,se.fit = T)->merMod_noRE_link
predict(mod_glmer,newdata=pred_data,re.form=~0)->predict_noRE_link

merMod_noRE_link$fit-predict_noRE_link #All 0!
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 

exp(merMod_noRE_link$fit) - merMod_noRE$fit #All 0!
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
jacob-long commented 9 months ago

Thank you for the reproducible report! I think I've gotten to the bottom of this; it appears to be a very legitimate bug. When SEs are requested, it is adding the random effects on the link scale to the already-transformed fixed effect predictions! When SEs aren't requested, this doesn't happen. And of course, when the user suppresses the random effects the bug won't manifest either.

Update inbound...