pbs-assess / sdmTMB

:earth_americas: An R package for spatial and spatiotemporal GLMMs with TMB
https://pbs-assess.github.io/sdmTMB/
186 stars 26 forks source link

`predict()` crashes R when model has a factor random intercept that is not the right class in `newdata` #278

Closed sebpardo closed 10 months ago

sebpardo commented 10 months ago

R crashes when runningpredict.sdmTMB() if 1) the model has a random intercept that is a factor, and 2) newdata is being provided but with the random intercept variable having a different class. The checks for the correct class happen as expected for variables that are fixed effects, but somehow these checks don't happen for the random intercept variable. See MWE below:

library(sdmTMB)
library(dplyr)

pcod$year_f <- as.factor(pcod$year)

pcod_yr_as_fct <- mutate(pcod, year = as.factor(year))
pcod_yrf_as_num <- mutate(pcod, year_f = as.numeric(year))
pcod_yrf_as_chr <- mutate(pcod, year_f = as.character(year))

mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)

m <- sdmTMB(
  data = pcod,
  formula = density ~ poly(log(depth), 2),
  mesh = mesh,
  family = tweedie(link = "log"),
  spatial = "on",
  time = "year"
)

m_yrf_fe <- sdmTMB(
  data = pcod,
  formula = density ~ poly(log(depth), 2) + year_f,
  mesh = mesh,
  family = tweedie(link = "log"),
  spatial = "on",
  time = "year"
)

m_yrf_re <- sdmTMB(
  data = pcod,
  formula = density ~ poly(log(depth), 2) + (1 | year_f),
  mesh = mesh,
  family = tweedie(link = "log"),
  spatial = "on",
  time = "year"
)

p1 <- predict(m, newdata = pcod) # This works
p2 <- predict(m, newdata = pcod_yr_as_fct) # This throws an error as expected
# Error in `check_time_class()`:
# ! Class of fitted time column (integer) does not match class of `newdata` time
# column (factor).

p3 <- predict(m_yrf_fe, newdata = pcod) # This works
p4 <- predict(m_yrf_fe, newdata = pcod_yr_as_fct) # This throws an error as expected
# Error in `check_time_class()`:
# ! Class of fitted time column (integer) does not match class of `newdata` time
# column (factor).

p5 <- predict(m_yrf_fe, newdata = pcod_yrf_as_num) # This throws a different error
# Error in `contrasts<-`(`*tmp*`, value = ca) : 
#   contrasts apply only to factors
# In addition: Warning message:
#   In model.frame.default(Terms, newdata, xlev = object$xlevels[[i]]) :
#   variable 'year_f' is not a factor

p6 <- predict(m_yrf_re, newdata = pcod) # This works
p7 <- predict(m_yrf_re, newdata = pcod_yr_as_fct) # This throws an error as expected
# Error in `check_time_class()`:
# ! Class of fitted time column (integer) does not match class of `newdata` time
# column (factor).
# Run `rlang::last_trace()` to see where the error occurred.

p8 <- predict(m_yrf_re, newdata = pcod_yrf_as_num) # This crashes R
p8 <- predict(m_yrf_re, newdata = pcod_yrf_as_chr) # This also crashes R

The crash also happens when re_form = NA, but it runs when re_form_iid = NA is also specified, which makes sense as it's providing a population-level prediction and likely not using the random effect variable:

p9 <- predict(m_yrf_re, newdata = pcod_yrf_as_num, re_form = NA) # This also crashes R
p10 <- predict(m_yrf_re, newdata = pcod_yrf_as_num, re_form = NA, re_form_iid = NA) # This runs!

I ran these on a Windows machine and @ecophilina confirmed these crashes also happen on a Mac. I'll try to dig deeper into this and send a PR next week.

seananderson commented 10 months ago

The simplest solution is we should check for class factor here: https://github.com/pbs-assess/sdmTMB/blob/48b391434e9b214c97a3c24cf4f6f7548cac0272/R/predict.R#L459-L460 for each element RE_names in nd.