pbs-assess / sdmTMB

:earth_americas: An R package for spatial and spatiotemporal GLMMs with TMB
https://pbs-assess.github.io/sdmTMB/
182 stars 26 forks source link

Adding predict_args to fit() cause an error when passed through predict.sdmTMB #276

Closed MikkoVihtakari closed 8 months ago

MikkoVihtakari commented 8 months ago

There seems to be a potential bug in the fit() function's predict.sdmTMB shortcut. Perhaps the naming has changed from here?

library(sdmTMB)
packageVersion("sdmTMB")
#> [1] '0.4.1'

mesh <- make_mesh(pcod_2011, c("X", "Y"), cutoff = 20)
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, will retire in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> The sp package is now running under evolution status 2
#>      (status 2 uses the sf package in place of rgdal)

# Works:

fit <- sdmTMB(
  density ~ s(depth),
  data = pcod_2011, mesh = mesh,
  family = tweedie(link = "log"),
  time = "year"
)
#> Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
#> TMB was built with Matrix version 1.6.0
#> Current Matrix version is 1.6.1.1
#> Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package

get_index(
  predict(
    fit, 
    newdata = replicate_df(qcs_grid, "year", unique(pcod_2011$year)), 
    type = "response", 
    return_tmb_object = TRUE),
  area = 1)
#> Bias correction is turned off.
#> It is recommended to turn this on for final inference.
#>   year      est      lwr      upr  log_est        se
#> 1 2011 232101.1 184906.1 291341.9 12.35493 0.1159841
#> 2 2013 237055.6 187933.5 299017.2 12.37605 0.1184748
#> 3 2015 263310.6 207751.4 333727.9 12.48109 0.1209164
#> 4 2017 149987.3 113142.7 198830.2 11.91831 0.1438294

# Does not work:

fit <- sdmTMB(
  density ~ s(depth),
  data = pcod_2011, mesh = mesh,
  family = tweedie(link = "log"),
  time = "year",
  predict_args = 
                      list(newdata = replicate_df(qcs_grid, "year", unique(pcod_2011$year)),
                      type = "response", 
                      return_tmb_object = TRUE),
  index_args = list(area = 1)
)

get_index(predict(fit, return_tmb_object = TRUE))
#> Error in `get_generic()`:
#> ! It looks like the model was built with an older version of sdmTMB.
#> Please refit with the current version.
#> Backtrace:
#>     ▆
#>  1. └─sdmTMB::get_index(predict(fit, return_tmb_object = TRUE))
#>  2.   └─sdmTMB:::get_generic(...)
#>  3.     └─cli::cli_abort(...)
#>  4.       └─rlang::abort(...)

Created on 2023-11-07 with reprex v2.0.2

seananderson commented 8 months ago

This is working as intended, but I'll admit the documentation and errors around this are poor. I'll see what I can do to fix that.

The do_index, predict_args, and index_args were added mostly for internal use to slightly speed up a workflow and match what the VAST package does exactly. The gist is that if these are used, the resulting object can go right to get_index(), which in this case pulls out the already calculated index (unless bias_correct = TRUE). If bias_correct = FALSE, you save yourself 2 TMB::MakeADFun() calls. If bias_correct = TRUE, you save yourself 1 TMB::MakeADFun() calls. This saves a bit of time for large datasets, but at the expense of doing all your index calculations and SD reporting during the model fitting. So, if you might try many models, choose one, and then calculate your index, the usual workflow makes sense. Also, if you're going to feed the output through predict anyways, then the usual workflow makes sense.

Actually, the only real issue with what you showed was your initial sdmTMB call was missing do_index = TRUE meaning that the predict_args and index_args were ignored and the final predict call didn't have newdata. I'll see what I can do to clear that up. Maybe I'll return an error if index_args or predict_args are set without do_index being set.

library(sdmTMB)

mesh <- make_mesh(pcod_2011, c("X", "Y"), cutoff = 20)
g <- replicate_df(qcs_grid, "year", unique(pcod_2011$year))

# standard workflow:
fit <- sdmTMB(
  density ~ s(depth),
  data = pcod_2011, mesh = mesh,
  family = tweedie(link = "log"),
  time = "year"
)
p <- predict(
  fit,
  newdata = replicate_df(qcs_grid, "year", unique(pcod_2011$year)),
  type = "response",
  return_tmb_object = TRUE
)
get_index(p, area = 1)
#> Bias correction is turned off.
#> It is recommended to turn this on for final inference.
#>   year      est      lwr      upr  log_est        se
#> 1 2011 232101.1 184906.1 291341.9 12.35493 0.1159841
#> 2 2013 237055.6 187933.5 299017.2 12.37605 0.1184748
#> 3 2015 263310.6 207751.4 333727.9 12.48109 0.1209164
#> 4 2017 149987.3 113142.7 198830.2 11.91831 0.1438294

fit <- sdmTMB(
  density ~ s(depth),
  data = pcod_2011, mesh = mesh,
  family = tweedie(link = "log"),
  time = "year",
  do_index = TRUE, # <-
  predict_args = list(newdata = g), # <-
  index_args = list(area = 1) # <-
)
# in this case, you can go right to get_index():
get_index(fit)
#> Bias correction is turned off.
#> It is recommended to turn this on for final inference.
#>   year      est      lwr      upr  log_est        se
#> 1 2011 232101.1 184906.1 291341.9 12.35493 0.1159841
#> 2 2013 237055.6 187933.5 299017.2 12.37605 0.1184748
#> 3 2015 263310.6 207751.4 333727.9 12.48109 0.1209164
#> 4 2017 149987.3 113142.7 198830.2 11.91831 0.1438294
get_index(fit, bias_correct = TRUE)
#>   year      est      lwr      upr  log_est        se
#> 1 2011 274561.9 218733.1 344640.4 12.52293 0.1159841
#> 2 2013 287459.8 227893.1 362596.0 12.56884 0.1184748
#> 3 2015 318363.7 251188.2 403504.0 12.67095 0.1209164
#> 4 2017 189031.1 142595.3 250588.4 12.14967 0.1438294

# you could do this, but you'll be re-doing the index standard error calculations:
p <- predict(fit, newdata = g, return_tmb_object = TRUE)
get_index(p)
#> Bias correction is turned off.
#> It is recommended to turn this on for final inference.
#>   year      est      lwr      upr  log_est        se
#> 1 2011 232101.1 184906.1 291341.9 12.35493 0.1159841
#> 2 2013 237055.6 187933.5 299017.2 12.37605 0.1184748
#> 3 2015 263310.6 207751.4 333727.9 12.48109 0.1209164
#> 4 2017 149987.3 113142.7 198830.2 11.91831 0.1438294

Created on 2023-11-07 with reprex v2.0.2

seananderson commented 8 months ago

OK, I made several improvements here to error handling and documentation.

Examples:

library(sdmTMB)

mesh <- make_mesh(pcod_2011, c("X", "Y"), cutoff = 20)
g <- replicate_df(qcs_grid, "year", unique(pcod_2011$year))

fit <- sdmTMB(
  density ~ s(depth),
  data = pcod_2011, mesh = mesh,
  family = tweedie(link = "log"),
  time = "year",
  predict_args = list(newdata = g),
  index_args = list(area = 1)
)
#> Error in `sdmTMB()`:
#> ! `predict_args` and/or `index_args` were set but `do_index` was FALSE.
#> `do_index` must be TRUE for these arguments to take effect.

fit <- sdmTMB(
  density ~ s(depth),
  data = pcod_2011, mesh = mesh,
  family = tweedie(link = "log"),
  time = "year"
)
p1 <- predict(fit, return_tmb_object = TRUE)
get_index(p1)
#> Error in `get_generic()` at sdmTMB/R/index.R:79:2:
#> ! It looks like the predict function was run without `newdata`
#>   specified.
#> Re-run the predict function with `newdata` specified.

p2 <- predict(fit, newdata = g, return_tmb_object = TRUE)
get_index(p2)
#> Bias correction is turned off.
#> It is recommended to turn this on for final inference.
#>   year      est      lwr      upr  log_est        se
#> 1 2011 232101.1 184906.1 291341.9 12.35493 0.1159841
#> 2 2013 237055.6 187933.5 299017.2 12.37605 0.1184748
#> 3 2015 263310.6 207751.4 333727.9 12.48109 0.1209164
#> 4 2017 149987.3 113142.7 198830.2 11.91831 0.1438294

Created on 2023-11-07 with reprex v2.0.2

MikkoVihtakari commented 8 months ago

Thanks for clarification and sorry for opening an issue on this. Seems like a user issue indeed. I did not remember/realise to switch do_index on. This is a useful feature, for example, in markdown documents. My models (the Barents Sea and continental slope with a 2 km depth resolution) take minutes to compile with bias_correct = FALSE and saving two TMB::MakeADFun() calls probably saves half a minute or so.