pbs-assess / sdmTMB

:earth_americas: An R package for spatial and spatiotemporal GLMMs with TMB
https://pbs-assess.github.io/sdmTMB/

Add an elegant forecasting interface (or add example with the weights hack) #21

Closed seananderson closed 3 years ago

seananderson commented 4 years ago

Setting weights to zero in the source data for years you want to predict on was a hack suggested by Eric. At the very least, add it in a vignette or example or a note in the help.

It should be doable to let the prediction data frame differ from the fitted one, but that requires some careful checking of the data passed into TMB to make sure it makes sense. Currently, the R code deliberately checks that they match.

seananderson commented 4 years ago

I added an example in ?predict.sdmTMB. I think I see how we could automate this type of thing for new data without much trouble, but there remains the problem that you have to be careful about what you actually want to do with the prediction. E.g., if you have `+ as.factor(year)`, then what do you want to do in a new year? So forcing the user to do it via the `weights` argument, and to think it all through, may not be that bad. It also makes sure that everything is set up correctly (e.g., missing years for random-walk covariates).

```r
# Forecasting using Eric Ward's hack with the `weights` argument:
# Add on a fake year of data with the year to forecast:
dfake <- rbind(d, d[nrow(d), ])
dfake[nrow(dfake), "year"] <- max(d$year) + 1
tail(dfake) # last row is fake!

weights <- rep(1, nrow(dfake))
weights[length(weights)] <- 0 # set last row weight to 0

dfake$year_factor <- dfake$year
dfake$year_factor[nrow(dfake)] <- max(d$year) # share fixed effect for last 2 years

pcod_spde <- make_spde(dfake$X, dfake$Y, n_knots = 50)

m <- sdmTMB(
  data = dfake, formula = density ~ 0 + as.factor(year_factor),
  ar1_fields = TRUE, # using an AR1 to have something to forecast with
  weights = weights,
  include_spatial = TRUE, # could also be `FALSE`
  time = "year", spde = pcod_spde, family = tweedie(link = "log"),
  silent = FALSE
)

# Add a year to our grid:
qcs_grid$year_factor <- qcs_grid$year
grid2018 <- qcs_grid[qcs_grid$year == 2017L, ]
grid2018$year <- 2018L # `L` because `year` is an integer in the data
qcsgrid_forecast <- rbind(qcs_grid, grid2018)

predictions <- predict(m, newdata = qcsgrid_forecast)

plot_map(predictions, "exp(est)") +
  scale_fill_viridis_c(trans = "log10")
plot_map(predictions, "epsilon_st") +
  scale_fill_gradient2()
```

ericward-noaa commented 4 years ago

Agreed: there are lots of models that aren't appropriate for this kind of thing, e.g., anything with years as factors. Covariates are OK if they're static across time slices or modeled as random walks.

I was also digging into the `sdmTMB_cv()` function today to see if it's worth modifying that; right now, I think not. But it would be nice to automate the calculation of `log_lik` a little, because I worry that's a place where it's easy for users to make silly errors.

seananderson commented 4 years ago

Looking at your cross-validation testing, I wonder if it would make sense to switch the package cross-validation function to use the weights hack for all cross validation. That would have two benefits: it would simplify keeping the knots constant across folds, and it would make it possible to drop entire years from a fold. Then you'd be able to do time-slice cross validation with just the fold-IDs vector. It would remove the ability to include the mesh formation as part of the testing, but that wouldn't be hard to add back if we really care about it, and usually I think that will just add noise to the procedure.
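
For illustration, leave-one-year-out fold IDs could be built like this (a sketch; `d` is the data frame from the example above, and the `fold_ids` name is hypothetical):

```r
# One fold per year, so each fold holds out an entire time slice;
# with the weights hack, the held-out year's rows would get weight 0 when fitting:
years <- sort(unique(d$year))
fold_ids <- match(d$year, years) # row i is assigned to the fold for its year
```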

ericward-noaa commented 4 years ago

Yeah I think that’s a great idea. The current CV function doesn’t make predictions for years held out, but using the weights would. Happy to work on this next week.


ericward-noaa commented 4 years ago

I think I've implemented the weights hack correctly. By default, we have the argument knot_type = "fixed", where knots are constant across folds. Is there a situation where we want knot_type = "unique", or should that be commented out for the time being?


seananderson commented 4 years ago

I overhauled the cross-validation function yesterday and, in doing so, removed the unique-mesh option. It now uses the MLE values from the first fold as starting values for the other folds, and it runs in parallel if you set up a future-package plan first. This version requires you to build your own mesh, just like the main fitting function.
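
A sketch of the parallel usage under a future plan (argument names follow the package at the time, and `pcod`/`pcod_spde` are the package example data; check `?sdmTMB_cv` for the exact signature and returned object):

```r
library(sdmTMB)
library(future)
plan(multisession) # sdmTMB_cv() fits folds in parallel under this plan

m_cv <- sdmTMB_cv(
  density ~ 0 + as.factor(year),
  data = pcod, spde = pcod_spde,
  time = "year", family = tweedie(link = "log"),
  k_folds = 4
)
m_cv$sum_loglik # summed held-out log likelihood across folds
```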

seananderson commented 4 years ago

I added an argument extra_time to sdmTMB() in the devel branch that sets up the model to accept extra time slices in the prediction. https://github.com/pbs-assess/sdmTMB/commit/bca73e45c22891bfa8e4599ac80cf428ae50dc89
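
With `extra_time`, the forecasting example above could be set up without the fake-row/weights step, roughly like this (a sketch against the devel branch; see the linked commit for the exact behavior):

```r
# `extra_time` adds a time slice with no observations so it can be
# predicted on; the AR1 fields provide something to forecast with:
m <- sdmTMB(
  density ~ 1, # note: a year factor would not work for a brand-new year
  data = d, spde = pcod_spde,
  time = "year", extra_time = max(d$year) + 1L,
  ar1_fields = TRUE, family = tweedie(link = "log")
)
predictions <- predict(m, newdata = qcsgrid_forecast)
```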