pbs-assess / sdmTMB

:earth_americas: An R package for spatial and spatiotemporal GLMMs with TMB
https://pbs-assess.github.io/sdmTMB/
182 stars 26 forks source link

get_index() over alternate time variable? #338

Closed seanhardison1 closed 2 months ago

seanhardison1 commented 2 months ago

I've fitted an IID spatiotemporal model using time = "year" in addition to a factor-smooth interaction for year month: s(month, year_fac, bs = 'fs'). This is convenient for me because I can predict over months within years without needing to fit the ST random field by month.

Ultimately, I would like to generate a biomass index by month that includes SEs, and I've found that I can do this by simulating predictions from the joint posterior distribution (predict(..., nsim = ...)) and then summarizing them using get_index_sims(...). To do this, I updated get_index_sims so that it alters the time attribute passed from the fitted model from year to a yearmonth numeric.

area_function = function(x, area) x + log(area)
agg_function = function(x) sum(exp(x))
est_function = stats::median

make_index <- 
  function(pred_sims = pred_out_seas_sims, 
           pred_grid = pred_df_glmm_filt,
           area = 6.25){

    row.names(pred_sims) <- pred_grid %>% 
      pull(ymon)

    attr(pred_sims, "time") <- "ymon"
    .time_attr <- attr(pred_sims, "time")
    pred_sims <- apply(pred_sims, 2L, function(x) area_function(x, area))
    .t <- as.numeric(rownames(pred_sims))
    ymons <- sort(unique(.t))
    ymon_indexes <- lapply(ymons, function(x) which(.t %in% x))
    out1 <- lapply(ymon_indexes, function(x) {
      apply(pred_sims[x, , drop = FALSE], 2L, agg_function)
    })
    out <- lapply(out1, function(x) {
      data.frame(est = est_function(x),
                 lwr = stats::quantile(x,
                                       probs = (1 - 0.95)/2),
                 upr = stats::quantile(x,
                                       probs = 1 - (1 - 0.95)/2), 
                 se = stats::sd(log(x)),
                 log_est = mean(log(x)))
    })
    out <- do.call("rbind", out)
    out[[.time_attr]] <- ymons
    out <- out[, c(.time_attr, "est", "lwr", "upr", "log_est", 
                   "se"), drop = FALSE]
    return(out)
  }

This seems to work for me. However, it appears to be much more challenging to do something similar with get_index because of outputs from predict when return_tmb = T (not a simple attribute swap).

So my question is, is it feasible to generate an index using get_index that does not use the time attribute of the ST random field? This would be desirable when it is computationally infeasible to estimate the ST random field at sub-annual temporal resolutions, and when the analyst (me!) would like SEs returned with the index without knowledge of TMB.

seananderson commented 2 months ago

Can you calculate predictions and do the Monte Carlo integration (summing the grid cell values weighted by area) for each month at a time? get_index() is just summing over whatever you give it grouped by the time variable.

index <- lapply(month_vector, \(m) {
  # make your newdata grid here for a given month
  nd <- subset(grid, month == m) # or similar
  p <- predict(fit, newdata = nd, return_tmb_object = TRUE)
  get_index(p, bias_correct = TRUE)
})
index <- do.call(rbind, index)

If memory happens to be an issue, you could also change month_vector to a year-month vector.

seananderson commented 2 months ago

A full example:

library(sdmTMB)
d <- dogfish
d$month <- rep_len(1:12, nrow(d))

d$year_fac <- factor(d$year)

fit <- sdmTMB(
  catch_weight ~ s(month, year_fac, bs = 'fs'), 
  data = d, 
  time = "year", 
  spatial = "off", 
  spatiotemporal = "off", 
  family = tweedie()
)

grid <- replicate_df(sdmTMB::wcvi_grid, time_name = "year", time_values = unique(dogfish$year))
grid$year_fac <-  factor(grid$year)

index <- lapply(1:12, \(m) {
  nd <- dplyr::mutate(grid, month = m)
  p <- predict(fit, newdata = nd, return_tmb_object = TRUE)
  ii <- get_index(p, bias_correct = TRUE)
  ii$month <- m
  ii
})
index <- do.call(rbind, index)

head(index)
#>   year      est       lwr       upr  log_est        se month
#> 1 2004 735024.0 284545.22 1898679.6 13.50766 0.4841981     1
#> 2 2006 146673.0  60951.51  352952.0 11.89596 0.4480321     1
#> 3 2008 668907.7 310499.33 1441025.7 13.41340 0.3915706     1
#> 4 2010 610885.8 272372.10 1370116.4 13.32267 0.4121203     1
#> 5 2012 182703.0  67879.94  491756.5 12.11562 0.5051733     1
#> 6 2014 109097.7  43682.18  272475.2 11.60000 0.4670002     1
tail(index)
#>     year       est      lwr      upr  log_est        se month
#> 115 2012  58635.02 22658.67 151732.9 10.97909 0.4851056    12
#> 116 2014  87922.85 29141.26 265274.3 11.38421 0.5634311    12
#> 117 2016  83998.63 27905.75 252842.9 11.33856 0.5622388    12
#> 118 2018 110949.10 44259.81 278123.7 11.61683 0.4688833    12
#> 119 2021  90260.13 35407.12 230091.9 11.41045 0.4774490    12
#> 120 2022 231489.82 97390.52 550233.6 12.35229 0.4417464    12

Created on 2024-04-24 with reprex v2.1.0

seanhardison1 commented 2 months ago

Ah you beat me to it! Yep, that works for me. Thank you for your help as always. Case closed I think