Closed matt-w-rees closed 8 months ago
Thanks for the reprex @matt-w-rees. These are challenging to implement, and unfortunately the mgcv
documentation doesn't make it easy to learn how to do them. You are correct that the by
variable needs to be a matrix, but it must be a numeric matrix. If you look at the help for linear functionals (?mgcv::linear.functional.terms
and ?mgcv::gam.models
), you will see that the by
variable is used as a weighting matrix, and the resulting smooth is constructed as rowSums(L*F)
, where F
is your distributed lag smooth and L
is the weighting matrix. I spent a lot of time on this for a few recent projects, and finally realised that I needed to supply group-level distributed lag terms for each group in the formula (this is reasonably well-described here: https://github.com/nicholasjclark/portal_VAR. Below is some code that should work for you. I'd also recommend you reinstall mvgam
as the most recent version does a better job of passing along error messages from mgcv
:
library(mvgam)
lagard <- function(x, n.lag = 6) {
n <- length(x); X <- matrix(NA, n, n.lag)
for (i in 1:n.lag) X[i:n, i] <- x[i:n - i + 1]
X
}
# simulate time series data at 3 sites
dat <- sim_mvgam(T = 100, n_series = 3, n_lv = 2,
family = 'poisson',
mu = runif(6, -1, 2),
trend_rel = 0.6)
# our data is
data_train <- dat$data_train
# add fake precip covariate
data_train$precip <- rnorm(nrow(data_train))
# make training data a list with lag
data_train_list <- list(lag=matrix(0:5, nrow(data_train), 6, byrow = TRUE),
y = data_train$y,
time = data_train$time,
series = data_train$series)
# calculate lagged values
data_train_list$precip <- lagard(data_train$precip)
# remove first 5 rows due to NA's (from lag)
data_train_list_trim <- lapply(data_train_list, tail, -6)
# fit SS model - same lag response across series
mod1 <- mvgam(y ~ -1,
trend_formula = ~ te(precip, lag, k = c(6, 6)),
data = data_train_list_trim,
family = poisson(),
trend_model = 'AR1')
# now try to fit SS model - different lag responses across series
# with the most recent version of {mvgam}, you will get a more informative error
# here telling you that factor 'by' variables can't be used with matrix covariates
mod2 <- mvgam(y ~ -1,
trend_formula = ~ te(precip, lag, by = trend, k = c(6, 6)),
data = data_train_list_trim,
family = poisson(),
trend_model = 'AR1')
# the best way I've found to do this, which is unforunately a bit cumbersone:
# create weight matrices for each level of the grouping facotr for setting up hierarchical
# distributed lag terms
weights_s1 <- weights_s2 <-
weights_s3 <- matrix(1, ncol = ncol(data_train_list_trim$lag),
nrow = nrow(data_train_list_trim$lag))
# for each series, the rows in its weighting matrix need to have '1s' when the corresponding
# observation in data matches that series, and '0s' otherwise
weights_s1[!(data_train_list_trim$series == 'series_1'), ] <- 0
data_train_list_trim$weights_s1 <- weights_s1
weights_s2[!(data_train_list_trim$series == 'series_2'), ] <- 0
data_train_list_trim$weights_s2 <- weights_s2
weights_s3[!(data_train_list_trim$series == 'series_3'), ] <- 0
data_train_list_trim$weights_s3 <- weights_s3
# independent series-level distributed lag terms
mod3 <- mvgam(y ~ -1,
trend_formula = ~
te(precip, lag, by = weights_s1, k = c(5, 5)) +
te(precip, lag, by = weights_s2, k = c(5, 5)) +
te(precip, lag, by = weights_s3, k = c(5, 5)),
data = data_train_list_trim,
family = poisson(),
trend_model = 'AR1')
summary(mod3)
plot(mod3, type = 'smooths', trend_effects = TRUE)
# unfortunately plot_predictions() and friends will fail because {marginaleffects}
# only currently works with data.frames, not lists (I'm working on this)
conditional_effects(mod3)
# alternatively you can set these up as a hierarchical GAM, where each series
# shares a 'global' distributed lag term and then is allowed to deviate from
# that term
mod3 <- mvgam(y ~ -1,
trend_formula = ~
# the shared distributed lag term
te(precip, lag,k = c(6, 6)) +
# series-level deviation terms
te(precip, lag, by = weights_s1, k = c(5, 5)) +
te(precip, lag, by = weights_s2, k = c(5, 5)) +
te(precip, lag, by = weights_s3, k = c(5, 5)),
data = data_train_list_trim,
family = poisson(),
trend_model = 'AR1')
Thanks so much for the quick response and detailed explanation Nick -- that all makes sense and the solution worked -- I love my new lag terms!
Great to hear @matt-w-rees. And please let me know if there is anything else I can help with, I'm keen to develop the package further but it would be good to have some more targeted directions so it can be useful to people
Okay glad to hear Nick, because I am running into further errors with this model structure. For example, I want to also include an effect of season, but allow this to vary over time. I've updated the reprex to include a season covariate and factor smooth term for season / time below, but the model errors with 'Missing input data for the following data variables: S_trend4.'. Note this model works with just the factor smooth, so I am assuming it is related to the lag effects.
library(mvgam)
#> Loading required package: mgcv
#> Loading required package: nlme
#> This is mgcv 1.8-41. For overview type 'help("mgcv-package")'.
#> Loading required package: Rcpp
#> Loading required package: brms
#> Loading 'brms' package (version 2.20.4). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#>
#> Attaching package: 'brms'
#> The following objects are masked from 'package:mgcv':
#>
#> s, t2
#> The following object is masked from 'package:stats':
#>
#> ar
#> Loading required package: marginaleffects
#> Loading required package: insight
#> Welcome to mvgam. Please cite as: Clark, NJ, and Wells, K. 2022. Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 2022, https://doi.org/10.1111/2041-210X.13974
lagard <- function(x, n.lag = 6) {
n <- length(x); X <- matrix(NA, n, n.lag)
for (i in 1:n.lag) X[i:n, i] <- x[i:n - i + 1]
X
}
# simulate time series data at 3 sites
dat <- sim_mvgam(T = 100, n_series = 3, n_lv = 2,
family = 'poisson',
mu = runif(6, -1, 2),
trend_rel = 0.6)
# our data is
data_train <- dat$data_train
# add fake precip covariate
data_train$precip <- rnorm(nrow(data_train))
# add fake season covariate too
data_train$season <- c(rep(c("spring", "summer", "autumn", "winter"), nrow(data_train) / 4), "spring")
# make training data a list with lag
data_train_list <- list(lag=matrix(0:5, nrow(data_train), 6, byrow = TRUE),
y = data_train$y,
season = as.factor(data_train$season),
time = data_train$time,
series = data_train$series)
# calculate lagged values
data_train_list$precip <- lagard(data_train$precip)
# remove first 5 rows due to NA's (from lag)
data_train_list_trim <- lapply(data_train_list, tail, -6)
# the best way I've found to do this, which is unforunately a bit cumbersone:
# create weight matrices for each level of the grouping facotr for setting up hierarchical
# distributed lag terms
weights_s1 <- weights_s2 <-
weights_s3 <- matrix(1, ncol = ncol(data_train_list_trim$lag),
nrow = nrow(data_train_list_trim$lag))
# for each series, the rows in its weighting matrix need to have '1s' when the corresponding
# observation in data matches that series, and '0s' otherwise
weights_s1[!(data_train_list_trim$series == 'series_1'), ] <- 0
data_train_list_trim$weights_s1 <- weights_s1
weights_s2[!(data_train_list_trim$series == 'series_2'), ] <- 0
data_train_list_trim$weights_s2 <- weights_s2
weights_s3[!(data_train_list_trim$series == 'series_3'), ] <- 0
data_train_list_trim$weights_s3 <- weights_s3
mod3 <- mvgam(y ~ -1,
trend_formula = ~
s(time, season, k = 20, bs = "fs") +
te(precip, lag, by = weights_s1, k = c(5, 5)) +
te(precip, lag, by = weights_s2, k = c(5, 5)) +
te(precip, lag, by = weights_s3, k = c(5, 5)),
data = data_train_list_trim,
family = poisson(),
trend_model = 'AR1')
#> Using cmdstanr as the backend
#>
#> ld: warning: duplicate -rpath '/Users/ree140/.cmdstan/cmdstan-2.33.1/stan/lib/stan_math/lib/tbb' ignored
#> Error: Missing input data for the following data variables: S_trend4.
Created on 2023-11-20 with reprex v2.0.2`
Thanks @matt-w-rees, the last patch should fix this now. For some reason the smooth penalty matrices didn't start indexing at 1
for that model like they usually do, so that tripped up my code. Should be fine now though
That did it, thanks!
Hey Nick,
I'm having a bit of trouble predicting with this model format.
Following on from the code I most recently posted, when i use:
newdata <- data_train_list_trim
preds <- predict(mod3, newdata = newdata, type = 'link')
I get "Error in trend_indicators[i] <- trend_map$trend[which(trend_map$series == : replacement has length zero".
Any ideas on how to handle this?
Thanks for pointing this out @matt-w-rees. I've just pushed a patch that should fix this for you. Let me know if you have any other issues though
Just getting back to this, thanks Nick. However, with the new package updates, I can longer successfully run 'mod3' in the code chunk above - the model fits, but then I get the error "Error in terms_include %in% names(trend_test) : object 'trend_test' not found".
Thanks @matt-w-rees for the heads-up. I made a small typo in a recent update so this is why the prediction wasn't working. Should all be fixed now. By the way, you can use algorithm = 'meanfield'
when testing out these larger models if you're keen for some dramatic speedups. I've found that this form of Variational Bayesian approximation tends to behave well for DGAMs
Hi Nick,
I am trying to fit a multivariate model with trend-specific distributed lag effects (e.g., ' te(precip, lag, by = trend, k = c(6, 6))'). However, this appears to be failing due to internal formatting errors in mvgam. I assume this is because 'trend' needs to be in the same matrix format as the two variables ('precip', 'lag' in the previous e.g.,), but when I try to do this, the model fits but errors right at the end with "Error in PredictMat(object$smooth[[k]], data) : Can't find by variable".
I've attached a reproducible example below, any help or ideas would be greatly appreciated!
Cheers, Matt.
Created on 2023-11-16 with reprex v2.0.2