Multivariate models with seperate distributed lag effects not working

matt-w-rees commented 10 months ago

Hi Nick,

I am trying to fit a multivariate model with trend-specific distributed lag effects (e.g., ' te(precip, lag, by = trend, k = c(6, 6))'). However, this appears to be failing due to internal formatting errors in mvgam. I assume this is because 'trend' needs to be in the same matrix format as the two variables ('precip', 'lag' in the previous e.g.,), but when I try to do this, the model fits but errors right at the end with "Error in PredictMat(object$smooth[[k]], data) : Can't find by variable".

I've attached a reproducible example below, any help or ideas would be greatly appreciated!

Cheers, Matt.

library(mvgam)
# Loading required package: mgcv
# Loading required package: nlme
# This is mgcv 1.8-41. For overview type 'help("mgcv-package")'.
# Loading required package: Rcpp
# Loading required package: brms
# Loading 'brms' package (version 2.20.4). Useful instructions
# can be found by typing help('brms'). A more detailed introduction
# to the package is available through vignette('brms_overview').
# 
# Attaching package: 'brms'
# The following objects are masked from 'package:mgcv':
# 
#     s, t2
# The following object is masked from 'package:stats':
# 
#     ar
# Loading required package: marginaleffects
# Loading required package: insight
# Welcome to mvgam. Please cite as: Clark, NJ, and Wells, K. 2022. Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 2022, https://doi.org/10.1111/2041-210X.13974

# Simon Wood’s lag matrix function 
lagard <- function(x, n.lag = 6) {
  n <- length(x); X <- matrix(NA, n, n.lag)
  for (i in 1:n.lag) X[i:n, i] <- x[i:n - i + 1]
  X
}

# simulate time series data at 3 sites
dat <- sim_mvgam(T = 100, n_series = 3, n_lv = 2,
                 family = 'poisson',
                 mu = runif(6, -1, 2),
                 trend_rel = 0.6)

# our data is
data_train <- dat$data_train

# add fake precip covariate
data_train$precip <- rnorm(nrow(data_train))

# make training data a list with lag
data_train_list <- list(lag=matrix(0:5, nrow(data_train), 6, byrow = TRUE),
                         y = data_train$y,
                         time = data_train$time,
                         series = data_train$series)

# calculate lagged values
data_train_list$precip <- lagard(data_train$precip)

# remove first 5 rows due to NA's (from lag)
data_train_list_trim <- lapply(data_train_list, tail, -6)

## fit SS model - same lag response across series 
mod1 <- mvgam(y ~ -1,
              trend_formula = ~ te(precip, lag, k = c(6, 6)),
              data = data_train_list_trim,
              family = poisson(),
              trend_model = 'AR1')
# Using cmdstanr as the backend
# 
# ld: warning: duplicate -rpath '/Users/ree140/.cmdstan/cmdstan-2.33.1/stan/lib/stan_math/lib/tbb' ignored
# Init values were only set for a subset of parameters. 
# Missing init values for the following parameters:
#  - chain 1: b_raw_trend, sigma, ar1, LV, lambda_trend
#  - chain 2: b_raw_trend, sigma, ar1, LV, lambda_trend
#  - chain 3: b_raw_trend, sigma, ar1, LV, lambda_trend
#  - chain 4: b_raw_trend, sigma, ar1, LV, lambda_trend
# Running MCMC with 4 parallel chains...
# 
# Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup)
# Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 1 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -4.44659e-323. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d38251cb7.stan', line 85, column 2 to line 91, column 63)
# Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 1
# Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 1 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -2.63581e-20. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d38251cb7.stan', line 85, column 2 to line 91, column 63)
# Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 1
# Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup)
# Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 2 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 8.67362e-18. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d38251cb7.stan', line 85, column 2 to line 91, column 63)
# Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 2
# Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 2 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -2.0383e-17. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d38251cb7.stan', line 85, column 2 to line 91, column 63)
# Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 2
# Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 2 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 1.1293e-15. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d38251cb7.stan', line 85, column 2 to line 91, column 63)
# Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 2
# Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 2 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 6.28641e+26. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d38251cb7.stan', line 85, column 2 to line 91, column 63)
# Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 2
# Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 2 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -6.59195e-17. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d38251cb7.stan', line 85, column 2 to line 91, column 63)
# Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 2
# Chain 3 Iteration:   1 / 1000 [  0%]  (Warmup)
# Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 3 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -1.29127e+20. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d38251cb7.stan', line 85, column 2 to line 91, column 63)
# Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 3
# Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 3 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -5.53377e-16. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d38251cb7.stan', line 85, column 2 to line 91, column 63)
# Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 3
# Chain 4 Iteration:   1 / 1000 [  0%]  (Warmup)
# Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 4 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 1.19726e+52. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d38251cb7.stan', line 85, column 2 to line 91, column 63)
# Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 4
# Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
# Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
# Chain 3 Iteration: 100 / 1000 [ 10%]  (Warmup) 
# Chain 4 Iteration: 100 / 1000 [ 10%]  (Warmup)
# Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 4 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -4.16334e-17. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d38251cb7.stan', line 85, column 2 to line 91, column 63)
# Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 4
# Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
# Chain 4 Iteration: 200 / 1000 [ 20%]  (Warmup) 
# Chain 3 Iteration: 200 / 1000 [ 20%]  (Warmup) 
# Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
# Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
# Chain 4 Iteration: 300 / 1000 [ 30%]  (Warmup) 
# Chain 3 Iteration: 300 / 1000 [ 30%]  (Warmup) 
# Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
# Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
# Chain 4 Iteration: 400 / 1000 [ 40%]  (Warmup) 
# Chain 3 Iteration: 400 / 1000 [ 40%]  (Warmup) 
# Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
# Chain 4 Iteration: 500 / 1000 [ 50%]  (Warmup) 
# Chain 4 Iteration: 501 / 1000 [ 50%]  (Sampling) 
# Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
# Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
# Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
# Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
# Chain 3 Iteration: 500 / 1000 [ 50%]  (Warmup) 
# Chain 3 Iteration: 501 / 1000 [ 50%]  (Sampling) 
# Chain 4 Iteration: 600 / 1000 [ 60%]  (Sampling) 
# Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
# Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
# Chain 4 Iteration: 700 / 1000 [ 70%]  (Sampling) 
# Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
# Chain 3 Iteration: 600 / 1000 [ 60%]  (Sampling) 
# Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
# Chain 4 Iteration: 800 / 1000 [ 80%]  (Sampling) 
# Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
# Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
# Chain 3 Iteration: 700 / 1000 [ 70%]  (Sampling) 
# Chain 4 Iteration: 900 / 1000 [ 90%]  (Sampling) 
# Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
# Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
# Chain 4 Iteration: 1000 / 1000 [100%]  (Sampling) 
# Chain 4 finished in 3.3 seconds.
# Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
# Chain 1 finished in 3.4 seconds.
# Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
# Chain 3 Iteration: 800 / 1000 [ 80%]  (Sampling) 
# Chain 2 finished in 3.5 seconds.
# Chain 3 Iteration: 900 / 1000 [ 90%]  (Sampling) 
# Chain 3 Iteration: 1000 / 1000 [100%]  (Sampling) 
# Chain 3 finished in 4.4 seconds.
# 
# All 4 chains finished successfully.
# Mean chain execution time: 3.7 seconds.
# Total execution time: 4.5 seconds.
# Warning: 2 of 2000 (0.0%) transitions ended with a divergence.
# See https://mc-stan.org/misc/warnings for details.

# works fine

## now try to fit SS model - different lag responses across series 
mod2 <- mvgam(y ~ -1,
              trend_formula = ~ te(precip, lag, by = trend, k = c(6, 6)),
              data = data_train_list_trim,
              family = poisson(),
              trend_model = 'AR1')
# Error in ss_gam$smooth: $ operator is invalid for atomic vectors

# I assume this is because it require the 'trend' var to be in the same format as precip and lag (matrices), so make a new 'trend matrix' in the data 

# series in matrix format (note: I tried overwriting the current series column, that doesn't work)
data_train_list$ser_mat <- matrix(data_train$series, nrow(data_train), 24, byrow = FALSE)

# and need to trim again 
data_train_list_trim <- lapply(data_train_list, tail, -6)

# refit with new ser_mat column  
mod3 <- mvgam(y ~ -1,
              trend_formula = ~ te(precip, lag, by = ser_mat, k = c(6, 6)),
              data = data_train_list_trim,
              family = poisson(),
              trend_model = 'AR1')
# Using cmdstanr as the backend
# 
# ld: warning: duplicate -rpath '/Users/ree140/.cmdstan/cmdstan-2.33.1/stan/lib/stan_math/lib/tbb' ignored
# Init values were only set for a subset of parameters. 
# Missing init values for the following parameters:
#  - chain 1: b_raw_trend, sigma, ar1, LV, lambda_trend
#  - chain 2: b_raw_trend, sigma, ar1, LV, lambda_trend
#  - chain 3: b_raw_trend, sigma, ar1, LV, lambda_trend
#  - chain 4: b_raw_trend, sigma, ar1, LV, lambda_trend
# Running MCMC with 4 parallel chains...
# 
# Chain 1 Iteration:   1 / 1000 [  0%]  (Warmup)
# Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 1 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 1.51788e-18. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 1
# Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 1 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -1.86483e-17. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 1
# Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 1 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -5.77493e+60. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 1
# Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 1 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -3.55618e-17. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 1
# Chain 2 Iteration:   1 / 1000 [  0%]  (Warmup)
# Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 2 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -1.17285e-14. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 2
# Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 2 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 1.54564e-15. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 2
# Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 2 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -1.60694e+60. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 2
# Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 2 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 3.22409e-12. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 2
# Chain 2 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 2 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 5.32907e-15. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 2 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 2 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 2
# Chain 3 Iteration:   1 / 1000 [  0%]  (Warmup)
# Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 3 Exception: multi_normal_prec_lpdf: Precision matrix is not symmetric. Precision matrix[1,19] = 1.6793e+144, but Precision matrix[19,1] = 1.6793e+144 (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 3
# Chain 4 Iteration:   1 / 1000 [  0%]  (Warmup)
# Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 4 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 2.60209e-17. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 4
# Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 4 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is 1.35525e-17. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 4
# Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 4 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -4.24092e-27. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 4
# Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
# Chain 4 Exception: multi_normal_prec_lpdf: LDLT_Factor of precision parameter is not positive definite.  last conditional variance is -3.90313e-17. (in '/var/folders/91/5zwgbxk952vd2qjj7rg9ky540000gp/T/RtmpeZkEjf/model-888d5ca9263a.stan', line 85, column 2 to line 91, column 63)
# Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
# Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
# Chain 4
# Chain 1 Iteration: 100 / 1000 [ 10%]  (Warmup) 
# Chain 2 Iteration: 100 / 1000 [ 10%]  (Warmup) 
# Chain 3 Iteration: 100 / 1000 [ 10%]  (Warmup) 
# Chain 4 Iteration: 100 / 1000 [ 10%]  (Warmup) 
# Chain 1 Iteration: 200 / 1000 [ 20%]  (Warmup) 
# Chain 4 Iteration: 200 / 1000 [ 20%]  (Warmup) 
# Chain 1 Iteration: 300 / 1000 [ 30%]  (Warmup) 
# Chain 2 Iteration: 200 / 1000 [ 20%]  (Warmup) 
# Chain 3 Iteration: 200 / 1000 [ 20%]  (Warmup) 
# Chain 4 Iteration: 300 / 1000 [ 30%]  (Warmup) 
# Chain 1 Iteration: 400 / 1000 [ 40%]  (Warmup) 
# Chain 2 Iteration: 300 / 1000 [ 30%]  (Warmup) 
# Chain 3 Iteration: 300 / 1000 [ 30%]  (Warmup) 
# Chain 4 Iteration: 400 / 1000 [ 40%]  (Warmup) 
# Chain 1 Iteration: 500 / 1000 [ 50%]  (Warmup) 
# Chain 1 Iteration: 501 / 1000 [ 50%]  (Sampling) 
# Chain 2 Iteration: 400 / 1000 [ 40%]  (Warmup) 
# Chain 3 Iteration: 400 / 1000 [ 40%]  (Warmup) 
# Chain 4 Iteration: 500 / 1000 [ 50%]  (Warmup) 
# Chain 4 Iteration: 501 / 1000 [ 50%]  (Sampling) 
# Chain 1 Iteration: 600 / 1000 [ 60%]  (Sampling) 
# Chain 2 Iteration: 500 / 1000 [ 50%]  (Warmup) 
# Chain 2 Iteration: 501 / 1000 [ 50%]  (Sampling) 
# Chain 3 Iteration: 500 / 1000 [ 50%]  (Warmup) 
# Chain 3 Iteration: 501 / 1000 [ 50%]  (Sampling) 
# Chain 1 Iteration: 700 / 1000 [ 70%]  (Sampling) 
# Chain 2 Iteration: 600 / 1000 [ 60%]  (Sampling) 
# Chain 4 Iteration: 600 / 1000 [ 60%]  (Sampling) 
# Chain 3 Iteration: 600 / 1000 [ 60%]  (Sampling) 
# Chain 2 Iteration: 700 / 1000 [ 70%]  (Sampling) 
# Chain 1 Iteration: 800 / 1000 [ 80%]  (Sampling) 
# Chain 3 Iteration: 700 / 1000 [ 70%]  (Sampling) 
# Chain 4 Iteration: 700 / 1000 [ 70%]  (Sampling) 
# Chain 2 Iteration: 800 / 1000 [ 80%]  (Sampling) 
# Chain 1 Iteration: 900 / 1000 [ 90%]  (Sampling) 
# Chain 2 Iteration: 900 / 1000 [ 90%]  (Sampling) 
# Chain 3 Iteration: 800 / 1000 [ 80%]  (Sampling) 
# Chain 4 Iteration: 800 / 1000 [ 80%]  (Sampling) 
# Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
# Chain 1 finished in 3.6 seconds.
# Chain 2 Iteration: 1000 / 1000 [100%]  (Sampling) 
# Chain 3 Iteration: 900 / 1000 [ 90%]  (Sampling) 
# Chain 2 finished in 3.6 seconds.
# Chain 4 Iteration: 900 / 1000 [ 90%]  (Sampling) 
# Chain 3 Iteration: 1000 / 1000 [100%]  (Sampling) 
# Chain 3 finished in 4.0 seconds.
# Chain 4 Iteration: 1000 / 1000 [100%]  (Sampling) 
# Chain 4 finished in 4.2 seconds.
# 
# All 4 chains finished successfully.
# Mean chain execution time: 3.9 seconds.
# Total execution time: 4.3 seconds.
# Error in PredictMat(object$smooth[[k]], data): Can't find by variable

^{Created on 2023-11-16 with reprex v2.0.2}

nicholasjclark commented 10 months ago

Thanks for the reprex @matt-w-rees. These are challenging to implement, and unfortunately the mgcv documentation doesn't make it easy to learn how to do them. You are correct that the by variable needs to be a matrix, but it must be a numeric matrix. If you look at the help for linear functionals (?mgcv::linear.functional.terms and ?mgcv::gam.models), you will see that the by variable is used as a weighting matrix, and the resulting smooth is constructed as rowSums(L*F), where F is your distributed lag smooth and L is the weighting matrix. I spent a lot of time on this for a few recent projects, and finally realised that I needed to supply group-level distributed lag terms for each group in the formula (this is reasonably well-described here: https://github.com/nicholasjclark/portal_VAR. Below is some code that should work for you. I'd also recommend you reinstall mvgam as the most recent version does a better job of passing along error messages from mgcv:

library(mvgam)

lagard <- function(x, n.lag = 6) {
  n <- length(x); X <- matrix(NA, n, n.lag)
  for (i in 1:n.lag) X[i:n, i] <- x[i:n - i + 1]
  X
}

# simulate time series data at 3 sites
dat <- sim_mvgam(T = 100, n_series = 3, n_lv = 2,
                 family = 'poisson',
                 mu = runif(6, -1, 2),
                 trend_rel = 0.6)

# our data is
data_train <- dat$data_train

# add fake precip covariate
data_train$precip <- rnorm(nrow(data_train))

# make training data a list with lag
data_train_list <- list(lag=matrix(0:5, nrow(data_train), 6, byrow = TRUE),
                        y = data_train$y,
                        time = data_train$time,
                        series = data_train$series)

# calculate lagged values
data_train_list$precip <- lagard(data_train$precip)

# remove first 5 rows due to NA's (from lag)
data_train_list_trim <- lapply(data_train_list, tail, -6)

# fit SS model - same lag response across series 
mod1 <- mvgam(y ~ -1,
              trend_formula = ~ te(precip, lag, k = c(6, 6)),
              data = data_train_list_trim,
              family = poisson(),
              trend_model = 'AR1')

# now try to fit SS model - different lag responses across series 
# with the most recent version of {mvgam}, you will get a more informative error
# here telling you that factor 'by' variables can't be used with matrix covariates
mod2 <- mvgam(y ~ -1,
              trend_formula = ~ te(precip, lag, by = trend, k = c(6, 6)),
              data = data_train_list_trim,
              family = poisson(),
              trend_model = 'AR1')

# the best way I've found to do this, which is unforunately a bit cumbersone:
# create weight matrices for each level of the grouping facotr for setting up hierarchical 
# distributed lag terms
weights_s1 <- weights_s2 <- 
  weights_s3 <- matrix(1, ncol = ncol(data_train_list_trim$lag), 
                       nrow = nrow(data_train_list_trim$lag))

# for each series, the rows in its weighting matrix need to have '1s' when the corresponding
# observation in data matches that series, and '0s' otherwise
weights_s1[!(data_train_list_trim$series == 'series_1'), ] <- 0
data_train_list_trim$weights_s1 <- weights_s1

weights_s2[!(data_train_list_trim$series == 'series_2'), ] <- 0
data_train_list_trim$weights_s2 <- weights_s2

weights_s3[!(data_train_list_trim$series == 'series_3'), ] <- 0
data_train_list_trim$weights_s3 <- weights_s3

# independent series-level distributed lag terms  
mod3 <- mvgam(y ~ -1,
              trend_formula = ~ 
                te(precip, lag, by = weights_s1, k = c(5, 5)) +
                te(precip, lag, by = weights_s2, k = c(5, 5)) +
                te(precip, lag, by = weights_s3, k = c(5, 5)),
              data = data_train_list_trim,
              family = poisson(),
              trend_model = 'AR1')
summary(mod3)
plot(mod3, type = 'smooths', trend_effects = TRUE)

# unfortunately plot_predictions() and friends will fail because {marginaleffects}
# only currently works with data.frames, not lists (I'm working on this)
conditional_effects(mod3)

# alternatively you can set these up as a hierarchical GAM, where each series
# shares a 'global' distributed lag term and then is allowed to deviate from
# that term
mod3 <- mvgam(y ~ -1,
              trend_formula = ~ 

                # the shared distributed lag term
                te(precip, lag,k = c(6, 6)) +

                # series-level deviation terms
                te(precip, lag, by = weights_s1, k = c(5, 5)) +
                te(precip, lag, by = weights_s2, k = c(5, 5)) +
                te(precip, lag, by = weights_s3, k = c(5, 5)),
              data = data_train_list_trim,
              family = poisson(),
              trend_model = 'AR1')

matt-w-rees commented 10 months ago

Thanks so much for the quick response and detailed explanation Nick -- that all makes sense and the solution worked -- I love my new lag terms!

nicholasjclark commented 10 months ago

Great to hear @matt-w-rees. And please let me know if there is anything else I can help with, I'm keen to develop the package further but it would be good to have some more targeted directions so it can be useful to people

matt-w-rees commented 10 months ago

Okay glad to hear Nick, because I am running into further errors with this model structure. For example, I want to also include an effect of season, but allow this to vary over time. I've updated the reprex to include a season covariate and factor smooth term for season / time below, but the model errors with 'Missing input data for the following data variables: S_trend4.'. Note this model works with just the factor smooth, so I am assuming it is related to the lag effects.

library(mvgam)
#> Loading required package: mgcv
#> Loading required package: nlme
#> This is mgcv 1.8-41. For overview type 'help("mgcv-package")'.
#> Loading required package: Rcpp
#> Loading required package: brms
#> Loading 'brms' package (version 2.20.4). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following objects are masked from 'package:mgcv':
#> 
#>     s, t2
#> The following object is masked from 'package:stats':
#> 
#>     ar
#> Loading required package: marginaleffects
#> Loading required package: insight
#> Welcome to mvgam. Please cite as: Clark, NJ, and Wells, K. 2022. Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 2022, https://doi.org/10.1111/2041-210X.13974

lagard <- function(x, n.lag = 6) {
  n <- length(x); X <- matrix(NA, n, n.lag)
  for (i in 1:n.lag) X[i:n, i] <- x[i:n - i + 1]
  X
}

# simulate time series data at 3 sites
dat <- sim_mvgam(T = 100, n_series = 3, n_lv = 2,
                 family = 'poisson',
                 mu = runif(6, -1, 2),
                 trend_rel = 0.6)

# our data is
data_train <- dat$data_train

# add fake precip covariate
data_train$precip <- rnorm(nrow(data_train))

# add fake season covariate too
data_train$season <- c(rep(c("spring", "summer", "autumn", "winter"), nrow(data_train) / 4), "spring")

# make training data a list with lag
data_train_list <- list(lag=matrix(0:5, nrow(data_train), 6, byrow = TRUE),
                        y = data_train$y,
                        season = as.factor(data_train$season),
                        time = data_train$time,
                        series = data_train$series)

# calculate lagged values
data_train_list$precip <- lagard(data_train$precip)

# remove first 5 rows due to NA's (from lag)
data_train_list_trim <- lapply(data_train_list, tail, -6)

# the best way I've found to do this, which is unforunately a bit cumbersone:
# create weight matrices for each level of the grouping facotr for setting up hierarchical 
# distributed lag terms
weights_s1 <- weights_s2 <- 
  weights_s3 <- matrix(1, ncol = ncol(data_train_list_trim$lag), 
                       nrow = nrow(data_train_list_trim$lag))

# for each series, the rows in its weighting matrix need to have '1s' when the corresponding
# observation in data matches that series, and '0s' otherwise
weights_s1[!(data_train_list_trim$series == 'series_1'), ] <- 0
data_train_list_trim$weights_s1 <- weights_s1

weights_s2[!(data_train_list_trim$series == 'series_2'), ] <- 0
data_train_list_trim$weights_s2 <- weights_s2

weights_s3[!(data_train_list_trim$series == 'series_3'), ] <- 0
data_train_list_trim$weights_s3 <- weights_s3

mod3 <- mvgam(y ~ -1,
              trend_formula = ~ 
                s(time, season, k = 20, bs = "fs") + 
                te(precip, lag, by = weights_s1, k = c(5, 5)) +
                te(precip, lag, by = weights_s2, k = c(5, 5)) +
                te(precip, lag, by = weights_s3, k = c(5, 5)),
              data = data_train_list_trim,
              family = poisson(),
              trend_model = 'AR1')
#> Using cmdstanr as the backend
#> 
#> ld: warning: duplicate -rpath '/Users/ree140/.cmdstan/cmdstan-2.33.1/stan/lib/stan_math/lib/tbb' ignored
#> Error: Missing input data for the following data variables: S_trend4.

^{Created on 2023-11-20 with reprex v2.0.2}`

nicholasjclark commented 10 months ago

Thanks @matt-w-rees, the last patch should fix this now. For some reason the smooth penalty matrices didn't start indexing at 1 for that model like they usually do, so that tripped up my code. Should be fine now though

matt-w-rees commented 10 months ago

That did it, thanks!

matt-w-rees commented 9 months ago

Hey Nick,

I'm having a bit of trouble predicting with this model format.

Following on from the code I most recently posted, when i use:

newdata <- data_train_list_trim
preds <- predict(mod3, newdata = newdata, type = 'link')

I get "Error in trend_indicators[i] <- trend_map$trend[which(trend_map$series == : replacement has length zero".

Any ideas on how to handle this?

nicholasjclark commented 9 months ago

Thanks for pointing this out @matt-w-rees. I've just pushed a patch that should fix this for you. Let me know if you have any other issues though

matt-w-rees commented 8 months ago

Just getting back to this, thanks Nick. However, with the new package updates, I can longer successfully run 'mod3' in the code chunk above - the model fits, but then I get the error "Error in terms_include %in% names(trend_test) : object 'trend_test' not found".

nicholasjclark commented 8 months ago

Thanks @matt-w-rees for the heads-up. I made a small typo in a recent update so this is why the prediction wasn't working. Should all be fixed now. By the way, you can use algorithm = 'meanfield' when testing out these larger models if you're keen for some dramatic speedups. I've found that this form of Variational Bayesian approximation tends to behave well for DGAMs

nicholasjclark / mvgam

Multivariate models with seperate distributed lag effects not working #32