nicholasjclark / mvgam

{mvgam} R 📦 to fit Dynamic Bayesian Generalized Additive Models for time series analysis and forecasting
https://nicholasjclark.github.io/mvgam/
Other
99 stars 12 forks source link

Some vignettes fail, not being able to find functions: `Error: could not find function "gam"`, `Error: could not find function "plot_predictions"` #68

Closed barracuda156 closed 3 weeks ago

barracuda156 commented 4 weeks ago
> NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), 
+     "true")

> knitr::opts_chunk$set(collapse = TRUE, comment = "#>", 
+     purl = NOT_CRAN, eval = NOT_CRAN)

> knitr::opts_chunk$set(echo = TRUE, dpi = 100, fig.asp = 0.8, 
+     fig.width = 6, out.width = "60%", fig.align = "center")

> library(mvgam)
Loading required package: brms
Loading required package: Rcpp
Loading 'brms' package (version 2.21.6). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').

Attaching package: ‘brms’

The following object is masked from ‘package:stats’:

    ar

Welcome to mvgam. Please cite as: Clark, NJ, and Wells, K. 2022. Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 2022, https://doi.org/10.1111/2041-210X.13974

> library(ggplot2)

> theme_set(theme_bw(base_size = 12, base_family = "serif"))

> simdat <- sim_mvgam(n_series = 4, T = 24, prop_missing = 0.2)

> head(simdat$data_train, 16)
    y season year   series time
1  10      1    1 series_1    1
2   1      1    1 series_2    1
3   4      1    1 series_3    1
4  NA      1    1 series_4    1
5  NA      2    1 series_1    2
6   3      2    1 series_2    2
7   1      2    1 series_3    2
8   1      2    1 series_4    2
9   6      3    1 series_1    3
10  1      3    1 series_2    3
11  5      3    1 series_3    3
12  0      3    1 series_4    3
13  1      4    1 series_1    4
14  1      4    1 series_2    4
15  2      4    1 series_3    4
16  0      4    1 series_4    4

> class(simdat$data_train$series)
[1] "factor"

> levels(simdat$data_train$series)
[1] "series_1" "series_2" "series_3" "series_4"

> all(levels(simdat$data_train$series) %in% unique(simdat$data_train$series))
[1] TRUE

> summary(glm(y ~ series + time, data = simdat$data_train, 
+     family = poisson()))

Call:
glm(formula = y ~ series + time, family = poisson(), data = simdat$data_train)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)     1.52994    0.22552   6.784 1.17e-11 ***
seriesseries_2 -1.08070    0.28512  -3.790  0.00015 ***
seriesseries_3 -0.27030    0.21633  -1.249  0.21150    
seriesseries_4 -2.48382    0.52095  -4.768 1.86e-06 ***
time           -0.03199    0.01810  -1.767  0.07717 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 144.226  on 56  degrees of freedom
Residual deviance:  90.238  on 52  degrees of freedom
  (15 observations deleted due to missingness)
AIC: 204.38

Number of Fisher Scoring iterations: 6

> summary(gam(y ~ series + s(time, by = series), data = simdat$data_train, 
+     family = poisson()))

  When sourcing ‘data_in_mvgam.R’:
Error: could not find function "gam"
Execution halted

> NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), 
+     "true")

> knitr::opts_chunk$set(collapse = TRUE, comment = "#>", 
+     purl = NOT_CRAN, eval = NOT_CRAN)

> knitr::opts_chunk$set(echo = TRUE, dpi = 100, fig.asp = 0.8, 
+     fig.width = 6, out.width = "60%", fig.align = "center")

> library(mvgam)
Loading required package: brms
Loading required package: Rcpp
Loading 'brms' package (version 2.21.6). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').

Attaching package: ‘brms’

The following object is masked from ‘package:stats’:

    ar

Welcome to mvgam. Please cite as: Clark, NJ, and Wells, K. 2022. Dynamic Generalized Additive Models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 2022, https://doi.org/10.1111/2041-210X.13974

> library(ggplot2)

> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> theme_set(theme_classic(base_size = 12, base_family = "serif") + 
+     theme(axis.line.x.bottom = element_line(colour = "black", 
+         size =  .... [TRUNCATED] 
Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.

> options(ggplot2.discrete.colour = c("#A25050", "#00008b", 
+     "darkred", "#010048"), ggplot2.discrete.fill = c("#A25050", 
+     "#00008b", "dark ..." ... [TRUNCATED] 

> set.seed(999)

> testdat <- data.frame(site = 1, replicate = rep(1:5, 
+     6), time = sort(rep(1:6, 5)), species = "sp_1", truth = c(rep(28, 
+     5), rep(26, 5), .... [TRUNCATED] 

> testdat = testdat %>% dplyr::bind_rows(data.frame(site = 1, 
+     replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = "sp_2", 
+     truth .... [TRUNCATED] 

> testdat$species <- factor(testdat$species, levels = unique(testdat$species))

> testdat$series <- factor(testdat$series, levels = unique(testdat$series))

> dplyr::glimpse(testdat)
Rows: 60
Columns: 7
$ site    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ time    <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5,…
$ species <fct> sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp_1, sp…
$ truth   <dbl> 28, 28, 28, 28, 28, 26, 26, 26, 26, 26, 23, 23, 23, 23, 23, 16…
$ obs     <int> 20, 19, 23, 17, 18, 21, 18, 21, 19, 18, 17, 16, 20, 11, 19, 9,…
$ series  <fct> site_1_sp_1_rep_1, site_1_sp_1_rep_2, site_1_sp_1_rep_3, site_…
$ cap     <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 10…

> head(testdat, 12)
   site time species truth obs            series cap
1     1    1    sp_1    28  20 site_1_sp_1_rep_1 100
2     1    1    sp_1    28  19 site_1_sp_1_rep_2 100
3     1    1    sp_1    28  23 site_1_sp_1_rep_3 100
4     1    1    sp_1    28  17 site_1_sp_1_rep_4 100
5     1    1    sp_1    28  18 site_1_sp_1_rep_5 100
6     1    2    sp_1    26  21 site_1_sp_1_rep_1 100
7     1    2    sp_1    26  18 site_1_sp_1_rep_2 100
8     1    2    sp_1    26  21 site_1_sp_1_rep_3 100
9     1    2    sp_1    26  19 site_1_sp_1_rep_4 100
10    1    2    sp_1    26  18 site_1_sp_1_rep_5 100
11    1    3    sp_1    23  17 site_1_sp_1_rep_1 100
12    1    3    sp_1    23  16 site_1_sp_1_rep_2 100

> trend_map <- testdat %>% dplyr::mutate(trend = as.numeric(factor(paste0(site, 
+     species)))) %>% dplyr::select(trend, series) %>% dplyr::distinc .... [TRUNCATED] 

> trend_map
   trend            series
1      1 site_1_sp_1_rep_1
2      1 site_1_sp_1_rep_2
3      1 site_1_sp_1_rep_3
4      1 site_1_sp_1_rep_4
5      1 site_1_sp_1_rep_5
6      2 site_1_sp_2_rep_1
7      2 site_1_sp_2_rep_2
8      2 site_1_sp_2_rep_3
9      2 site_1_sp_2_rep_4
10     2 site_1_sp_2_rep_5

> mod <- mvgam(formula = obs ~ species - 1, trend_formula = ~s(time, 
+     by = trend, k = 4) + species, trend_map = trend_map, family = nmix(), 
+   .... [TRUNCATED] 
Your model may benefit from using "noncentred = TRUE"
Compiling Stan program using rstan

Start sampling

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 1: 
Chain 1: Gradient evaluation took 0.035813 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 358.13 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 2: 
Chain 2: Gradient evaluation took 0.014611 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 146.11 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 3: 
Chain 3: Gradient evaluation took 0.015627 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 156.27 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 2: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 3: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 3: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 2: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 3: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 2: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 3: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 2: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 3: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 2: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 3: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 3: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 2: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 2: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 3: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 2: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 3: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 2: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 3: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 2: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 3: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 2: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 3: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 217.332 seconds (Warm-up)
Chain 3:                143.543 seconds (Sampling)
Chain 3:                360.875 seconds (Total)
Chain 3: 
Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 0.010547 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 105.47 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 2: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 233.166 seconds (Warm-up)
Chain 2:                137.653 seconds (Sampling)
Chain 2:                370.819 seconds (Total)
Chain 2: 
Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 259.431 seconds (Warm-up)
Chain 1:                121.425 seconds (Sampling)
Chain 1:                380.856 seconds (Total)
Chain 1: 
Chain 4: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 4: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 4: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 4: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 4: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 4: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 4: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 4: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 4: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 4: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 4: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 178.86 seconds (Warm-up)
Chain 4:                76.877 seconds (Sampling)
Chain 4:                255.737 seconds (Total)
Chain 4: 
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess
Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess

> code(mod)
// Stan model code generated by package mvgam
data {
int<lower=0> total_obs; // total number of observations
int<lower=0> n; // number of timepoints per series
int<lower=0> n_sp_trend; // number of trend smoothing parameters
int<lower=0> n_lv; // number of dynamic factors
int<lower=0> n_series; // number of series
matrix[n_series, n_lv] Z; // matrix mapping series to latent states
int<lower=0> num_basis; // total number of basis coefficients
int<lower=0> num_basis_trend; // number of trend basis coefficients
vector[num_basis_trend] zero_trend; // prior locations for trend basis coefficients
matrix[total_obs, num_basis] X; // mgcv GAM design matrix
matrix[n * n_lv, num_basis_trend] X_trend; // trend model design matrix
array[n, n_series] int<lower=0> ytimes;  // time-ordered matrix (which col in X belongs to each [time, series] observation?)
array[n, n_lv] int ytimes_trend;
int<lower=0> n_nonmissing; // number of nonmissing observations
array[total_obs] int<lower=0> cap; // upper limits of latent abundances
array[total_obs] int ytimes_array; // sorted ytimes
array[n, n_series] int<lower=0> ytimes_pred; // time-ordered matrix for prediction
int<lower=0> K_groups; // number of unique replicated observations
int<lower=0> K_reps; // maximum number of replicate observations
array[K_groups] int<lower=0> K_starts; // col of K_inds where each group starts
array[K_groups] int<lower=0> K_stops; // col of K_inds where each group ends
array[K_groups, K_reps] int<lower=0> K_inds; // indices of replicated observations
matrix[3,6] S_trend1; // mgcv smooth penalty matrix S_trend1
matrix[3,6] S_trend2; // mgcv smooth penalty matrix S_trend2
array[total_obs] int<lower=0> flat_ys; // flattened observations
}
transformed data {
matrix[total_obs, num_basis] X_ordered = X[ytimes_array,  : ];
array[K_groups] int<lower=0> Y_max;
array[K_groups] int<lower=0> N_max;
for ( k in 1 : K_groups ) {
Y_max[k] = max(flat_ys[K_inds[k, K_starts[k] : K_stops[k]]]);
N_max[k] = max(cap[K_inds[k, K_starts[k] : K_stops[k]]]);
}
}
parameters {
// raw basis coefficients
vector[num_basis] b_raw;
vector[num_basis_trend] b_raw_trend;
// smoothing parameters
vector<lower=0>[n_sp_trend] lambda_trend;
}
transformed parameters {
// detection probability
vector[total_obs] p;
// latent states
matrix[n, n_lv] LV;
// latent states and loading matrix
vector[n * n_lv] trend_mus;
matrix[n, n_series] trend;
// basis coefficients
vector[num_basis] b;
vector[num_basis_trend] b_trend;
// observation model basis coefficients
b[1:num_basis] = b_raw[1:num_basis];
// process model basis coefficients
b_trend[1:num_basis_trend] = b_raw_trend[1:num_basis_trend];
// detection probability
p = X_ordered * b;
// latent process linear predictors
trend_mus = X_trend * b_trend;
for(j in 1:n_lv){
LV[1:n, j] = trend_mus[ytimes_trend[1:n, j]];
}
// derived latent states
for (i in 1:n){
for (s in 1:n_series){
trend[i, s] = dot_product(Z[s,], LV[i,]);
}
}
}
model {
// prior for speciessp_1...
b_raw[1] ~ std_normal();
// prior for speciessp_2...
b_raw[2] ~ std_normal();
// dynamic process models
// prior for (Intercept)_trend...
b_raw_trend[1] ~ normal(1, 1.5);
// prior for speciessp_2_trend...
b_raw_trend[2] ~ std_normal();
// prior for s(time):trendtrend1_trend...
b_raw_trend[3:5] ~ multi_normal_prec(zero_trend[3:5],S_trend1[1:3,1:3] * lambda_trend[1] + S_trend1[1:3,4:6] * lambda_trend[2]);
// prior for s(time):trendtrend2_trend...
b_raw_trend[6:8] ~ multi_normal_prec(zero_trend[6:8],S_trend2[1:3,1:3] * lambda_trend[3] + S_trend2[1:3,4:6] * lambda_trend[4]);
lambda_trend ~ normal(5, 30);
{
// likelihood functions
array[total_obs] real flat_trends;
array[total_obs] real flat_ps;
flat_trends = (to_array_1d(trend));
flat_ps = to_array_1d(p);
// loop over replicate sampling window (each site*time*species combination)
for ( k in 1 : K_groups ) {
// all log_lambdas are identical because they represent site*time
// covariates; so just use the first measurement
real log_lambda = flat_trends[K_inds[k, 1]];
vector[N_max[k] - Y_max[k] + 1] terms;
int l = 0;
// marginalize over latent abundance
for ( Ni in Y_max[k] : N_max[k] ) {
l = l + 1;
// factor for poisson prob of latent Ni; compute
// only once per sampling window
terms[l] = poisson_log_lpmf( Ni | log_lambda ) +
// for each replicate observation, binomial prob observed is
// computed in a vectorized statement
binomial_logit_lpmf( flat_ys[K_inds[k, K_starts[k] : K_stops[k]]] |
Ni,
flat_ps[K_inds[k, K_starts[k] : K_stops[k]]] );
}
target += log_sum_exp( terms );
}
}
}
generated quantities {
vector[n_lv] penalty = rep_vector(1e12, n_lv);
vector[n_sp_trend] rho_trend = log(lambda_trend);
}

> summary(mod)
GAM observation formula:
obs ~ species - 1

GAM process formula:
~s(time, by = trend, k = 4) + species

Family:
nmix

Link function:
log

Trend model:
None

N process models:
2 

N series:
10 

N timepoints:
6 

Status:
Fitted using Stan 
4 chains, each with iter = 1000; warmup = 500; thin = 1 
Total post-warmup draws = 2000

GAM observation model coefficient (beta) estimates:
             2.5%   50% 97.5% Rhat n_eff
speciessp_1 -0.35 0.730  1.40 1.01   710
speciessp_2 -1.20 0.015  0.89 1.00  1174

GAM process model coefficient (beta) estimates:
                              2.5%    50%  97.5% Rhat n_eff
(Intercept)_trend            2.700  3.000  3.500 1.01   655
speciessp_2_trend           -1.200 -0.620  0.130 1.01   849
s(time):trendtrend1.1_trend -0.081  0.016  0.230 1.00   374
s(time):trendtrend1.2_trend -0.270  0.011  0.290 1.00  1022
s(time):trendtrend1.3_trend -0.450 -0.250 -0.015 1.00   677
s(time):trendtrend2.1_trend -0.300 -0.014  0.091 1.01   176
s(time):trendtrend2.2_trend -0.180  0.033  0.620 1.01   268
s(time):trendtrend2.3_trend  0.035  0.330  0.610 1.00   948

Approximate significance of GAM process smooths:
                      edf Ref.df Chi.sq p-value  
s(time):seriestrend1 1.82      3   4.02   0.036 *
s(time):seriestrend2 1.46      3   3.71   0.239  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Stan MCMC diagnostics:
n_eff / iter looks reasonable for all parameters
Rhat looks reasonable for all parameters
0 of 2000 iterations ended with a divergence (0%)
0 of 2000 iterations saturated the maximum tree depth of 12 (0%)
E-FMI indicated no pathological behavior

Samples were drawn using NUTS(diag_e) at Fri Aug 23 21:33:58 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split MCMC chains
(at convergence, Rhat = 1)

> loo(mod)
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.

Computed from 2000 by 60 log-likelihood matrix.

         Estimate   SE
elpd_loo   -227.3 12.4
p_loo        80.5 11.2
looic       454.6 24.8
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.1]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     30    50.0%   570     
   (0.7, 1]   (bad)       1     1.7%   <NA>    
   (1, Inf)   (very bad) 29    48.3%   <NA>    
See help('pareto-k-diagnostic') for details.

> plot(mod, type = "smooths", trend_effects = TRUE)

> plot_predictions(mod, condition = "species", type = "detection") + 
+     ylab("Pr(detection)") + ylim(c(0, 1)) + theme_classic() + 
+     theme(leg .... [TRUNCATED] 

  When sourcing ‘nmixtures.R’:
Error: could not find function "plot_predictions"
Execution halted
nicholasjclark commented 4 weeks ago

Thanks @barracuda156. This should be addressed now on the Github version of the package (i.e. https://github.com/nicholasjclark/mvgam/commit/57ce61f29aefcbf6a56f1d7501da4f1ae62bf612#diff-964f0c7c4de942e53f4dee17e4d7e19efbaa4d0edaa803a99953e6e5375261f4)