Closed roaldarbol closed 2 months ago
As an aside, would there be any way of knowing whether such a discrete event would have a delayed effect? Say, lights turn on and off for a 5 minute period, but the effect of that event plays out over the following 1 hour, despite the light no longer being on.
Hi @roaldarbol, thanks for the question and the interesting example. For this kind of problem, I'd probably ignore the smooth of time
and just focus on the different structural changes that are happening. I've found a lot of use for including predictors that represent the time since some event happened. In this case, including a measure of the time since entering the "b" group can be an effective way to investigate the delayed effect of the "b" condition. Of course if there are consistent temporal dynamics that you'd need to deal with, you could include those as well. But I've left them out here to keep the point simple:
library(mvgam)
library(marginaleffects)
library(dplyr)
library(ggplot2); theme_set(theme_bw())
# Simulate data
t <- seq(from = 0, to = 20, length.out = 1000)
v1 <- rep(1, 500)
v2 <- rep(2, 500)
noise <- rnorm(1000, mean = 0.5, sd = 1)
values <- c(v1,v2) + noise
a <- rep("a", 500)
b <- rep("b", 500)
df <- data.frame(
time = t,
value = values,
# Don't use the name 'group' as it causes problems for marginaleffects
grp = c(a,b)
) |>
mutate(value = if_else(time > 15, value - ((time-14)^1.2), value)) |>
# Add a time since the start of condition 'b' as a new variable
mutate(time_since_b = c(rep(0, 500),
1:500))
df <- bind_rows(df, df, df, df, df) |>
mutate(time = row_number(),
season = as.integer(time / 1000),
series = factor("A"))
# Fit a model that estimates the nonlinear, delayed impact of entering
# condition 'b'
mod <- mvgam(value ~ grp +
s(time_since_b, k = 15),
data = df,
family = gaussian(),
# Meanfield Variation inference should be just fine for this model,
# and will be lightning fast
algorithm = 'meanfield',
backend = 'cmdstanr')
#> Compiling Stan program using cmdstanr
#>
#> Start sampling
#> ------------------------------------------------------------
#> EXPERIMENTAL ALGORITHM:
#> This procedure has not been thoroughly tested and may be unstable
#> or buggy. The interface is subject to change.
#> ------------------------------------------------------------
#> Gradient evaluation took 0.000375 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 3.75 seconds.
#> Adjust your expectations accordingly!
#> Begin eta adaptation.
#> Iteration: 1 / 250 [ 0%] (Adaptation)
#> Iteration: 50 / 250 [ 20%] (Adaptation)
#> Iteration: 100 / 250 [ 40%] (Adaptation)
#> Iteration: 150 / 250 [ 60%] (Adaptation)
#> Iteration: 200 / 250 [ 80%] (Adaptation)
#> Success! Found best value [eta = 1] earlier than expected.
#> Begin stochastic gradient ascent.
#> iter ELBO delta_ELBO_mean delta_ELBO_med notes
#> 100 -623427.781 1.000 1.000
#> 200 -386878.079 0.806 1.000
#> 300 -320377.370 0.606 0.611
#> 400 -9371.998 8.751 1.000
#> 500 -7400.788 7.054 0.611
#> 600 -7129.433 5.885 0.611
#> 700 -7051.640 5.046 0.266
#> 800 -7060.191 4.415 0.266
#> 900 -7063.776 3.925 0.208
#> 1000 -7053.622 3.532 0.208
#> 1100 -7019.789 3.433 0.038 MAY BE DIVERGING... INSPECT ELBO
#> 1200 -7045.633 3.372 0.011 MAY BE DIVERGING... INSPECT ELBO
#> 1300 -7126.481 3.352 0.011 MAY BE DIVERGING... INSPECT ELBO
#> 1400 -7040.557 0.035 0.011
#> 1500 -7017.499 0.009 0.005 MEAN ELBO CONVERGED MEDIAN ELBO CONVERGED
#> Drawing a sample of size 500 from the approximate posterior...
#> COMPLETED.
#> Finished in 9.3 seconds.
summary(mod, include_betas = FALSE)
#> GAM formula:
#> value ~ grp + s(time_since_b, k = 15)
#>
#> Family:
#> gaussian
#>
#> Link function:
#> identity
#>
#> Trend model:
#> None
#>
#> N series:
#> 1
#>
#> N timepoints:
#> 5000
#>
#> Status:
#> Fitted using Stan
#> 1 chains, each with iter = 500; warmup = ; thin = 1
#> Total post-warmup draws = 500
#>
#>
#> Observation error parameter estimates:
#> 2.5% 50% 97.5% Rhat n.eff
#> sigma_obs[1] 0.95 0.98 1 NaN NaN
#>
#> GAM coefficient (beta) estimates:
#> 2.5% 50% 97.5% Rhat n.eff
#> (Intercept) 0.34 0.37 0.39 NaN NaN
#> grpb 0.96 1.00 1.00 NaN NaN
#>
#> Approximate significance of GAM smooths:
#> edf Ref.df Chi.sq p-value
#> s(time_since_b) 9.94 14 17780 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Posterior approximation used: no diagnostics to compute
# Plot the delayed effect of entering the 'b' condition
plot_predictions(mod, by = c("time_since_b"),
newdata = datagrid(time_since_b = unique,
group = 'b'),
points = 0.2)
# Plot an estimate of the structural change when first entering condition 'b'
plot_predictions(mod, by = 'grp',
newdata = datagrid(time_since_b = 0,
group = unique))
# Plot full predictions
fc <- hindcast(mod)
plot(fc)
Created on 2024-06-27 with reprex v2.0.2
Thanks for the reply! That makes a lot of sense, and I think it's a nice way to think about it. One of questions that are at the crux of this analysis is actually exactly whether the effect of the light is the presence or the onset of the light that has an effect (or possibly a mixture of the two). How would you go about that? I guess I could include both make three models, one with with a time-since-onset variable (like the one you fitted), one with a discrete effect (which I'm then not quite sure how I would fit) and one with both of them - and then compare them?
Sorry for the late reply. Yes I think that strategy sounds useful. The discrete effect would just be the same model but without the time since onset effect, so should be fairly simple
Hi Nicholas!
Just starting my adventures into these types of models, so I hope it's alright to post a question here (the Discussion isn't open, so can't ask there). Otherwise, just point me somewhere else. :-)
I'm wondering if it is possible to add effects of factors in a meaningful way? E.g. if I'm activity of glowworms, and I have periods of time where street lights are on, and periods where the are off - can I model that meaningfully?
As an example (not using count data, but just gaussian data that are divided in two groups), here's a time series that's cyclic and irregular:
Let's say that there are two "conditions" (here "group"), that could be the binary environmental variable, such as artificial light being on or off.
I've tried to model it with:
But the effects seem strange, particularly for "a" (red), where the raw values are just linear with some noise:
Here's a reprex (although
plot_mvgam_series()
seems to crashreprex::reprex()
, so almost):Hope it makes sense! Watched the talk that's on Youtube, and am really impressed by the package and hoping I can leverage it in some more complex animal behaviour settings! :-)