Closed paul-buerkner closed 5 years ago
I tried to make it more useful but wasn't successful. I guess it's really just the best to remove it. If, at some point, we get a reasonable idea, I may as well implement it from scratch.
I have to say that I find it a very useful "convenience function" -- it's easier than having to fiddle around with the data frame (so I would prefer it not to be deprecated). But I am interested to understand in what way you were thinking of making it more useful.
I understand that you find it convenient, but I think that (a) it's in the wrong place (i.e. it shouldn't be passed to autocor) and (b) it is not unlikely to yield non-sensible results. I thought it could be helpful for models where we don't have ARMA structures (so basically everything other than Gaussian models), but now that I have tried again (and failed), I am even more motivated to just remove it.
I am interested to learn how it might lead to non-sensible results.
This may happen if the mean of the time series is far away from zero. There are "quick fixes" for that but no principled ones I like. Here is an example:
N <- length(LakeHuron)
df <- data.frame(
y = as.numeric(LakeHuron),
year = as.numeric(time(LakeHuron)),
time = 1:N
)
library(ggplot2)
library(brms)
fit <- brm(
y ~ 1,
data = df,
autocor = cor_arr(~time, r = 2)
)
preds <- posterior_predict(fit)
preds <- cbind(
Estimate = colMeans(preds),
Q5 = apply(preds, 2, quantile, probs = 0.05),
Q95 = apply(preds, 2, quantile, probs = 0.95)
)
ggplot(cbind(df, preds), aes(x = year, y = Estimate)) +
geom_smooth(aes(ymin = Q5, ymax = Q95), stat = "identity", size = 0.5) +
geom_point(aes(y = y))
Bottom line is, I implemented this correlation structure ages ago and it was simply a bad idea.
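The failure mode can be seen without brms at all. In a response-level AR model y_t = b0 + r * y_{t-1} + e_t, the stationary mean is b0 / (1 - r), so when the series mean is far from zero the intercept and the autoregressive coefficient have to trade off exactly, and even a small mismatch makes long-run predictions settle at the wrong level. A minimal base-R sketch of this (my own illustration, using a mean of 580 as in LakeHuron):

```r
# Response-level AR(1): y_t = b0 + r * y_{t-1} + e_t
# Stationary mean is b0 / (1 - r), so a series with mean ~580
# forces the intercept to absorb (1 - r) * 580.
set.seed(1)
r  <- 0.8
mu <- 580                # series mean, far from zero
b0 <- mu * (1 - r)       # intercept implied by the model
y <- numeric(200)
y[1] <- mu
for (t in 2:200) y[t] <- b0 + r * y[t - 1] + rnorm(1, sd = 1)
mean(y)                  # close to 580, as expected

# If r is even slightly mis-estimated (or b0 is shrunk by a prior),
# iterated predictions drift to the wrong level:
r_hat <- 0.9             # slightly off autoregressive coefficient
pred <- numeric(200)
pred[1] <- mu
for (t in 2:200) pred[t] <- b0 + r_hat * pred[t - 1]
pred[200]                # converges to b0 / (1 - r_hat) = 1160, not 580
```

This is the sense in which the fitted line collapses to (a transformation of) the intercept while the data live elsewhere.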
Interesting, thank you.
I'm not sure if this is helpful to find a principled solution, but if you fit the LakeHuron data with INLA (continuing from the quoted code above):
library(INLA)
family <- "gaussian"
formula1 <- y ~ f(time, model = "ar1")
res1 <- inla(formula = formula1,
             data = df,
             family = family)
plot(res1$summary.random$time[, "mean"] + res1$summary.fixed$mean[1],
     ylab = "fitting result",
     type = "l")
points(df$y, col = "blue")
the fitted line obtained by the brms model corresponds to the res1$summary.fixed$mean[1] part (= the intercept). The res1$summary.random$time[, "mean"] part is the deviation from the intercept (the AR1 part). The INLA model is overfitted, which can be remedied by fixing one of the hyperparameters or by using PC priors: see https://haakonbakka.bitbucket.io/btopic115.html. See also inla.doc("ar1") for details of the way it is implemented in INLA.
Given the usefulness of these autoregression-of-the-response models, especially for non-Gaussian data, I'd vote :+1: to keep or re-implement this functionality in brms, and I hope that a principled approach for the above problematic case can be found.
Thanks for following up on this. I am not sure I fully understand the above INLA example. Does AR1 not mean AR1 of the residuals (available via cor_ar), or is it really AR1 of the response in INLA? Based on the doc it seems to be equivalent to cor_ar in brms. It is just written down for a simple example where both are equivalent.
In the example section they give an example with family = "poisson". Because the Poisson family has no residuals, I was led to believe it is autoregression of the response. But I may well misunderstand something, because in the beginning of inla.doc("ar1") they indeed speak of a Gaussian vector.
Here is the example they give :
library(INLA)

## simulate data
n <- 100
rho <- 0.8
prec <- 10
## note that the marginal precision would be
marg.prec <- prec * (1 - rho^2)
E <- sample(c(5, 4, 10, 12), size = n, replace = TRUE)
eta <- as.vector(arima.sim(list(order = c(1, 0, 0), ar = rho),
                           n = n, sd = sqrt(1 / prec)))
y <- rpois(n, E * exp(eta))
data <- list(y = y, z = 1:n, E = E)

## fit the model
formula <- y ~ f(z, model = "ar1")
res1 <- inla(formula, family = "poisson", data = data, E = E)
Just talked to @ThierryO about this. Apparently the INLA ar1 is neither autoregression of the residuals nor autoregression of the response. The autoregression is at the level of the random effect parameters. In INLA, an ar1 random effect for time imposes correlation between discrete time periods, with time periods further apart being less correlated. If the correlation parameter is estimated to be zero, the ar1-type random effect reduces to an iid random effect (i.e. (1|time)).
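To make that distinction concrete, here is a small base-R sketch (my own illustration, not INLA internals) of the covariance implied by such an AR(1) random effect: correlation decays with the distance between time periods, and at rho = 0 the covariance collapses to a diagonal matrix, i.e. the iid (1|time) case:

```r
# Covariance implied by an AR(1) random effect over discrete times 1..n:
# Cov(u_s, u_t) = sigma2 * rho^|s - t|
ar1_cov <- function(n, rho, sigma2 = 1) {
  sigma2 * rho^abs(outer(1:n, 1:n, "-"))
}

round(ar1_cov(5, rho = 0.8), 2)  # neighbouring periods strongly correlated

# With rho = 0 the AR(1) random effect reduces to iid effects, (1|time):
ar1_cov(5, rho = 0)              # sigma2 * identity matrix
```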
Hope this clarifies it...
It does clarify this and basically tells us where to implement such a feature (i.e., in the multilevel structure of brms). Thanks a lot!
Maybe a related approach would also allow autoregression for Bernoulli and cumulative distributions?: https://cran.r-project.org/web/packages/mvord/vignettes/vignette_mvord.pdf
Indeed, that could work, although residuals in the binary case are not going to be identified. For all other families without a built-in dispersion parameter this should work.
The cor_arr correlation structure is currently deprecated because it does not really do anything useful that one cannot otherwise do via simple linear predictor terms. In particular, predictions are not as general as they should be for time-series models. However, I think we could implement this structure very similarly to how ARMA works in brms, which would actually make cor_arr somewhat useful again.
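As a sketch of what "simple linear predictor terms" means here (my own illustration, shown with lm for self-containedness; the same formula would work inside brm()): add the lagged response to the data frame and use it as an ordinary covariate.

```r
# Autoregression of the response expressed as a plain linear predictor
# term: regress y on its own lag instead of using cor_arr.
df <- data.frame(y = as.numeric(LakeHuron))
df$y_lag1 <- c(NA, head(df$y, -1))   # lag-1 response as a covariate
fit <- lm(y ~ y_lag1, data = df)     # rows with NA lags are dropped
coef(fit)                            # intercept + autoregressive coefficient
```

The intercept here illustrates the earlier point: it absorbs (1 - r) times the series mean, which is what makes the structure awkward when the mean is far from zero.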