What do you think about the von Mises distribution (for directional data)? A description starts on pg. 440 of the Stan 2.11.0 manual. The only catch (I think) is the recommendation to use
if (kappa < 100)
  y ~ von_mises(mu, kappa);
else
  y ~ normal(mu, sqrt(1 / kappa));
for when kappa grows large.
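If useful, here's a quick base-R sanity check of that approximation (my own sketch, not from the manual):

# compare the von Mises density, exp(kappa * cos(x - mu)) / (2 * pi * besselI(kappa, 0)),
# against the suggested normal approximation for a large kappa
kappa <- 150; mu <- 0
x <- seq(-0.3, 0.3, length.out = 5)
dvm <- exp(kappa * cos(x - mu)) / (2 * pi * besselI(kappa, 0))
dn <- dnorm(x, mean = mu, sd = sqrt(1 / kappa))
round(dvm / dn, 3)  # ratios should all be close to 1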
That sounds like a good idea. Do you have some sample data where a von Mises distribution makes sense, so that I can play around with it a bit? Also, since we need mu to be within a 2*pi interval, what kind of link function is typically applied here?
Thanks for the quick response! It looks like you already found a proper link function. I ought to be able to cook up some real reach direction data on Monday, but in the meantime, here's simulated data from INLA. I fit an INLA model vs. brm
(as of eba536363ba72b583877db0cc2f37189f670932b), and they're very similar.
# All credit to INLA, copy-pasted
library(INLA)
ilink = function(x) 2 * atan(x)  # inverse tan-half link: maps reals to (-pi, pi)
link = function(x) tan(x / 2)
n = 300
z = rnorm(n, sd = 0.3)
eta = 1 + z
y.pred = ilink(eta)
kappa = 5
# draw von Mises noise by numerically inverting the CDF
x = seq(-pi, pi, len = 10000)
d = exp(kappa * cos(x))
dd = cumsum(d)
dd = dd / max(dd)
cn.icdf.func = splinefun(dd, x, method = "monoH.FC")
rcn = function(n) cn.icdf.func(runif(n))
y = y.pred + rcn(n)
dat <- data.frame(y, z)
m_inla <- inla(formula(y ~ 1 + z), data = dat,
               family = "circularnormal",
               control.inla = list(cmin = -Inf))
Summary from INLA:
Call:
c("inla(formula = formula(y ~ 1 + z), family = \"circularnormal\", ", " data = dat, control.inla = list(cmin = -Inf))")
Time used:
Pre-processing Running inla Post-processing Total
0.0750 0.1238 0.0140 0.2128
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 0.9953 0.0294 0.9392 0.9947 1.055 0.9936 0
z 1.0793 0.0880 0.9086 1.0785 1.254 1.0770 0
The model has no random effects
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision parameter for the Circular Normal observations 5.023 0.3517 4.372 5.012 5.734 4.988
Expected number of effective parameters(std dev): 2.027(0.002)
Number of equivalent replicates : 147.99
Marginal log-Likelihood: -215.43
And from brms (call: m_brms <- brm(y ~ 1 + z, data = dat, family = 'von_mises', chains = 2)):
Family: von_mises (tan_half)
Formula: y ~ 1 + z
Data: dat (Number of observations: 300)
Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 2000
WAIC: Not computed
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept 1.00 0.03 0.94 1.05 1366 1
z 1.08 0.08 0.92 1.25 1290 1
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
kappa 5.04 0.37 4.37 5.79 1719 1
The von_mises family is now fully implemented and documented. @aforren1 I very much appreciate your help with testing it!
No problem, thank you for the implementation!
Having additional distributions is an exciting prospect. What would be really useful for modelling response times is the exgaussian. I know it exists in Stan. Unfortunately, I have no clue about link functions and the like for this distribution.
Thanks for the suggestion! This sounds like a good idea.
@schmettow I have one question regarding the exgaussian distribution that you may have an opinion on. Currently, brms models the mean of the response distribution for most families (i.e., link(E(Y)) = eta) via the formula argument, while additionally being able to predict all auxiliary parameters with the upcoming release of brms 1.0. Predicting the mean of the exgaussian distribution would imply link(mu + 1/lambda) = eta. However, since the gaussian and exponential parts of the distribution are thought to represent different cognitive processes, it may make more sense to define link(mu) = eta. Of course, in both possible implementations, users could additionally predict the lambda and sigma parameters. Do you have any preferences regarding the parameterization?
To my knowledge, the exgaussian is more a pragmatic choice, as it seems to fit reaction time data well. There doesn't seem to be a clear match between the two components and cognitive processes (unlike, for example, the shifted Wald, which is tied to the diffusion model in simple choice tasks). In my own research, I am perfectly fine with mu + 1/lambda.
--Martin
Generalized extreme value distribution (GEV) would include Frechet, Gumbel, and Weibull, I believe, so I'd rank that higher than Frechet alone. I've been playing with EV stuff in the last month or so, but haven't gotten to use it in practice yet, so can't really rank GEV on the whole.
For a discussion of the beta-binomial distribution see #144.
The exponentially modified gaussian distribution is now available in the dev version of brms via family exgaussian.
The Wiener diffusion model (family wiener) is now more or less implemented. Only the fitted method is not yet working, as I have to find formulas for the mean reaction time at the upper and lower boundary depending on the model parameters.
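For anyone who wants to try it in the meantime, a minimal usage sketch (rt, decision, condition, and rt_data are made-up names):

# dec() marks which boundary (upper or lower) each response terminated at
fit <- brm(rt | dec(decision) ~ condition, data = rt_data, family = wiener())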
After a discussion with @awellis on codewake (https://www.codewake.com/t/response-time-distributions) about the exgaussian family, I decided that the main model formula should only predict mu (the mean of the gaussian component) to allow for a completely separate prediction of mu and beta (the mean of the exponential component).
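A minimal sketch of what this separation allows, using made-up names (rt, condition, rt_data): each of mu, sigma, and beta gets its own linear predictor.

fit <- brm(bf(rt ~ condition, sigma ~ condition, beta ~ condition),
           data = rt_data, family = exgaussian())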
(Actually, it appears that I could use Frechet currently, so if GEV is considerably more difficult, I'd vote for Frechet first.)
Frechet should be straightforward given that Stan already has a built-in implementation of it. I will try to get this in within the next few days.
@wfolta Actually do you have a paper in mind explaining frechet regression? I currently have no idea of the most reasonable parameterization.
Oooo, I think we just got over my head. I'm guessing that the simple calculation of the Frechet log density, log f(x) = log(shape / scale) - (1 + shape) * log((x - offset) / scale) - ((x - offset) / scale)^(-shape), isn't what you're asking for. Unfortunately, that's about all I know. If I find a paper, I'll let you know.
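In R, that calculation would read roughly as follows (a direct transcription of the formula above, nothing more):

dfrechet_log <- function(x, shape, scale, offset = 0) {
  # Frechet log density as written above
  z <- (x - offset) / scale
  log(shape / scale) - (1 + shape) * log(z) - z^(-shape)
}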
Indeed. For most distributions, I try to predict the mean as the main parameter, but I am not sure whether this is a good (or common) option for the frechet distribution.
I have decided to model the mean of the frechet distribution for now as the main parameter to be predicted. You can try it out with the frechet family in the dev version of brms. I still need to add some tests and docs, but it should already work smoothly ;-)
Note that, in the Stan implementation of the frechet distribution, the offset (i.e. location) parameter is fixed to zero.
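For reference, the mean being modeled is the standard Frechet mean (with the offset at zero), which only exists for shape > 1:

# E(Y) = scale * Gamma(1 - 1/shape), defined for shape > 1
frechet_mean <- function(shape, scale) scale * gamma(1 - 1 / shape)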
The frechet family is now fully implemented.
Generalized extreme value models (family gen_extreme_value) are now implemented. They are not yet fully stable and optimized, but I hope to get this improved within the next few days.
Hi Paul. This is a really interesting new feature for GAMLSS with a GEV likelihood. One issue so far, I think, is the default uniform prior on xi: it seems too vague and makes estimating this parameter very, very difficult. This varying-coefficient GAMLSS-with-GEV model gives similar results to gevlss() in mgcv:
m1 <- brm(bf(COTSln ~ s(Year, by = NCS), sigma ~ s(Year), xi ~ 1), data = ltmp.data, family = gen_extreme_value(), ...)
with links which are, I think, the same as in mgcv: mgcv is parameterised with a log scale parameter, and both brms and mgcv use the modified logit for the shape (restricted to -1 to 0.5). But the shape parameter estimate from brm() is huge even after 10,000s of iterations and hours of runtime (mgcv took a few seconds to run the exact same model), so I can never seem to get an estimate to assess whether the response is Weibull, Gumbel, or Frechet.
So any advice on a more appropriate prior for the shape parameter would be appreciated. Of course I am assuming that I have specified the GEVLSS structure in the brm() call correctly.
cheers
Milani
This is exactly what I meant by not fully stable or optimized :-)
I have yet to figure out a good prior on the shape (ideas welcome) and also vectorize the likelihood in Stan to make it faster.
@MilaniC You may improve sampling speed a bit by removing xi ~ 1 from the bf call. This way, xi will just be estimated as a single auxiliary parameter rather than a transformed intercept, and we save quite a few calculations.
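A sketch of the simplified call, reusing the names from the model above:

# xi is now estimated as a plain family parameter instead of a transformed intercept
m1 <- brm(bf(COTSln ~ s(Year, by = NCS), sigma ~ s(Year)),
          data = ltmp.data, family = gen_extreme_value())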
The main problem I face is that the definition space of the responses depends on the parameters. When we sample parameter values that are invalid for some of the responses, divergent transitions will occur. As soon as I have sorted this out, things should get considerably faster.
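For reference, the constraint behind this is the standard GEV support condition:

# for xi != 0, the GEV density is only defined where 1 + xi * (y - mu) / sigma > 0
gev_in_support <- function(y, mu, sigma, xi) 1 + xi * (y - mu) / sigma > 0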
Ok, here is an idea for priors applicable in a Bayesian setting rather than the MLE-inspired uniform(-1, 0.5):
Try a diffuse proper prior such as a Gaussian with large variance for the GEVLSS shape parameter (e.g., normal(0, 100)): see Coles S (2001). An Introduction to Statistical Modeling of Extreme Values. London: Springer
Very informative overview given in:
Northrop P, Attalides N (2016) Posterior propriety in Bayesian extreme value analyses using reference priors. Statistica Sinica 26 (2): 721-743
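In brms, such a prior could be set along these lines (a sketch, assuming xi is exposed as a family parameter named "xi" when it is not part of the bf() call):

m1 <- brm(bf(COTSln ~ s(Year, by = NCS), sigma ~ s(Year)),
          data = ltmp.data, family = gen_extreme_value(),
          prior = set_prior("normal(0, 100)", class = "xi"))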
This won't help, since a normal(0, 100) prior restricted to [-1, 0.5] is basically uniform.
OK, but why restrict to an upper bound of 0.5? That idea of Smith (1985) was for an MLE setting. GEV regression studies in financial risk assessment regularly use 2 as the upper bound.
For values higher than 0.5, the variance of the distribution doesn't exist. We may try a higher bound as well, but it won't make things easier for the sampler.
Ok, removing xi ~ 1 from the bf call and estimating it as an auxiliary parameter does result in a dramatic speed improvement. But the estimated shape parameter (xi) now has no variance (a consequence of being an auxiliary parameter?):
Family Specific Parameters:
   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
xi      0.5         0      0.5      0.5      13500    1
In fact, it is also the hard upper bound of the prior. And the estimate suggests a response from a Frechet-type distribution. The shape estimate from mgcv::gevlss is not significantly different from zero (suggesting a Gumbel distribution). That has implications for estimating return periods and probabilities. Yet, on the other hand, both brm(bf(…), family = gen_extreme_value()) and mgcv::gevlss result in very similar marginal smooth effects.
Looking forward to the vectorised version.
This is not a consequence of being an auxiliary parameter but rather results from the fact that, apparently, xi wants to be estimated greater than 0.5. I will see what bounds actually make sense in the end. In any case, we might want to expand them.
I am not sure how mgcv estimates GEVs, but I assume MLE does not do better in estimating xi than a fully Bayesian model.
I wouldn't put too much hope in the vectorized version in your case, since, when predicting mu and sigma, there is not much left for vectorization to improve efficiency. The reason for the current inefficiency is the complexity of estimating xi and the problem with the definition space mentioned above.
Ok, thanks. Yes, it seems that estimating the shape parameter is very difficult in either a frequentist MLE or a Bayesian framework, which is why xi ~ 1 was all that was attempted here rather than making it a function of potentially informative predictors.
I have removed the upper bound on xi and added an informative normal(0, 0.3) prior. I don't think it will improve sampling speed, but at least you may be able to get a better estimate for xi. I will continue working on a more stable implementation, but we will likely not get it much faster than it currently is, at least that's what my tests are suggesting.
OK. This works very well for the Fremantle sea-level maxima data from Coles (2001) in the ismev package, and for that data set it runs very quickly (though with a lot of divergent transitions). The GEV shape parameter is well estimated and comparable to estimates from other packages.
And that is very encouraging. But the new shape prior + upper bound formulation now gives uninterpretable results for our much larger data set of coral reef block maxima response variables. The previous dev version with the modified logit gave the same marginal smooths as mgcv but differed in the shape parameter estimate. Now the shape parameter is quite precisely estimated, but the marginal smooths are nonsense.
I thought it might be due to the varying-coefficient form of GAM we fitted, but this is not the case, as far simpler models have the same issue.
So, in summary, I want to thank you for so valiantly trying to address this issue, but sadly I think the previous formulation was more acceptable for the moment, pending other ideas. So maybe it is best to revert to your original formulation with xi ~ uniform(-1, 0.5) rather than normal(0, 0.3).
I don't want to judge this by one or two examples. I think both implementations are flawed, as they naturally cause divergent transitions when xi leaves the region of definition.
I am currently working on an improved version that correctly addresses the boundary issue. It might be slower but much more stable (speed doesn't matter when you cannot trust the results). I hope to have this fully implemented by the end of the weekend.
I think we are getting closer to a stable implementation of the GEV distribution. The shape parameter xi is now adaptively scaled so that the response values always fall within the definition space. This drastically reduces or even eliminates divergent transitions. Sampling speed does not seem to be improved; it may even be decreased, but that is something we have to accept for the time being, I guess.
See #209 for the initial issue about the zero-one-inflated beta distribution.
The skew normal distribution is now available via family skew_normal.
For extreme value analysis, the generalized Pareto distribution seems to be an interesting option. See http://mc-stan.org/users/documentation/case-studies/gpareto_functions.html for a case study by Aki, which already contains all the necessary custom Stan functions.
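For orientation, the core of that case study is the generalized Pareto log density (for shape k != 0), which in R reads roughly:

# location ymin, scale sigma, shape k; matches the parameterization in the case study
dgpareto_log <- function(y, ymin, k, sigma) {
  -log(sigma) - (1 + 1 / k) * log1p(k * (y - ymin) / sigma)
}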
Hello! As said in the brms Google group, I would be glad to be able to use another parametrization of the gamma distribution, with orthogonal mu and phi. I usually formulate it this way:
shape = mu .* mu ./ phi;
rate = mu ./ phi;
y ~ gamma(shape, rate);
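A quick simulation check (in R) that this parameterization really gives E(y) = mu and Var(y) = phi:

mu <- 3; phi <- 0.5
y <- rgamma(1e6, shape = mu^2 / phi, rate = mu / phi)
c(mean(y), var(y))  # approximately 3 and 0.5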
By the way, I'd like to thank you very much for the work done and for the great flexibility and efficiency of brms!
Lucas
This is just a note: I am working with a student right now who is working with wind data (circular, following a von Mises distribution) and was worried about finding a way to implement this distribution in a GLMM framework. Once again, you have made both her day and mine. Thank you!!!!
Due to the custom family feature, some very specific families no longer need a built-in implementation. The other remaining distributions now have their own issues, so I am closing this one now.
With the implementation of distributional regression in the upcoming release of brms 1.0, I think it makes sense to think of additional response distributions (i.e., 'families') that could be worth implementing. Feel free to comment if you would like to see a special response distribution to be added to brms (see also the doc of the VGAM package https://cran.r-project.org/web/packages/VGAM/VGAM.pdf for many possible distributions).