paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

Implement additional response distributions #110

Closed paul-buerkner closed 6 years ago

paul-buerkner commented 8 years ago

With the implementation of distributional regression in the upcoming release of brms 1.0, I think it makes sense to think about additional response distributions (i.e., 'families') that could be worth implementing. Feel free to comment if you would like to see a particular response distribution added to brms (see also the documentation of the VGAM package https://cran.r-project.org/web/packages/VGAM/VGAM.pdf for many possible distributions).

aforren1 commented 8 years ago

What do you think about the von Mises distribution (for directional data)? A description starts on pg. 440 of the Stan 2.11.0 manual. The only catch (I think) is the recommendation to use

if (kappa < 100)
    y ~ von_mises(mu, kappa);
else
    y ~ normal(mu, sqrt(1 / kappa));

when kappa grows large.
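The rationale for the switchover can be checked numerically. The following sketch (my own illustration, not from the Stan manual) compares the von Mises density for a large kappa with the suggested normal approximation:

```python
# Compare the von Mises density with its large-kappa normal approximation.
# kappa = 150 is above the cutoff of 100 suggested in the Stan manual.
import numpy as np
from scipy import stats

mu, kappa = 0.0, 150.0
x = np.linspace(-0.3, 0.3, 201)

vm = stats.vonmises(kappa, loc=mu).pdf(x)
nrm = stats.norm(loc=mu, scale=np.sqrt(1 / kappa)).pdf(x)

# Near the mode the two densities are nearly indistinguishable
print(np.max(np.abs(vm - nrm)))
```

For kappa this large, the absolute difference between the two densities is tiny relative to their peak, which is why switching to the normal branch avoids the numerical problems of von_mises without changing the model in practice.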

paul-buerkner commented 8 years ago

That sounds like a good idea. Do you have some sample data where a von Mises distribution makes sense, so that I can play around with it a bit? Also, since we need mu to be within a 2*pi interval, what kind of link function is typically applied here?

aforren1 commented 8 years ago

Thanks for the quick response! It looks like you already found a proper link function. I ought to be able to cook up some real reach direction data on Monday, but in the meantime, here's simulated data from INLA. I fit an INLA model and brm (as of eba536363ba72b583877db0cc2f37189f670932b), and the results are very similar.

# All credit to INLA, copy-pasted
ilink = function(x) 2*atan(x)
link = function(x) tan(x/2)

n = 300
z = rnorm(n, sd=0.3)
eta = 1 + z
y.pred = ilink(eta)

kappa = 5
x = seq(-pi, pi, len = 10000)
d = exp(kappa * cos(x))  # unnormalized von Mises density
dd = cumsum(d)
dd = dd / max(dd)  # numerical CDF
cn.icdf.func = splinefun(dd, x, method = "monoH.FC")  # spline inverse CDF
rcn = function(n) cn.icdf.func(runif(n))  # von Mises sampler via inverse CDF
y = y.pred + rcn(n)
dat <- data.frame(y, z)

m_inla <- inla(formula(y ~ 1 + z), data = dat,
               family = "circularnormal",
               control.inla = list(cmin = -Inf))

Summary from INLA:


Call:
c("inla(formula = formula(y ~ 1 + z), family = \"circularnormal\", ",  "    data = dat, control.inla = list(cmin = -Inf))")

Time used:
 Pre-processing    Running inla Post-processing           Total 
         0.0750          0.1238          0.0140          0.2128 

Fixed effects:
              mean     sd 0.025quant 0.5quant 0.975quant   mode kld
(Intercept) 0.9953 0.0294     0.9392   0.9947      1.055 0.9936   0
z           1.0793 0.0880     0.9086   1.0785      1.254 1.0770   0

The model has no random effects

Model hyperparameters:
                                                          mean     sd 0.025quant 0.5quant 0.975quant  mode
Precision parameter for the Circular Normal observations 5.023 0.3517      4.372    5.012      5.734 4.988

Expected number of effective parameters(std dev): 2.027(0.002)
Number of equivalent replicates : 147.99 

Marginal log-Likelihood:  -215.43 

And from brms (call m_brms <- brm(y ~ 1 + z, data = dat, family = 'von_mises', chains = 2)):

 Family: von_mises (tan_half) 
Formula: y ~ 1 + z 
   Data: dat (Number of observations: 300) 
Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 2000
   WAIC: Not computed

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept     1.00      0.03     0.94     1.05       1366    1
z             1.08      0.08     0.92     1.25       1290    1

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
kappa     5.04      0.37     4.37     5.79       1719    1
paul-buerkner commented 8 years ago

The von_mises family is now fully implemented and documented. @aforren1 I very much appreciate your help with testing it!

aforren1 commented 8 years ago

No problem, thank you for the implementation!

schmettow commented 8 years ago

Having additional distributions is an exciting prospect. What would be really useful for modelling response times is the exgaussian. I know it exists in Stan. Unfortunately, I have no clue about link functions and the like for this distribution.

paul-buerkner commented 8 years ago

Thanks for the suggestion! This sounds like a good idea.

paul-buerkner commented 8 years ago

@schmettow I have one question regarding the exgaussian distribution that you may have an opinion on. Currently, brms models the mean of the response distribution for most families (i.e., link(E(Y)) = eta) via the formula argument, while additionally being able to predict all auxiliary parameters with the upcoming release of brms 1.0. Predicting the mean of the exgaussian distribution would imply link(mu + 1/lambda) = eta. However, since the gaussian and exponential parts of the distribution are thought to represent different cognitive processes, it may make more sense to define link(mu) = eta. Of course, in both possible implementations, users could additionally predict the lambda and sigma parameters. Do you have any preferences regarding the parameterization?
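For concreteness, here is a quick simulation (my own sketch, with made-up parameter values) of why the two parameterizations differ: the ex-Gaussian mean is mu + 1/lambda, so linking mu alone targets only the gaussian component:

```python
# Ex-Gaussian = normal + exponential; its mean is mu + 1/lambda.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, lam = 300.0, 30.0, 0.01  # hypothetical reaction-time-like values

y = rng.normal(mu, sigma, 200_000) + rng.exponential(1 / lam, 200_000)

print(y.mean())  # close to mu + 1/lambda = 400, not mu = 300
```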

schmettow commented 8 years ago

To my knowledge, the exgaussian is more a pragmatic choice, as it seems to fit reaction time data well. There doesn't seem to be a clear match between the two components and cognitive processes (unlike, for example, the shifted Wald, which is tied to the diffusion model in simple choice tasks). In my own research, I am perfectly fine with mu + lambda.

--Martin

waynefoltaERI commented 8 years ago

Generalized extreme value distribution (GEV) would include Frechet, Gumbel, and Weibull, I believe, so I'd rank that higher than Frechet alone. I've been playing with EV stuff in the last month or so, but haven't gotten to use it in practice yet, so can't really rank GEV on the whole.

paul-buerkner commented 7 years ago

For a discussion of the beta-binomial distribution see #144.

paul-buerkner commented 7 years ago

The exponentially modified gaussian distribution is now available in the dev version of brms via family exgaussian.

paul-buerkner commented 7 years ago

The Wiener diffusion model (family wiener) is now more or less implemented. Only the fitted method is not yet working, as I still have to find formulas for the mean reaction time at the upper and lower boundaries depending on the model parameters.

paul-buerkner commented 7 years ago

After a discussion with @awellis on codewake (https://www.codewake.com/t/response-time-distributions) about the exgaussian family, I decided that the main model formula should only predict mu (the mean of the gaussian component) to allow for a completely separate prediction of mu and beta (the mean of the exponential component).

wfolta commented 7 years ago

(Actually, it appears that I could use Frechet currently, so if GEV is considerably more difficult, I'd vote for Frechet first.)

paul-buerkner commented 7 years ago

Frechet should be straightforward given that Stan already has a built-in implementation of it. I will try to get this in within the next few days.

paul-buerkner commented 7 years ago

@wfolta Actually, do you have a paper in mind explaining Frechet regression? I currently have no idea of the most reasonable parameterization.

wfolta commented 7 years ago

Oooo, I think we just got in over my head. I'm guessing that the simple calculation of the Frechet log density, log(shape / scale) - (1 + shape) * log((x - offset) / scale) - ((x - offset) / scale)^(-shape), isn't what you're asking for. Unfortunately, that's about all I know. If I find a paper I'll let you know.

paul-buerkner commented 7 years ago

Indeed. For most distributions, I try to predict the mean as the main parameter, but I am not sure whether this is a good (or common) option for the frechet distribution.

paul-buerkner commented 7 years ago

I have decided to model the mean of the frechet distribution as the main parameter to be predicted for now. You can try it out with the frechet family in the dev version of brms. I still need to add some tests and docs, but it should already work smoothly ;-)

Note that, in the Stan implementation of the frechet distribution, the offset (i.e. location) parameter is fixed to zero.
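As a sketch of what a mean parameterization involves (my own illustration, not the brms code): a Frechet(shape = nu, scale = s) variable has mean s * gamma(1 - 1/nu) for nu > 1, so predicting a mean mu amounts to solving for the scale:

```python
# Mean-parameterized Frechet: choose the scale so the mean equals mu.
import numpy as np
from scipy import stats
from scipy.special import gamma

mu, nu = 2.5, 4.0               # desired mean and shape (hypothetical values)
scale = mu / gamma(1 - 1 / nu)  # implied Frechet scale, valid for nu > 1

# scipy's invweibull is the Frechet distribution with location fixed at zero
frechet = stats.invweibull(nu, scale=scale)
print(frechet.mean())  # recovers mu (approximately 2.5)
```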

paul-buerkner commented 7 years ago

The frechet family is now fully implemented.

paul-buerkner commented 7 years ago

Generalized extreme value models (family gen_extreme_value) are now implemented. They are not yet fully stable and optimized but I hope to get this improved within the next few days.

MilaniC commented 7 years ago

Hi Paul. This is a really interesting new feature for GAMLSS with a GEV likelihood. One issue so far, I think, is the default uniform prior on xi - it seems too vague and makes estimating this parameter very difficult. This varying coefficient GAMLSS with GEV model gives similar results to gevlss() in mgcv:

m1<-brm(bf(COTSln ~ s(Year,by=NCS), sigma ~ s(Year), xi ~ 1), data=ltmp.data, family=gen_extreme_value(), ...)

with links:

mgcv: ("identity", "identity", "logit")

brms: ("identity", "log", "logit")

which are the same, I think, as mgcv is parameterised with a log scale parameter, and both brms and mgcv use the modified logit for the shape (restricted to -1 to 0.5),

but the shape parameter estimate from brm() is huge even after 10,000s of iterations and hours of runtime (mgcv took a few seconds to run the exact same model), so I can never seem to get an estimate to assess whether the response is Weibull, Gumbel, or Frechet.

So any advice on a more appropriate prior for the shape parameter would be appreciated. Of course I am assuming that I have specified the GEVLSS structure in the brm() call correctly.

cheers

Milani
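A modified (scaled) logit link of the kind described above can be sketched as follows; this is my reconstruction of such a link, not the exact mgcv or brms code:

```python
# Scaled logit: map an unbounded linear predictor eta to xi in (-1, 0.5).
import numpy as np

def xi_link_inv(eta, lower=-1.0, upper=0.5):
    """Inverse link: squash the real line onto (lower, upper)."""
    return lower + (upper - lower) / (1.0 + np.exp(-eta))

print(xi_link_inv(0.0))    # midpoint of the interval: -0.25
print(xi_link_inv(-50.0))  # approaches the lower bound -1
print(xi_link_inv(50.0))   # approaches the upper bound 0.5
```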

paul-buerkner commented 7 years ago

This is exactly what I meant by not fully stable or optimized :-)

I have yet to figure out a good prior on the shape (ideas welcome) and also vectorize the likelihood in Stan to make it faster.

paul-buerkner commented 7 years ago

@MilaniC You may improve sampling speed a bit by removing xi ~ 1 from the bf call. This way, xi will just be estimated as a single auxiliary parameter rather than a transformed intercept, and we save quite a few calculations.

The main problem I face is that the definition space of the responses depends on the parameters. When we sample parameter values that are invalid for some of the responses, divergent transitions will occur. As soon as I have sorted this out, things should get considerably faster.

MilaniC commented 7 years ago

Ok, here is an idea for priors applicable in a Bayesian setting rather than the MLE-inspired uniform(-1, 0.5):

Try a diffuse proper prior, such as a Gaussian prior with large variance for the GEVLSS shape parameter (e.g., normal(0, 100)); see Coles S (2001). An Introduction to Statistical Modelling of Extreme Values. London: Springer.

Very informative overview given in:

Northrop P, Attalides N (2016) Posterior propriety in Bayesian extreme value analyses using reference priors. Statistica Sinica 26 (2): 721-743

paul-buerkner commented 7 years ago

This won't help, since a normal(0, 100) prior restricted to [-1, 0.5] is basically uniform.

MilaniC commented 7 years ago

OK, but why restrict to an upper bound of 0.5? That idea of Smith (1985) was for an MLE setting. GEV regression model studies in financial risk assessment regularly use 2 as the upper bound.

paul-buerkner commented 7 years ago

For values higher than 0.5, the variance of the distribution doesn't exist. We may try a higher bound as well, but it won't make things easier for the sampler.
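The restriction comes from the moment formula: the GEV variance involves gamma(1 - 2*xi), and the second moment diverges for xi >= 1/2. A quick numerical sketch (my own illustration):

```python
# GEV variance: sigma^2 * (g2 - g1^2) / xi^2 with g_k = gamma(1 - k*xi),
# finite only for xi < 0.5 (the xi = 0 Gumbel limit needs a separate formula).
import numpy as np
from scipy.special import gamma

def gev_variance(sigma, xi):
    if xi >= 0.5:
        return np.inf  # second moment diverges
    g1, g2 = gamma(1 - xi), gamma(1 - 2 * xi)
    return sigma**2 * (g2 - g1**2) / xi**2

print(gev_variance(1.0, 0.2))  # finite (about 3.34)
print(gev_variance(1.0, 0.6))  # inf
```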

MilaniC commented 7 years ago

Ok, removing xi ~ 1 from the bf call and estimating it as an auxiliary parameter does result in a dramatic speed improvement. But the estimated shape parameter (xi) now has no variance (a consequence of being an auxiliary parameter?):

Family Specific Parameters: 
   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
xi      0.5         0      0.5      0.5      13500    1

In fact, it is also at the hard upper bound of the prior. And the estimate suggests a response from a Frechet-type distribution. The shape estimate from mgcv::gevlss is not significantly different from zero (suggesting a Gumbel distribution). That has implications for estimating return period and probability. Yet, on the other hand, both brm(bf(…), family = gen_extreme_value()) and mgcv::gevlss result in very similar marginal smooth effects.

Looking forward to the vectorised version.

paul-buerkner commented 7 years ago

This is not a consequence of being an auxiliary parameter, but rather results from the fact that, apparently, xi wants to be estimated greater than 0.5. I will see what bounds actually make sense in the end. In any case, we might want to expand them.

I am not sure how mgcv estimates GEVs, but I assume the MLE does not do better in estimating xi than a fully bayesian model.

I wouldn't put too much hope in the vectorized version in your case, since, when predicting mu and sigma, there is not much left for vectorization to improve efficiency. The reason for the current inefficiency is the complexity of estimating xi and the problem with the definition space mentioned above.

MilaniC commented 7 years ago

Ok, thanks. Yes, it seems that estimating the shape parameter is very difficult in either a frequentist MLE or a Bayesian framework, which is why xi ~ 1 was all that was attempted here, rather than making it a function of potentially informative predictors.

paul-buerkner commented 7 years ago

I have removed the upper bound on xi and added an informative normal(0, 0.3) prior. I don't think it will improve sampling speed, but at least you may be able to get a better estimate for xi. I will continue working on a more stable implementation, but we will likely not get it much faster than it currently is; at least that's what my tests are suggesting.

MilaniC commented 7 years ago

OK. This works very well for the Fremantle sea-level maxima data from Coles (2001) in the ismev package. And for this data set it runs very quickly (but with a lot of divergent transitions). The GEV shape parameter is well estimated and comparable to estimates from other packages:

from mgcv: xi_Intercept mean=-0.25 se=0.09 95% CI=(-0.43 to -0.08)

from VGAM: xi_Intercept mean=-0.27 se=0.06 95% CI=(-0.39 to -0.15)

from brms: xi_Intercept mean=-0.28 sd=0.13 95% uncertainty interval (-0.56 to -0.05)

And that is very encouraging. But the new shape prior + upper bound formulation results in uninterpretable results now for our much larger data set for coral reef block maxima response vars. Previous dev version with the modified logit gave same marginal smooths as mgcv but differed in the shape parameter estimate. Now the shape parameter is quite precisely estimated but the marginal smooths are nonsense.

I thought it might be due to the varying coefficient form of GAM we fitted but this is not the case as far simpler models have the same issue.

So, in summary, I want to thank you for so valiantly trying to address this issue, but sadly I think the previous formulation was more acceptable for the moment, pending other ideas. So maybe it is best to revert to your original formulation with xi ~ uniform(-1, 0.5) rather than normal(0, 0.3).

paul-buerkner commented 7 years ago

I don't want to judge this by one or two examples. I think both implementations are flawed, as they naturally cause divergent transitions when xi leaves the region of definition.

I am currently working on an improved version that correctly addresses the boundary issue. It might be slower but much more stable (speed doesn't matter when you cannot trust the results). I hope to have this fully implemented by the end of the weekend.

paul-buerkner commented 7 years ago

I think we are getting closer to a stable implementation of the GEV distribution. The shape parameter xi is now adaptively scaled so that the response values always fall within the definition space. This drastically reduces or even eliminates divergent transitions. Sampling speed does not seem to be improved. It may even be decreased, but that is something we have to accept for the time being, I guess.
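The boundary issue can be made concrete: the GEV support requires 1 + xi * (y - mu) / sigma > 0 for every observation, so the data constrain xi to an interval that depends on mu and sigma. A sketch of that constraint (my reconstruction, not the brms implementation):

```python
# Interval of xi values that keeps all observations inside the GEV support.
import numpy as np

def admissible_xi(y, mu, sigma):
    z = (y - mu) / sigma
    # z > 0 forces xi > -1/z; z < 0 forces xi < -1/z
    lower = -1.0 / z[z > 0].max() if (z > 0).any() else -np.inf
    upper = -1.0 / z[z < 0].min() if (z < 0).any() else np.inf
    return lower, upper

y = np.array([-1.5, 0.2, 2.0, 4.0])  # toy data
print(admissible_xi(y, mu=0.0, sigma=1.0))  # (-0.25, 0.666...)
```

Whenever the sampler proposes (mu, sigma, xi) with xi outside this interval, the log density is undefined for at least one observation, which is what triggers the divergent transitions discussed above.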

paul-buerkner commented 7 years ago

See #209 for the initial issue about the zero-one-inflated beta distribution.

paul-buerkner commented 7 years ago

The skew normal distribution is now available via family skew_normal.

paul-buerkner commented 6 years ago

For extreme value analysis, the generalized Pareto distribution seems to be an interesting option. See http://mc-stan.org/users/documentation/case-studies/gpareto_functions.html for a case study by Aki, which already contains all the necessary custom Stan functions.

laboleaf commented 6 years ago

Hello! As mentioned in the brms Google group, I would be glad to be able to use another parametrization of the gamma distribution, with orthogonal mu and phi. I'm used to formulating it this way:

  shape = mu .* mu ./ phi;
  rate = mu ./ phi;
  y ~ gamma(shape, rate); 

By the way, I'd like to thank you very much for the work done and for the great flexibility and efficiency of brms!

Lucas
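The reparameterization in the snippet can be checked by simulation (my own sketch): with shape = mu^2/phi and rate = mu/phi, the gamma distribution has mean shape/rate = mu and variance shape/rate^2 = phi, so mu and phi are indeed orthogonal in interpretation:

```python
# Mean/variance parameterization of the gamma distribution.
import numpy as np

rng = np.random.default_rng(0)
mu, phi = 4.0, 2.0                     # hypothetical mean and variance
shape, rate = mu**2 / phi, mu / phi    # the reparameterization from the snippet

y = rng.gamma(shape, 1.0 / rate, 500_000)  # numpy's gamma takes a scale

print(y.mean(), y.var())  # close to mu = 4 and phi = 2
```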

jebyrnes commented 6 years ago

This is just a note - I am working with a student right now who is working with wind data (circular, following a von Mises distribution) and was worried about finding a way to implement this distribution in a GLMM framework - and again, you have made both her day and mine. Thank you!!!!

paul-buerkner commented 6 years ago

Due to the custom family feature, some very specific families no longer need a built-in implementation. The other remaining distributions now have their own issues, so I am closing this one now.