paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.28k stars 184 forks source link

Specify the RE distribution? #231

Closed waynefoltaERI closed 8 months ago

waynefoltaERI commented 7 years ago

I may be mistaken, but I believe that almost all Mixed-/Random-effects packages allow a wide variety of families/links, but all pretty much model random effects as gaussian (0, SD), calculating the SD.

It's early in the morning and I have to admit that I'm suddenly uncertain about the relationship between families/links, prior distributions, and the distribution of random effects. And perhaps this has been asked before, but...

Does it make sense to allow us to specify non-normal distributions for random effects? If so, does it make sense to allow most any distribution, or only symmetric distributions (student comes to mind, including df=0)?

paul-buerkner commented 7 years ago

In principle, yes, we may want to amend the distribution of the random effects. However, we are sort of limited to those distribution supporting something like a correlation matrix or otherwise, we cannot support correlated random effects. The student-t distribution is certainly a candiate.

It has to be noted though, the the normal distribution can be understood just as a prior of the random effects facilitating partial pooling (which is what we want to achieve via multilevel modelling), without necessarily relying on the assumption that the random effects are truely normally distributed.

waynefoltaERI commented 7 years ago

I can accept things either way, as I'm not sure of the potential benefits or when I'd even want to use it. Just wanted to raise the issue.

If there is no benefit, it's not worth adding since we already have families/links and also prior distributions, so adding yet another set of distributions without a clear benefit could be too much. If there is a benefit, perhaps it could be implemented as something like a student distribution with a brm parameter re.df that has a range of 0 to 30, where 30 switches to a normal, and less than 30 is a student with that many degrees of freedom.

Again, there may be no true benefit to this. I just remember reading somewhere that no (or few) packages implement anything different. Even if there is a benefit, we might also need to think about whether a global RE distribution is useful or if it would need to be per RE . (The latter case would, of course, add more complication and might require formula syntax, which probably makes it undesirable.)

So this is mainly a trial balloon to let people comment if there is a potential benefit. In most packages, it wouldn't be possible, even if a benefit is perceived, but since brms is based on Stan it seems like it could be done.

(P.S. I'm using a mixed-effects model to predict the number of medical procedure types performed by various kinds of doctors. I created a nice scatterplot matrix for each kind of doctor and you could see the different slopes/intercepts for different specialties and it was impressive. Except one specialty where we only had three doctors, which pulled the slope much farther from the population than I would've expected considering the population is 30,000 or so. There's still a LOT I don't understand about hierarchical models!)

paul-buerkner commented 7 years ago

From a syntax perspective, this feature is easy to implement. When someone writes, say, (1 + x | person), this is implicitely translated to (1 + x | gr(person)), where the gr function makes sure the grouping variable is correctly specified. It would be easy to add an argument to gr named, say, dist to specifiy the distribution of the random effects: (1 + x | gr(person, dist = "student"))

So the main question is whether this is useful. Any comments are welcome. @jgabry what is your opinion on this?

mvuorre commented 7 years ago

Some applied researchers in Psychology are intimately interested in between-person heterogeneity in effects (see, for example, https://brucedore.github.io/Dore&Bolger_SPPS_accepted.pdf), as quantified by (x | person). I can imagine they would appreciate more choices for these 'heterogeneity' distributions.

paul-buerkner commented 7 years ago

What bothers me is whether other (and which) distributions are reasonable and what recommendations I can give to users in this regard.

jgabry commented 7 years ago

On Thu, Aug 17, 2017 at 7:36 PM Paul-Christian Bürkner < notifications@github.com> wrote:

What bothers me is whether other (and which) distributions are reasonable and what recommendations I can give to users in this regard.

This bothers me too. It gets insanely complicated very quickly. I think there are very few about which we know enough to make generally applicable recommendations.

You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/231#issuecomment-323221977, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q-FPNhGP-nsyy3dDMgtn7cSXrZwyks5sZM52gaJpZM4OH6yX .

paul-buerkner commented 6 years ago

@jgabry Following up on your last comment: Is there currently any distribution other than the normal distribution that the Stan team would recommend using as a hyperprior in multilevel models?

mvuorre commented 6 years ago

For what it is worth, BDA section 17.4 discusses fitting the eight schools model with a t-distribution on the schools' effects instead of a normal, in the context of a sensitivity analysis. They don't find that inference for that example greatly changes between normal and t--but perhaps it would be a good idea to let users (easily) examine this for their models?

paul-buerkner commented 6 years ago

Yes, this would indeed be an option. My concern is that with a student-t distribution we effectively remove most of the shrinkage from the estimates (because the tails of the t-distribution can be large for small degrees of freedom). And, since shrinkage is a good thing, we would end up reducing the benefit of multilevel model in such cases.

jgabry commented 6 years ago

I’ve used a t distribution as a prior on varying intercepts before (not often because of what you mentioned but can be useful) but if you have varying intercepts and varying slopes and want to model the correlations then you’d need a multivariate distribution and even moving from multivariate normal to multivariate t can get pretty complicated and beyond the t it’s even harder. I’m not sure it’s worth going down that road outside of rare circumstances. I think you’d also run into more sampling problems than you get with the normal.

So I don’t have any great suggestions on this topic unfortunately.

On Thu, Nov 30, 2017 at 10:42 AM Paul-Christian Bürkner < notifications@github.com> wrote:

Yes, this would indeed be an option. My concern is that with a student-t distribution we effectively remove most of the shrinkage from the estimates (because the tails of the t-distribution can be large for small degrees of freedom). And, since shrinkage is a good thing, we would end up reducing the benefit of multilevel model in such cases.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/231#issuecomment-348226746, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q4pKEcNLHqmptt1Tz-G1sA6kFFbGks5s7szxgaJpZM4OH6yX .

paul-buerkner commented 6 years ago

Thanks that's very helpful, Jonah. I will play around a bit with the t-distribution and see how well this works out.

hansvancalster commented 6 years ago

Hi Paul,

Molenberghs, Geert; Verbeke, Geert; Demétrio, Clarice G. B.; Vieira, Afrânio M. C. A Family of Generalized Linear Models for Repeated Measures with Normal and Conjugate Random Effects. Statist. Sci. 25 (2010), no. 3, 325--347. doi:10.1214/10-STS328. https://projecteuclid.org/euclid.ss/1294167963

In this article the authors make a case to use conjugate distributions to model random effects that account for overdispersion in combination with normal random effects to account for correlation. Maybe this is a case where specifying the random effects distribution is useful? However, I am not sure if it should be implemented because I think it can be handled more directly by implementing the distributions that result from integrating out the conjugate random effects (i.e. binomial + beta distributed individual level random effect = beta-binomial distribution; poisson + gamma-distributed individual level random effect = negative-binomial distribution).

Thank you for all your efforts! brms is a great package!

Hans

2017-11-30 22:55 GMT+01:00 Paul-Christian Bürkner notifications@github.com :

Thanks that's very helpful, Jonah. I will play around a bit with the t-distribution and see how well this works out.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/231#issuecomment-348333564, or mute the thread https://github.com/notifications/unsubscribe-auth/AR2Mdt_VWo862-RrqmYz92BTv2ZPb-_Jks5s7yRigaJpZM4OH6yX .

--

Hans Van Calster Statistisch expert

Vlaamse Overheid INSTITUUT VOOR NATUUR- EN BOSONDERZOEK Team Biometrie, Methodologie en Kwaliteitszorg M +32 499 135 318 hans.vancalster@inbo.be Kliniekstraat 25, B-1070 Brussel www.inbo.be

https://overheid.vlaanderen.be/mobiliteitsplan-herman-teirlinckgebouw

Van 14 tot en met 19 december 2017 verhuizen we uit onze vestiging in Brussel naar het Herman Teirlinckgebouw op de site Thurn & Taxis. Vanaf dan ben je welkom op het nieuwe adres: Havenlaan 88 bus 73, 1000 Brussel. ///////////////////////////////////////////////////////////////////////////////////////////

paul-buerkner commented 6 years ago

Thanks! From what I have seen (and as you say) these conjugate random effects models are usually chosen, because they allow to marginalize over the random effects and provide a model, which is mathematically easier to handle. I will look at the paper though and see if they provide other arguments for these distributions.

From a practical perspective, they don't really fit into the brms framework, since these distributions are not defined on the link scale (log or logit, say) but on the original one. That is, such random effects would have to be specified outside of the inverse-link function (in case of non-identity links) and this will require quite a bit of special case coding.

bachlaw commented 6 years ago

McCulloch & Neuhaus wrote a very good article a few years ago claiming that "mis-specifying" random effects models with a normal distribution was essentially a non-issue, and contending that arguments to the contrary were mostly relying upon contrived or unrealistic situations. https://arxiv.org/pdf/1201.1980.pdf.

My takeaway is that entropy ends up being more important and so, as you suspect, this is fortunately a non-issue the vast majority of the time.

paul-buerkner commented 6 years ago

Thanks! That's very helpful!

paul-buerkner commented 6 years ago

Student-t distributed varying effects are now supported by brms for testing purposes. They are exposed to users but not documented and thus not (yet) officially supported. They can be specified by setting argument dist of the gr() function. For instance

y ~ x + (x | gr(g, dist = "student"))
bachlaw commented 5 years ago

I appreciate having the option to specify a different distribution for the random effects. In some of my datasets, however, it would also be useful to accommodate meaningful outliers with a lognormal or a gamma distribution to the random effects. Since the addition of the student_t option appears to be a success, perhaps we could add one or two more options to accommodate more challenging data sets? Many thanks again for this wonderful effort.

paul-buerkner commented 5 years ago

Lognormal is already possible by modeling a random effect on the log-scale. This is automatically the case if the family uses the log-link. If not, you may use brms' non-linear framework. As it currently stands, I won't implement any random effects distributions that don't support correlated random effects, such as Gamma.

swarupswaminathan commented 4 years ago

Hi - I had two questions for the group:

1) Regarding the gr function - is there any way to allow for nesting of variables for the REs? The typical lme4 syntax does not appear to work (i.e., x | g/r) when used with the gr function. I am trying to use student t distributed varying effects.

2) Is there any chance you might consider adding a log-gamma distribution as a potential distribution to use with the gr function? Would greatly appreciate its addition.

Thank you!

paul-buerkner commented 4 years ago
  1. (x | gr(g)) + (x | gr(g:r)) should work.
  2. Thanks for the suggestion. I will see if it makes sense to implement the log-gamma distribution for this purpose.
swarupswaminathan commented 4 years ago

I have used the make_stancode call to generate two Stan files (attached in zip file). One is with a Gaussian distribution for the random effects, and the second is using the gr() function to obtain Student-t distributed varying effects (as noted above by Paul). I have added the model specification at the top of the files for reference.

Could someone highlight where exactly in the Stan code the t-distribution is being specified on the random effects? I see a definition of a df variable and the priors for this variable, but it is not clear where exactly the change is specified.

Thank you.

Stan_NDvsT.zip

paul-buerkner commented 4 years ago

Please ask brms related questions on https://discourse.mc-stan.org/

swarupswaminathan commented 4 years ago

https://discourse.mc-stan.org/t/brms-multivariate-response-different-distributions/4939/9

wds15 commented 2 years ago

Not sure if the Student-t is still being considered to be included as a fully supported random effects distribution, but what may help is the recent availability of a multi_student_t _cholesky in Stan 2.30:

https://mc-stan.org/docs/functions-reference/multi-student-t-cholesky-fun.html

paul-buerkner commented 2 years ago

It is already implemented just not document and officially supported. Since I am using a non-centered parameterization for the random effects also in the student-t random effects case, I don't really care for this feature.

However, I care for the multivariate models with correlated student-t likelihoods. There the cholesky version will be really useful.

samuelfleischer commented 1 year ago

Hi @paul-buerkner, are there plans to implement skew normal and multivariate skew normal random effects for brms? Thank you!

paul-buerkner commented 1 year ago

Not in the new future but perhaps at a later point.

venpopov commented 1 year ago

Found this thread while looking if I can specify a gamma distribution for the random effects. I know above you said you have no plans to do that, but it would be great - we have figured out how to estimate several mixture models of visual working memory using brms. Just realized that one of the trickiest, the variable precision model can be estimated just by specifying a random effect over trials. Unfortunately, it assumes a gamma distribution over trials for the parameter of the von mises distribution. Would be great if that was an option.

StuDonovan commented 8 months ago

Hi @paul-buerkner -- thanks for adding this gr() functionality. Fwiw I'm currently using it to estimate two variants of a simple linear (meta-analysis) model (the only difference shown in bold):

Model variant 1: bf(effect | se(se, sigma = TRUE) ~ 0 + Intercept + (1 | gr(study, dist = "student")), sigma ~ 1 + (1 | study)) + student()

Model variant 2: bf(effect | se(se, sigma = TRUE) ~ 0 + Intercept + (1 | gr(study, dist = "student")), sigma ~ 1 + (1 | gr(study, dist = "student"))) + student()

When I estimate this model, I find: 1) _getprior() / _validateprior() produces the same output for both variants (below) 2) summary(model) doesn't report a df parameter for the varying effects in the sigma level of the second model variant 3) the results indicate that the model variants do actually differ from each other

Could you help me understand what might be going on here?

prior class coef group resp dpar nlpar lb ub source normal(0, 1) b user normal(0, 1) b Intercept (vectorized) gamma(2, 0.1) df study 1 default student_t(3, 0, 2.5) Intercept sigma default gamma(2, 0.1) nu 1 default student_t(3, 0, 2.5) sd 0 default student_t(3, 0, 2.5) sd sigma 0 default student_t(3, 0, 2.5) sd study 0 (vectorized) student_t(3, 0, 2.5) sd Intercept study 0 (vectorized) student_t(3, 0, 2.5) sd study sigma 0 (vectorized) student_t(3, 0, 2.5) sd Intercept study sigma 0 (vectorized)

paul-buerkner commented 8 months ago

Please ask brms related questions on https://discourse.mc-stan.org/