paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

Hierarchical prior for the regression coefficients #459

Closed jpiironen closed 6 years ago

jpiironen commented 6 years ago

This is a feature suggestion based on my question on discourse (http://discourse.mc-stan.org/t/hierarchical-prior-for-the-regression-coefficients/4582).

I would like to be able to place a prior on the regression coefficients such that the prior has one or more hyperparameters that are themselves given priors and inferred. I'm thinking of something like allowing the user to specify new parameters via the stanvar function, instead of only declaring variables that will be passed in as data (you will of course know better what the best way to implement this is).

If it is difficult to implement this in as generic a way as I described above (meaning that virtually any parameters and priors could be formulated), I think it would already be useful to have the normal prior N(0, tau) for which the user could specify a hyperprior on tau.

paul-buerkner commented 6 years ago

Thanks for opening this issue! I think allowing users to define parameters via stanvar is a very sensible idea. Let me see what needs to be done to get this implemented.

paul-buerkner commented 6 years ago

Upon thinking about this a little more, I think we should add a new function similar to stanvar() that outputs the same class (stanvars) but has a different interface. Right now, the first argument of stanvar requires the R object to be passed as data, which we don't need for definitions in any other model block. Thus, using stanvar() for other model blocks could cause a lot of confusion among users about what should be specified in which argument.

We could name it, say, stanpar() (subject to discussion) and let it have the arguments name, scode and block. I would say that scode could internally default to "real <name>;" and block could default to "parameters" with further options "tparameters", "tdata", "model", and "genquant".
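
A minimal sketch of how a call to such a function might look (stanpar() is only a proposal at this point and does not exist in brms):

# hypothetical interface: declare a new scale parameter tau in the parameters block
stanvars <- stanpar(name = "tau", scode = "real<lower=0> tau;", block = "parameters")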

What do you think of this?

jpiironen commented 6 years ago

I don't have a strong opinion about this; I believe you have a better view of it. I would be fine with a new function, say stanpar, but here's another idea.

Define stanvar so that if the first argument x is NULL, this indicates that the new variable is unknown (i.e., a parameter), in which case the block argument (which would need to be added to stanvar) would default to 'parameters'. Alternatively, the function could throw an informative error message if x = NULL but block = 'data'. I think the same logic could be used for all the other quantities as well (so that x = NULL whenever block is something other than 'data').

I haven't thought this through but perhaps you can comment whether this makes sense to you.
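
Under that proposal, declaring a parameter could look something like the following sketch (the block argument did not yet exist in stanvar at this point in the discussion):

# hypothetical: x = NULL (or omitted) marks tau as a parameter rather than data
stanvars <- stanvar(x = NULL, scode = "real<lower=0> tau;", block = "parameters")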

paul-buerkner commented 6 years ago

I agree that defaulting x = NULL and using stanvar throughout is perhaps the better idea. I will play around with it and come up with an implementation to be tested soonish.

jpiironen commented 6 years ago

Great. I'm happy to try it out if you have it in a feature branch at some point.

paul-buerkner commented 6 years ago

In dev version of brms on github it should now be working. For example:

# define a hierarchical prior on the regression coefficients
bprior <- set_prior("normal(0, tau)", class = "b") +
  set_prior("target += normal_lpdf(tau | 0, 10)", check = FALSE)
stanvars <- stanvar(scode = "  real<lower=0> tau;", block = "parameters")
make_stancode(count ~ Trt + log_Base4_c, epilepsy,
              prior = bprior, stanvars = stanvars)
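
The same bprior and stanvars can then be passed to brm() to actually fit the model; a minimal sketch:

# fit the model with the hierarchical prior (same arguments as make_stancode() above)
fit <- brm(count ~ Trt + log_Base4_c, data = epilepsy,
           prior = bprior, stanvars = stanvars)
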
ASKurz commented 6 years ago

I love this idea. Would it also extend to things like residual correlations?

paul-buerkner commented 6 years ago

Can you explain that a little bit more?

ASKurz commented 6 years ago

To my knowledge, the only way to get Pearson-like correlations in brms is by fitting an intercepts-only multivariate model. The residual correlations come out like simple base R cor() bivariate correlations, but with an entire posterior distribution. I'm wondering if it'd be possible to use a hierarchical prior like this to pool them.

And of course, if there’s a better way to get correlations in brms than the way I described, I’d love to hear what that is, too.
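
For reference, a minimal sketch of the intercepts-only multivariate model described above, assuming a data frame d with numeric outcome columns y1, y2, and y3 (the names are made up):

# residual correlations among the outcomes are estimated as rescor parameters
fit <- brm(bf(mvbind(y1, y2, y3) ~ 1) + set_rescor(TRUE), data = d)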

paul-buerkner commented 6 years ago

You could also standardize both the response and a single predictor in a linear model to get the correlation. What kind of correlations do you want to pool this way? What is the use case?
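
A minimal sketch of that approach, assuming a data frame d with numeric columns x and y (names are made up):

# with both variables standardized, the slope of z_x equals the Pearson correlation
d$z_x <- as.numeric(scale(d$x))
d$z_y <- as.numeric(scale(d$y))
fit <- brm(z_y ~ z_x, data = d)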

ASKurz commented 6 years ago

Ah, of course. That’s another good way to do it.

I’m just thinking of how a lot of cross-sectional studies in psychology (and perhaps other fields) make a big deal over the correlation matrix. I know we can use the LKJ prior to pull the correlations toward zero. But I’m also wondering whether it makes sense to partially pool them toward a grand mean instead.

E.g., consider a psychometric study on a new questionnaire. After someone does some kind of factor analysis to show the fit isn’t too embarrassing, there typically follows a correlation matrix of the sum score of the focal questionnaire with a handful of sum scores from other questionnaires of theoretical interest. Folks occasionally make a big deal about how one correlation is .3 while another is .42, etc. Perhaps it’d be better to partially pool them toward a grand mean.

paul-buerkner commented 6 years ago

That would perhaps come down to something like a "meta-analysis" of correlations. See Matti Vuorre's blog post on how to achieve that in brms: https://vuorre.netlify.com/post/2016/2016-09-29-bayesian-meta-analysis/
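
A minimal sketch of such a meta-analytic model, assuming a data frame d with Fisher z-transformed correlations z, their standard errors se_z (roughly 1 / sqrt(n - 3)), and a study identifier (all names are made up):

# partial pooling of correlations toward a grand mean via a random-effects meta-analysis
fit <- brm(z | se(se_z) ~ 1 + (1 | study), data = d)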

jpiironen commented 6 years ago

Tried this now, and it seems to work like a charm. Thanks a lot!

paul-buerkner commented 6 years ago

Glad to hear that! Closing this now :-)

jpiironen commented 6 years ago

One more thing. In principle I can now do what I wanted, but I realized that this is not necessarily the best way to code this prior in Stan. Indeed, I get quite a few divergent transitions during sampling, and tuning adapt_delta does not always help. I would like to try this alternative parameterization, which I believe would be better:

parameters {
    real<lower=0> tau;           // scale of the regression coefficients
    vector[D] b_raw;             // unscaled (standard normal) coefficients
}
transformed parameters {
    vector[D] b = b_raw*tau;     // actual regression coefficients
}
model {
    tau ~ cauchy(0,1);
    b_raw ~ normal(0,1);         // implies b ~ normal(0, tau)
}

However, as far as I could figure out, there is no way to code it like this, since I cannot move the actual regression coefficients b from parameters to transformed parameters. Do you have any workaround for this?

paul-buerkner commented 6 years ago

Indeed, that's an issue we currently have no solution for in brms. The horseshoe prior uses this type of parameterization, but it has a rather large code overhead because of this. So in principle we can reuse that code to come up with a general solution, but we would then need a nice interface that is general enough yet restrictive enough to ensure that the brms model keeps working. That is, b needs to be computed at some point.

jpiironen commented 6 years ago

I see. I don't know what the best way would be, but here's one idea. I could solve this specific case if I were able to redefine b so that it belongs to transformed parameters instead of parameters (it is already possible to define tau and b_raw and set priors for them), using a command like stanvar(scode = 'vector[K] b = b_raw*tau;', block = 'tparameters').

Would it be possible to modify the code so that if the user defines a variable (such as b) using stanvar, the program ignores brms's own declaration of and priors on that variable by default? Or does this lead to some obvious problems?
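
If that were possible, the full non-centered parameterization above could be assembled roughly as follows (a sketch of the proposed usage, not working code at the time of this discussion):

# hypothetical: brms would have to skip its own declaration of and prior on b
stanvars <- stanvar(scode = "real<lower=0> tau;", block = "parameters") +
  stanvar(scode = "vector[K] b_raw;", block = "parameters") +
  stanvar(scode = "vector[K] b = b_raw*tau;", block = "tparameters")
bprior <- set_prior("target += cauchy_lpdf(tau | 0, 1)", check = FALSE) +
  set_prior("target += normal_lpdf(b_raw | 0, 1)", check = FALSE)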

paul-buerkner commented 6 years ago

This would indeed be the obvious solution. Not sure how widely we can (should?) make this possible. Perhaps b is the only real application of it, though it is definitely worth it, I guess. Would you mind opening a new issue for this feature?

jpiironen commented 6 years ago

Sure, no problem. I opened a new issue for this.