paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

Hierarchical Gaussian Process #412

Open cdaube opened 6 years ago

cdaube commented 6 years ago

Dear brms developer(s?),

first of all, thanks a lot for such an incredibly cool package!

I am fairly inexperienced with modeling and Bayesian approaches in general. If this issue suffers too much from that, I am sorry!

That being said, I have the following data

The problem I have so far is that in grand averages, units and subunits cancel each other out quite easily due to offsets in timing or the exact position or frequency.

I am ultimately interested in a group-level "fixed effect" across the dimensions of the respective data and thought Gaussian processes would be a really handy tool for that. The dream would be to use them hierarchically, similar to a classic lme4-style mixed model, in order to summarise the data across all the units and subunits, i.e. to obtain a population-level ("fixed effect") Gaussian process that is freed from accounting for between-unit variance.

I saw this issue: https://github.com/paul-buerkner/brms/issues/221 and understood that all that is possible at the moment is having separate GPs being estimated for different levels of a factor.

I also found this post: https://stats.stackexchange.com/questions/164825/hierarchical-multilevel-random-effects-gaussian-process-regression and thought that the OP there seemed to have a very similar, if not the same, idea as me. Does the OP's reasoning make sense? Could I thus build a brms model like so: y ~ gp(x1, x2, by = unit) + gp(x1, x2) to obtain a "random" GP per unit plus a "fixed" GP shared by all units? Would this be able to account for variance across the units regarding the position in x1, x2? Would there then be a convenient way to account for subunits of the units?
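To make the additive idea in that formula concrete, here is a minimal sketch (not brms code, and not a statement of how brms implements `gp()`) of the covariance it would imply between two latent function values. It assumes the shared GP and the per-unit GPs are independent, that each uses a squared-exponential kernel, and that each unit has its own hyperparameters. The names `sq_exp`, `additive_cov`, `shared` and `per_unit` and all numeric values are made up for illustration.

```python
import math


def sq_exp(x, y, sd, lengthscale):
    """Squared-exponential kernel for two points given as coordinate tuples."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return sd ** 2 * math.exp(-d2 / (2 * lengthscale ** 2))


def additive_cov(x, unit_x, y, unit_y, shared, per_unit):
    """Covariance of f_shared(x) + f_unit_x(x) with f_shared(y) + f_unit_y(y).

    shared:   (sd, lengthscale) of the population-level GP (hypothetical values).
    per_unit: dict mapping a unit label to that unit's (sd, lengthscale).
    Independence between all GPs is an assumption of this sketch.
    """
    cov = sq_exp(x, y, *shared)  # every pair of points shares the population-level GP
    if unit_x == unit_y:
        # only points from the same unit also share that unit's own GP
        cov += sq_exp(x, y, *per_unit[unit_x])
    return cov


shared = (1.0, 2.0)
per_unit = {"a": (0.5, 1.0), "b": (0.3, 1.5)}
p, q = (0.0, 0.0), (1.0, 0.0)  # two positions in (x1, x2)

same_unit = additive_cov(p, "a", q, "a", shared, per_unit)
diff_unit = additive_cov(p, "a", q, "b", shared, per_unit)
print(same_unit > diff_unit)
```

Under these assumptions, observations from different units are correlated only through the shared GP, while observations from the same unit get extra correlation from that unit's own GP. Because each unit has separate hyperparameters in this sketch, nothing is pooled across the unit-level GPs themselves.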

Is what I am thinking of possible at all? If so, is this in fact already possible in brms (e.g. as outlined above)? Or is there a chance of it getting implemented?

Many thanks! Christoph

Another link that I found (I hope this is not annoying): http://mc-stan.org/events/stancon2017-notebooks/stancon2017-trangucci-hierarchical-gps.pdf To be honest, I don't understand most of what is going on there, but it has so many buzzwords that resonate with me. Perhaps it is a useful resource for people with more experience.

paul-buerkner commented 6 years ago

I think I get the idea. Basically, you want GPs with partial pooling: just like a varying slope that is partially pooled over units, but with a GP rather than a linear function. Did I summarize that correctly?

My knowledge about GPs is rather limited at this point, but maybe @avehtari knows more about this topic?

cdaube commented 6 years ago

Yes, I think the summary sounds quite like what I want :)

jgabry commented 6 years ago

Gaussian processes are very powerful but also pretty advanced (a lot of subtleties regarding priors and other aspects of the model). Since you say that you are relatively new to modeling and to Bayesian statistics I would start with simpler models and make sure you fully understand those models before working your way up to Gaussian processes. Even for experienced/advanced users, Gaussian process models are not easy. Anyway, I don’t mean to discourage you from looking into GPs (they’re great) but they’re a very challenging place to start if you’re relatively new to modeling and Bayesian statistics in general!

realkrantz commented 6 years ago

Hi Paul, thanks for implementing GPs in brms. My situation is similar to @cdaube's. I want to use GPs to predict country-level prevalences of a given disease and use the trial site as a random variable. Your summary is exactly what I want. Is there any possibility of getting hierarchical GP regression implemented in brms? Is the idea suggested by @cdaube the way to go?

paul-buerkner commented 6 years ago

At some point maybe. But the thing is that GPs themselves are hard, and doing them hierarchically is even harder. Right now, I am not even sure what such a hierarchical GP should ideally look like (i.e. I have not seen a framework / implementation of it). There may be good approaches out there that @avehtari might know of.

luccoffeng commented 6 years ago

For those who are interested, I worked out a minimal working example of what could be considered a hierarchical GP regression for count data over 9 consecutive years in 10 communities, with a random intercept, a random slope for the log-linear pattern over time, and a random function (the GP) per village for the pattern over time (i.e. the deviation from the log-linear trend over time). There is only one set of GP parameters, i.e. all community-specific random functions are drawn from the same GP.

I posted the model code, along with example data and an R script, on the Stan Discourse forum.
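For readers who want a feel for the structure described above, the following is a rough simulation sketch of such a generative model. It is not the code posted on the Stan forum. It assumes a squared-exponential kernel and a Poisson likelihood with a log link, and every numeric value (hyperparameters, intercept and slope distributions, seed) is invented for illustration. The helper names `sq_exp_cov`, `cholesky`, `poisson` and `simulate` are hypothetical.

```python
import math
import random


def sq_exp_cov(times, sd, lengthscale, jitter=1e-6):
    """Squared-exponential covariance matrix, with jitter on the diagonal for stability."""
    n = len(times)
    return [[sd ** 2 * math.exp(-(times[i] - times[j]) ** 2 / (2 * lengthscale ** 2))
             + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def poisson(rate, rng):
    """Draw a Poisson count with Knuth's multiplication method (adequate for modest rates)."""
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(n_communities=10, n_years=9, seed=1):
    rng = random.Random(seed)
    years = list(range(n_years))
    # One set of GP hyperparameters (assumed values), shared by all communities.
    chol = cholesky(sq_exp_cov(years, sd=0.3, lengthscale=1.5))
    counts = []
    for _ in range(n_communities):
        intercept = rng.gauss(math.log(20.0), 0.3)  # community-specific intercept
        slope = rng.gauss(-0.1, 0.05)               # community-specific log-linear trend
        z = [rng.gauss(0.0, 1.0) for _ in years]
        # Community-specific random function: a fresh draw from the same GP.
        deviation = [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(n_years)]
        counts.append([poisson(math.exp(intercept + slope * t + deviation[t]), rng)
                       for t in years])
    return counts


counts = simulate()
print(len(counts), len(counts[0]))
```

The partial pooling in this sketch comes from the shared kernel hyperparameters: each community gets its own deviation function, but all of them are draws from one GP, in the same way that varying intercepts are draws from one normal distribution.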

jlevy44 commented 3 years ago

Is there an update to this, re: hierarchical GP? Thanks!

jlevy44 commented 3 years ago

https://discourse.mc-stan.org/t/auto-grouping-option-in-brms-gp/11154

jlevy44 commented 3 years ago

I think this answers my question.