Interesting approach! So far, I have only used poisson_log_glm etc. for simple regression models without multilevel models or similar structure. I didn't know we could actually put all the stuff inside the Intercept!
When you run make_stancode(count ~ zAge + zBase * Trt, ...), you will see the GLM primitives being used by brms already.
Is there some data of how much we can put into the Intercept vector and still see reasonable speedups?
No, I don't know how much, but this is the first model I tried. The intercept can be a scalar or a vector... so random effects models can be handled with the above pattern.
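For illustration, here is a minimal sketch of that pattern in Stan (the data and parameter names are generic assumptions, not taken from any particular model): the group-level effects are folded into a per-observation intercept vector, which is then passed as the alpha argument of poisson_log_glm.
// Minimal sketch (assumed names): group-level effects folded into a
// per-observation intercept vector so the GLM primitive still applies.
data {
  int<lower=1> N;                    // observations
  int<lower=1> K;                    // population-level predictors
  int<lower=1> J;                    // groups
  array[N] int<lower=1, upper=J> g;  // group membership per observation
  matrix[N, K] x;
  array[N] int<lower=0> y;
}
parameters {
  real b0;                           // global intercept
  vector[K] beta;
  vector[J] z;                       // standardized group intercepts
  real<lower=0> sigma_g;
}
model {
  // scalar intercept plus random effect, expanded to one value per row
  vector[N] alpha = b0 + sigma_g * z[g];
  b0 ~ normal(0, 5);
  beta ~ normal(0, 1);
  z ~ std_normal();
  sigma_g ~ normal(0, 1);
  y ~ poisson_log_glm(x, alpha, beta);
}
The other *_glm functions accept a vector alpha in the same way.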
Interesting. So that would mean we can drastically expand the use of the GLM primitive functions in the hope of some speedups.
@jgabry did you consider this in rstanarm?
This will also expand GPU ready Stan models.
BTW, do you have a good example model with many random effects? I had an idea I wanted to try out for a different optimization to cut down the AD tape size.
You could use the bike rental data, averaged version (https://www.kaggle.com/hmavrodiev/london-bike-sharing-dataset). A model that would be really complex is, for example,
count ~ windspeed + temp + humidity +
  (windspeed + temp + humidity | month) +
  (windspeed + temp + humidity | weekday) +
  (windspeed + temp + humidity | weather) +
  (windspeed + temp + humidity | holiday) +
  (windspeed + temp + humidity | season)
with a Poisson or another count family. Thanks to @AlejandroCatalina for providing the model.
Benchmarks for the GLMs: https://github.com/stan-dev/math/pull/1185#issuecomment-477211435 (edit: sorry, this is a benchmark of old GLMs vs. new; there have been some additional improvements since then).
These are for GLM vs. no-GLM for softmax regression (https://github.com/stan-dev/math/pull/1242#issuecomment-492648823) and ordinal regression (https://github.com/stan-dev/math/pull/1252#issuecomment-495237685).
I substantially extended the use of GLM primitives to models with special terms (e.g., multilevel structure, splines, GPs, etc.). I didn't extensively test all the options, but from what I tested the speedups are quite nice. For example, the model
data("VerbAgg", package = "lme4")
form <- r2 ~ Anger + Gender + btype + situ + mode + mode:Gender +
  (0 + Gender | item) + (0 + mode | id)
fit <- brm(form, data = VerbAgg, family = bernoulli())
took on average 354 s with the old code not using the bernoulli_logit_glm primitive, while the code with the primitive took only 211 s on average. I think this is quite remarkable given that, for this model, most of the computational complexity lies in the multilevel structure, not in the overall coefficients.
I would appreciate it if some of you could run a few of your own benchmark models so that we can get a better understanding of how much the GLM primitives improve things beyond simple GLM cases. Of specific interest, I think, is to check for cases where it would not be advisable to use the GLM primitives (although they would be applicable to the model).
> Of specific interest, I think, is to check for cases where it would not be advisable to use the GLM primitives (although they would be applicable to the model).
I am not sure I completely understand this one. You mean a case where a GLM can be used but would hurt performance?
Yes. I mean the case where we could use a GLM primitive, but we don't really have many "fixed effects" and the special terms (passed via the alpha parameter of the primitive) dominate the computation so much that using the GLM primitive would actually hurt performance. I am not sure whether such a case exists, or whether there are theoretical reasons why it is very unlikely to occur at all.
Let me just tag the resident GLM expert @t4c1. If anyone, he would know.
"fixed effects" What do you mean by this? Anyway, I don't think there exists any use case where using GLM would be slower than
x*beta
and non-GLM distribution function.
Maybe only if the x*beta
multiplication is not actually needed for the model. Even in that case using a GLM function might still be faster, as GLMs a re at the moment better optimized than other distributions.
Thanks! Yeah, with "fixed effects" I meant the frequentist term for the beta in X * beta of a GLM. I will run some tests for cases where there is no X * beta part at all but only stuff passed to alpha. What do I put into the x and beta arguments if I don't have any beta coefficients?
I guess you can make x an N x 0 matrix and beta an empty vector.
As I mentioned before, in this case the GLM could also be slower. In any case, I don't suggest using that long-term, as the other distributions will also get optimized soon (I guess in a month or two).
So the only case where GLMs could be slower (debatable whether they are now, but they most likely will be in the very near future) is where there is no X * beta. For other cases, do you think a GLM will outperform a non-GLM also in the long term?
> Yes. I mean the case where we could use a GLM primitive, but we don't really have many "fixed effects" and the special terms (passed via the alpha parameter of the primitive) dominate the computation so much that using the GLM primitive would actually hurt performance. I am not sure whether such a case exists, or whether there are theoretical reasons why it is very unlikely to occur at all.
I'm not sure if this is of any help to you, but did you consider a GWAS model? Such models include thousands, tens of thousands, or even hundreds of thousands of genetic markers as well as population structure as fixed effects, and often include a genomic relationship matrix as a random effect. Animal and human genome-wide association studies (GWAS) can also include tens of thousands up to millions of individuals. Such a model should be computationally very expensive.
Something like this?
# For example:
# data: a data frame containing
# - 1 ID column of n individuals (column 1)
# - 10 response columns (columns 2:11)
# - m columns of SNP information (columns 12:(m + 11))
# - (c columns of additional covariates [principal components] representing e.g. population structure)
# grm: an n x n covariance matrix representing the (genetic) relationship
mvfy <- colnames(data)[2:11]
mvfx <- paste0(" ~ ", paste(colnames(data)[-c(1:11)], collapse = " + "))
mvfg <- " + (1 | p | gr(ID, cov = G))"
mvf  <- paste0(mvfy, mvfx, mvfg)
mvgf <- brms::mvbf(flist = purrr::map(mvf, brms::brmsformula, family = gaussian()))
brms::brm(formula = mvgf, data = data, data2 = list(G = grm),
          chains = 4L, warmup = 5e3L, iter = 5e4L, cores = 4L)
This is now featured in the master branch after merging #1004
Using the new GLM functions I am seeing nice speedups for a Poisson regression. The epilepsy example goes from 3.8 s to 2.1 s or so.
The pattern to use would look like this (in a reduce_sum version):
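A minimal sketch of what such a reduce_sum version could look like (with generic, assumed names rather than the exact code used for the epilepsy example): the partial-sum function slices the outcome, the design matrix, and the intercept vector, and calls the GLM primitive on each slice.
// Sketch (assumed names): reduce_sum over slices of y, with the GLM
// primitive inside the partial sum and the group-level effects folded
// into the per-observation intercept vector alpha.
functions {
  real partial_log_lik(array[] int y_slice, int start, int end,
                       matrix x, vector alpha, vector beta) {
    // slice the design matrix and intercept vector to match y_slice
    return poisson_log_glm_lpmf(y_slice | x[start:end], alpha[start:end], beta);
  }
}
data {
  int<lower=1> N;
  int<lower=1> K;
  int<lower=1> J;
  array[N] int<lower=1, upper=J> g;
  matrix[N, K] x;
  array[N] int<lower=0> y;
  int<lower=1> grainsize;
}
parameters {
  real b0;
  vector[K] beta;
  vector[J] z;
  real<lower=0> sigma_g;
}
model {
  vector[N] alpha = b0 + sigma_g * z[g];
  b0 ~ normal(0, 5);
  beta ~ normal(0, 1);
  z ~ std_normal();
  sigma_g ~ normal(0, 1);
  target += reduce_sum(partial_log_lik, y, grainsize, x, alpha, beta);
}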
It's definitely worthwhile to use these (and the other special GLMs).
Here is how I run the example: