paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

use of glm functions for multilevel models #984

Closed wds15 closed 4 years ago

wds15 commented 4 years ago

Using the new glm functions I am seeing nice speedups for a Poisson regression. The epilepsy example goes from 3.8 s to 2.1 s or so.

The pattern to use would look like this (in a reduce_sum version):

  // compute partial sums of the log-likelihood
  real partial_log_lik(int[] seq, int start, int end, int[] Y, matrix Xc, vector b, real Intercept, int[] J_1, vector Z_1_1, vector r_1_1) {
    real ptarget = 0;
    int N = end - start + 1;
    // initialize linear predictor term
    //vector[N] mu = Intercept + Xc[start:end] * b;
    vector[N] z;
    for (n in 1:N) {
      // add more terms to the linear predictor
      int nn = n + start - 1;
      z[n] = r_1_1[J_1[nn]] * Z_1_1[nn];
    }
    ptarget += poisson_log_glm_lpmf(Y[start:end] | Xc[start:end], Intercept + z, b);
    return ptarget;
  }

It's definitely worthwhile to use these (and the other special glms).

Here is how I run the example:

fit1 <- brm(count ~ zAge + zBase * Trt + (1|patient),
            data = epilepsy, family = poisson(),
            threads = 1,
            chains = 1, backend = "cmdstanr")
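
For completeness, here is a sketch of how the partial-sum function above might be called from the model block via `reduce_sum`. This is not brms-generated code; the index sequence `seq` (assumed to be `1:N`, declared in transformed data) and the `grainsize` choice are assumptions:

```stan
model {
  // sketch: slice over seq = 1:N so reduce_sum can parallelize the
  // partial log-likelihood over chunks of observations
  int grainsize = 1;  // 1 lets the scheduler pick chunk sizes automatically
  target += reduce_sum(partial_log_lik, seq, grainsize,
                       Y, Xc, b, Intercept, J_1, Z_1_1, r_1_1);
  // priors on b, Intercept, and the group-level terms go here as usual
}
```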
paul-buerkner commented 4 years ago

Interesting approach! So far, I have only used poisson_log_glm etc. for simple regression models without multilevel or similar structure. Didn't know we could actually put all that stuff inside the Intercept!

When you run make_stancode(count ~ zAge + zBase * Trt, ...) you will see the GLM primitives being used by brms already.

Is there some data on how much we can put into the Intercept vector and still see reasonable speedups?

wds15 commented 4 years ago

No, don’t know how much, but this is the first model I tried. The intercept can be a scalar or a vector...so random effect models can be handled with the above pattern.
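
Concretely, the point is that the alpha argument of the `_glm` functions accepts a vector, so group-level offsets can be folded into it. A sketch along the lines of the partial-sum code above (not brms-generated code):

```stan
// sketch: fold group-level offsets into the GLM's vector-valued alpha,
// so the primitive still handles the Xc * b part efficiently
vector[N] alpha = Intercept + r_1_1[J_1] .* Z_1_1;
target += poisson_log_glm_lpmf(Y | Xc, alpha, b);
```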

paul-buerkner commented 4 years ago

Interesting. So that would mean we can drastically expand the use of the GLM primitive functions in the hope for some speedups.

paul-buerkner commented 4 years ago

@jgabry did you consider this in rstanarm?

wds15 commented 4 years ago

This will also expand the set of GPU-ready Stan models.

BTW, do you have a good example model with many random effects? I had an idea I wanted to try out for a different optimization to cut down the AD tape size.

paul-buerkner commented 4 years ago

You could use the bike rental data, averaged version (https://www.kaggle.com/hmavrodiev/london-bike-sharing-dataset). A really complex model would be, for example,

count ~ windspeed + temp + humidity + (windspeed + temp + humidity | month) +
(windspeed + temp + humidity | weekday) + (windspeed + temp + humidity | weather) +
(windspeed + temp + humidity | holiday) + (windspeed + temp + humidity | season)

with a Poisson or another count family. Thanks to @AlejandroCatalina for providing the model.
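
Assuming the averaged data set is read into a data frame called `bike` (the name and the call details are assumptions), the fit could be set up along these lines:

```r
# hypothetical call; `bike` is the averaged London bike-sharing data
fit <- brm(
  count ~ windspeed + temp + humidity +
    (windspeed + temp + humidity | month) +
    (windspeed + temp + humidity | weekday) +
    (windspeed + temp + humidity | weather) +
    (windspeed + temp + humidity | holiday) +
    (windspeed + temp + humidity | season),
  data = bike, family = poisson(),
  chains = 4, cores = 4
)
```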

rok-cesnovar commented 4 years ago

Benchmarks for the GLMs https://github.com/stan-dev/math/pull/1185#issuecomment-477211435 (edit: sorry this is a benchmark of old glms vs new)

There were some additional improvements since then.

These are for glm vs. no-glm for softmax regression https://github.com/stan-dev/math/pull/1242#issuecomment-492648823 and for ordinal regression https://github.com/stan-dev/math/pull/1252#issuecomment-495237685

paul-buerkner commented 4 years ago

I substantially extended the use of GLM primitives to models with special terms (e.g., multilevel structure, splines, GPs, etc.). I didn't extensively test all the options, but from what I tested the speedups are quite nice. For example, the model

data("VerbAgg", package = "lme4")

form <- r2 ~ Anger + Gender + btype + situ + mode + mode:Gender +
  (0 + Gender | item) + (0 + mode | id)

fit <- brm(form, data = VerbAgg, family = bernoulli())

took on average 354 s with the old code not using the bernoulli_logit_glm primitive, while the code with the primitive took only 211 s on average. I think this is quite remarkable, given that for this model most of the computational complexity lies in the multilevel structure, not in the overall coefficients.

I would appreciate it if some of you could run a few of your own benchmark models so that we can get a better understanding of how much GLM primitives improve performance beyond simple GLM cases. Of specific interest, I think, is to check for cases where it would not be advised to use the GLM primitives (although they would be applicable to the model).

rok-cesnovar commented 4 years ago

Of specific interest I think is to check for cases where it should not be advised to use the GLM primitives (although they would be applicable to the model)

I am not sure I completely understand this one. You mean a case where a GLM can be used but would hurt performance?

paul-buerkner commented 4 years ago

Yes. I mean the case where we could use a GLM primitive, but we don't really have many "fixed effects", and the special terms (passed via the alpha parameter of the primitive) dominate the computation so much that using the GLM primitive would actually hurt performance. I am not sure whether such a case exists, or whether there are theoretical reasons that make it super unlikely to happen at all.

rok-cesnovar commented 4 years ago

Let me just tag the resident GLM expert @t4c1. If anyone, he would know.

t4c1 commented 4 years ago

"fixed effects" What do you mean by this? Anyway, I don't think there exists any use case where using GLM would be slower than x*beta and non-GLM distribution function.

Maybe only if the x*beta multiplication is not actually needed for the model. Even in that case, using a GLM function might still be faster, as GLMs are at the moment better optimized than other distributions.

paul-buerkner commented 4 years ago

Thanks! Yeah, with "fixed effects" I meant the frequentist term for beta in X * beta of a GLM.

I will run some tests in cases where there is no x * beta but just stuff passed to alpha. What do I put into the x and beta arguments if I don't have any beta coefficients?

t4c1 commented 4 years ago

I guess you can make x an N x 0 matrix (N rows, zero columns) and beta an empty vector.
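
In Stan code, that degenerate case would look roughly like this (a sketch; with N observations the design matrix has N rows and zero columns, and alpha carries all the remaining structure):

```stan
// sketch: GLM primitive with no population-level coefficients
matrix[N, 0] x;   // zero-column design matrix
vector[0] beta;   // empty coefficient vector
target += poisson_log_glm_lpmf(Y | x, alpha, beta);
```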

t4c1 commented 4 years ago

As I mentioned before, in this case a GLM could also be slower. In any case, I don't suggest relying on that long-term, as other distributions will also get optimized soon (in a month or two, I guess).

rok-cesnovar commented 4 years ago

So the only case where GLMs could be slower (debatable whether they are now, but they most likely will be in the very near future) is where there is no X*beta. For other cases, do you think a GLM will outperform a non-GLM also in the long term?

domiden commented 4 years ago

Yes. I mean the case where we could use a GLM primitive, but we don't really have many "fixed effects" and the special terms (passed via the alpha parameter of the primitive) dominate the computation so much that using the GLM primitive would actually hurt performance. I am not sure such a case exists and if there are theoretical reasons that this case is super unlikely to happen at all.

I'm not sure if this is in any way helpful to you, but did you consider a GWAS model? Such models include thousands, tens of thousands, or even hundreds of thousands of genetic markers as well as population structure as fixed effects, and often include a genomic relationship matrix as random effects. Animal and human genome-wide association studies (GWAS) also include tens of thousands up to millions of individuals. Such a model should be computationally very expensive.

Something like this?

# For example:
# data: a data frame containing
#   - 1 ID column of n individuals (column 1)
#   - 10 response columns (columns 2:11)
#   - m columns of SNP information (columns 12:(m + 11))
#   - (c columns of additional covariates [principal components]
#      representing e.g. population structure)
# grm: an n x n covariance matrix representing (genetic) relationship
mvfy <- colnames(data)[2:11]
mvfx <- paste0(" ~ ", paste(colnames(data)[-c(1:11)], collapse = " + "))
mvfg <- " + (1 | p | gr(ID, cov = G))"
mvf <- paste0(mvfy, mvfx, mvfg)
mvgf <- brms::mvbf(flist = purrr::map(mvf, brms::brmsformula, family = gaussian()))
brms::brm(formula = mvgf, data = data, data2 = list(G = grm),
    chains = 4L, warmup = 5e3L, iter = 5e4L, cores = 4L)
paul-buerkner commented 4 years ago

This is now featured in the master branch after merging #1004.