paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0
1.29k stars 187 forks

Use vectorized beta_binomial_lpmf #1703

Closed avehtari closed 3 weeks ago

avehtari commented 3 weeks ago

There can be a huge difference in sampling speed between non-vectorized and vectorized code.

Simulated data

x <- rnorm(n=4000);
y <- rbinom(n=4000, size=40, prob=plogis(x));
simdata <- data.frame(x,y);

Short run

sfit1 <- brm(y | trials(40) ~ x, family = beta_binomial(), data = simdata, refresh = 10, chains = 1, iter = 100)

takes 812s on my laptop.

Using make_stancode() and changing

    for (n in 1:N) {
      target += beta_binomial_lupmf(Y[n] | trials, mu[n] * phi, (1 - mu[n]) * phi);
    }

to

    target += beta_binomial_lupmf(Y | trials, mu * phi, (1 - mu) * phi);

reduces the sampling time to 2s. This is a roughly 400x speedup.
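As a sanity check that the two forms define the same log density, here is a minimal base R sketch. The helper `beta_binomial_lpmf()` and the value of `phi` are assumptions made for illustration; they are not brms or Stan code. The sketch only shows that the summed log-probabilities agree. It does not reproduce the timing difference, which comes from how Stan evaluates the model.

```r
# Beta-binomial log-pmf written with base R functions only (illustrative helper).
# Arguments recycle elementwise, so it accepts scalars and vectors alike.
beta_binomial_lpmf <- function(y, trials, alpha, beta) {
  lchoose(trials, y) + lbeta(y + alpha, trials - y + beta) - lbeta(alpha, beta)
}

set.seed(1)
N <- 1000
trials <- 40
x <- rnorm(N)
mu <- plogis(x)  # mean on the probability scale
phi <- 5         # hypothetical precision value, chosen only for this check
y <- rbinom(N, size = trials, prob = mu)

# Looped form: one call per observation, mirroring the for loop in the Stan code
target_loop <- 0
for (n in seq_len(N)) {
  target_loop <- target_loop +
    beta_binomial_lpmf(y[n], trials, mu[n] * phi, (1 - mu[n]) * phi)
}

# Vectorized form: one call over all observations, then a single sum
target_vec <- sum(beta_binomial_lpmf(y, trials, mu * phi, (1 - mu) * phi))

# The two totals should agree up to floating-point tolerance
print(isTRUE(all.equal(target_loop, target_vec)))
```

Note that `beta_binomial_lupmf` in Stan may drop constant terms, so absolute values can differ from this helper by a constant, but the loop and the vectorized call differ only in how the same terms are accumulated.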

I also have a somewhat more complex model where sampling with the vectorized version takes 90s. Based on an overnight run that did not finish, the expected sampling time with the default non-vectorized version is about 5 days. In this case, the speedup is roughly 5000x.

If the vectorized form doesn't work for everything, it would be great to have an explicit argument for switching to it. That would make it much easier to get the speedup without manually editing the Stan code and figuring out how to get the results back into a brms object.
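For reference, the manual route can be sketched as below. It follows the pattern shown in the brms documentation for `rename_pars()`: build an empty brmsfit with `empty = TRUE`, fit the Stan code with rstan, then attach the result. The function names `fit_with_edited_stancode()` and `vectorize_bb()` are hypothetical, and the regular expression assumes the generated loop matches the snippet quoted in this issue exactly, which may not hold for other models or brms versions.

```r
# Hypothetical helper: replace the looped likelihood with the vectorized call.
# Assumes the loop text matches the snippet quoted in this issue.
vectorize_bb <- function(code) {
  sub(
    paste0(
      "for \\(n in 1:N\\) \\{\\s*",
      "target \\+= beta_binomial_lupmf\\(Y\\[n\\] \\| trials, ",
      "mu\\[n\\] \\* phi, \\(1 - mu\\[n\\]\\) \\* phi\\);\\s*\\}"
    ),
    "target += beta_binomial_lupmf(Y | trials, mu * phi, (1 - mu) * phi);",
    code,
    perl = TRUE
  )
}

# Hypothetical wrapper: fit hand-edited Stan code and return a brmsfit.
# Requires the brms and rstan packages when called.
fit_with_edited_stancode <- function(formula, data, family, edit_fun, ...) {
  # Skeleton brmsfit, created without compiling or sampling
  bfit <- brms::brm(formula, data = data, family = family, empty = TRUE)

  # Stan code and data as brms would generate them
  scode <- brms::make_stancode(formula, data = data, family = family)
  sdata <- brms::make_standata(formula, data = data, family = family)

  # Apply the manual edit to the generated code
  scode <- edit_fun(as.character(scode))

  # Sample with rstan; extra arguments such as chains and iter pass through
  sfit <- rstan::stan(model_code = scode, data = sdata, ...)

  # Attach the Stan fit and let brms rename the parameters
  bfit$fit <- sfit
  brms::rename_pars(bfit)
}
```

Under these assumptions, a call such as `fit_with_edited_stancode(y | trials(40) ~ x, simdata, brms::beta_binomial(), vectorize_bb, chains = 1, iter = 100)` would return an object that `summary()` and other brms methods can work with. It is worth checking that the edit actually changed the code before sampling, since `sub()` returns its input unchanged when the pattern does not match.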

paul-buerkner commented 3 weeks ago

brms already implements vectorization for many families, but I didn't activate it for beta_binomial yet, presumably because initially, no vectorized version existed. I will fix this quickly.

paul-buerkner commented 3 weeks ago

Should now be fixed :-)