mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0

Memory overrun and rsession crash with `add_fitted_draws()` #266

Closed JCruk closed 4 years ago

JCruk commented 4 years ago

I have been running some ordinal models with brms across multiple chains and iterations to land on "stable" estimates; I am aiming for an ESS of 10,000.

If I attempt to add_fitted_draws() with these models, 32 GB of physical RAM and 32 GB of swap are consumed and my rsession crashes.

I have tried limiting the number of participants in the data frame (from 28 down to 4), setting re_formula to NA, and setting n to 1. But all memory is still consumed until R crashes.

For example:

Fits <- Data %>%
        select(Subject, Var1, Var2, Var3) %>%
        distinct() %>%
        filter(Subject %in% c(11, 21,
                              8, 18)) %>%
        droplevels() %>%
        add_fitted_draws(Large_Model,
                         category = "Rating",
                         value = "Probability",
                         re_formula = NA,
                         n = 1)

If I re-fit the models with a fraction of the chains and iterations, I am able to add_fitted_draws() without much issue.

Is there an argument or syntax I might be missing to "scale" adding fitted draws to a larger model?

JCruk commented 4 years ago

After some excellent discussions at StanCon, I found a couple of viable workarounds to my issue.

The first is to use the fantastic shredder package:

Fits <- Data %>%
        select(Subject, Var1, Var2, Var3) %>%
        distinct() %>%
        add_fitted_draws(Large_Model %>%
                             stan_retain(2:4) %>%
                             stan_thin_frac(0.5),
                         category = "Rating",
                         value = "Probability",
                         re_formula = NA,
                         n = 100)

This subsets the large model enough to let add_fitted_draws() do its thing without consuming all available memory.

The second approach is to call posterior_epred() directly for the expected probabilities:

Fits <- posterior_epred(Large_Model,
                        newdata = Data %>%
                            select(Subject, Var1, Var2, Var3) %>%
                            distinct(),
                        re_formula = NULL,
                        nsamples = 100)

This approach runs very quickly because nsamples = 100 means 100 total draws, not 100 per chain per iteration. I can then process the resulting multidimensional array manually to get the fits into long format, with a row for each sample.
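For completeness, a rough sketch of that manual reshaping step (untested; Fits here is the draws x observations x categories array from the posterior_epred() call above, and the column names are just illustrative):

    library(dplyr)

    # as.data.frame.table() flattens an array into one row per cell,
    # with one factor column per dimension. For a categorical brms
    # model the dimensions are draw, observation row, and response
    # category (in that order).
    Fits_long <- as.data.frame.table(Fits, responseName = "Probability") %>%
        setNames(c(".draw", ".row", "Rating", "Probability")) %>%
        mutate(.draw = as.integer(.draw),
               .row  = as.integer(.row))

The .row index can then be joined back to the distinct() prediction grid to recover Subject, Var1, etc.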

mjskay commented 4 years ago

Great, glad that worked!

Re: the second solution (using posterior_epred() directly), did you try stuffing the output of posterior_epred() into tidybayes::add_draws()? It should simplify the second step of turning the matrix into long-format into a single step. I'd be curious if it is helpful.

JCruk commented 4 years ago

I still have to do some wrangling before add_draws() can use the posterior_epred() result. My model is categorical (ordinal), so posterior_epred() returns a 3-dimensional array: draws x observations x response_choices.

add_draws() only accepts a 2-D array, so I have to manually reduce the dimensions first.
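One way around that (rough sketch, untested): slice the 3-D array one response category at a time, since each slice is exactly the 2-D draws x observations matrix that add_draws() expects, then bind the results. Here epred and Pred_Grid are placeholder names for the posterior_epred() array and the prediction data frame:

    library(dplyr)
    library(purrr)
    library(tidybayes)

    # epred: draws x observations x categories array from posterior_epred()
    # Pred_Grid: the data frame passed as newdata
    Fits_long <- map_dfr(seq_len(dim(epred)[3]), function(k) {
        Pred_Grid %>%
            # epred[, , k] is a 2-D draws x observations matrix
            add_draws(epred[, , k], value = "Probability") %>%
            # dimnames may be NULL depending on the model; fall back to k
            mutate(Rating = dimnames(epred)[[3]][k] %||% k)
    })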

mjskay commented 4 years ago

Ah, annoying. Hrm. Well, there's probably no good "automatic" solution to this until I rewrite the internals on top of posterior. Glad you've got some viable workarounds in the meantime --- thanks for circling back around!