vincentarelbundock / modelsummary

Beautiful and customizable model summaries in R.
http://modelsummary.com

Performance metrics still slow for bayesian models #466

Closed saudiwin closed 2 years ago

saudiwin commented 2 years ago

Hi vincent -

While turning off additional performance metrics helps, it's still quite slow to calculate even basic metrics like RMSE. Some thoughts on what could help:

  1. Implement parallel processing of models -- this seems fairly straightforward if multiple models are being passed to the modelsummary function.
  2. Limit the number of draws used in computation. RMSE will take forever if all of the draws from a model are used. Limiting the number of draws to 100 or something could go a long way to reducing computation time.
  3. Turn off calculated performance metrics and use ones already present in brms models. These are primarily convergence statistics like Rhat, but I think could still be useful and would not require further computation.
  4. Some performance metrics, like RMSE, can themselves be parallelized. However, that seems beyond the scope of the project.

#1 and #2 would no doubt require work. Perhaps the easiest thing is to add #3 as an option for people whose models are just too big. I'd be happy to help with the coding, as the package is quite useful.
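For suggestion 2, a minimal sketch of what limiting draws could look like, assuming a recent brms version in which post-processing functions accept an `ndraws` argument (older versions used `nsamples`); `mod` stands in for any fitted brmsfit:

```r
library(brms)

# Hedged sketch: subsample the posterior before computing RMSE.
# `ndraws` limits residuals() to 100 draws instead of all of them.
res <- residuals(mod, ndraws = 100)

# residuals() on a brmsfit returns a matrix with an "Estimate" column.
rmse <- sqrt(mean(res[, "Estimate"]^2))
```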

vincentarelbundock commented 2 years ago

Hey @saudiwin !

I'm on my way out, but a few quick comments:

  1. I would probably have to be the one implementing parallelism, but for future reference, the easiest spot would likely be this loop: https://github.com/vincentarelbundock/modelsummary/blob/main/R/modelsummary.R#L817
  2. My guess is that RMSE is slow because it takes a while to extract residuals. Is there a draws argument we can pass to stats::residuals?
  3. Independent of the other points, it might be interesting to add the most important of these built-in statistics. If you let me know exactly what they are and where they live in the model object, we can open a discussion in the performance package repo (which I use behind the scenes to extract metrics from these models). Alternatively, we could do the same thing I do for fixest: create a method that returns a glance-like one-row data frame. See here: https://github.com/vincentarelbundock/modelsummary/blob/main/R/methods_fixest.R#L85
  4. Yes, let me think about how best to do this. One option would be to avoid computing metrics altogether when gof_omit=".*". Another would be to add a new metrics = "none" option.
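For reference, point 1 could be sketched along these lines with future.apply; the helper name below is a placeholder, not the actual internals of the loop at modelsummary.R#L817:

```r
library(future.apply)
plan(multisession)

# Hypothetical sketch: replace a sequential lapply() over models
# with future_lapply(). `extract_gof` is a placeholder for whatever
# the internal loop actually calls for each model.
gof_list <- future_lapply(models, function(m) extract_gof(m))
```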
saudiwin commented 2 years ago

Re: 1, yes that looks fairly straightforward.

  1. Model objects based on Stan, like brms, don't store residuals; they have to be calculated by taking each draw from the posterior, generating a prediction, then subtracting it from the outcome. This would be much faster if it were done with just the posterior mean of each parameter, but Bayesians like their uncertainty :). Another approach that would probably be acceptable to most users is to use the posterior median value, but that would likely involve upstream work.
  2. Sure. I think creating a custom method for rstanarm/brms objects would work, as it wouldn't be too difficult. I could help with that.
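A hedged sketch of the point-estimate shortcut mentioned in 1, assuming a brmsfit `mod` fit to the epilepsy data; `robust = TRUE` asks brms for posterior medians rather than means:

```r
# Sketch: compute RMSE from a single vector of fitted values
# (posterior medians via robust = TRUE) instead of one per draw.
fit_med <- fitted(mod, robust = TRUE)[, "Estimate"]
rmse_point <- sqrt(mean((epilepsy$count - fit_med)^2))
```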
saudiwin commented 2 years ago

So, for example, with R2, the performance package (like every other implementation) calculates R2 for all posterior draws, potentially thousands of them, then reports the mean R2 and a quantile interval. That is cool, of course, and Bayesians like to show off, but at the end of the day no one really cares about your R2 interval.
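If the interval is not needed, the cost can at least be bounded; a hedged sketch, assuming a recent brms where `bayes_R2()` forwards an `ndraws` argument to the prediction step:

```r
# Compute R2 from 100 posterior draws rather than all of them;
# still returns an Estimate plus an interval, but much faster.
brms::bayes_R2(mod, ndraws = 100)
```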

vincentarelbundock commented 2 years ago

I just pushed a commit with parallel computation and two alternatives to exclude all GOF and speed up computations: metrics="none" and gof_map=NA.

Would you be able to try parallelism with a list of large bayesian models? I show an example below, but it looks like there is a lot of overhead that must be paid the first time we parallelize. The second run is much faster, but still no faster than a single core. I wonder what would happen with bigger models. If it's still terrible, then there might be a problem with my parallelization concept.

library(modelsummary)
library(bench)

url <- "https://github.com/vincentarelbundock/modelarchive/raw/main/data/brms_vdem.rds"
tmp <- tempfile()
download.file(url, tmp)
mod <- readRDS(tmp)
models <- rep(list(mod), 4)

Parallel

Single core:

system.time({
    tab <- modelsummary(models, statistic = "conf.int", output = "data.frame")
})
#>    user  system elapsed 
#>   0.418   0.061   6.567

Multicore:

library(future.apply)
plan("multisession")

# 1st run takes a long time
system.time({
    tab <- modelsummary(models, statistic = "conf.int", output = "data.frame")
})
#>    user  system elapsed 
#>   0.303   0.062  25.343

# 2nd run is much faster, but still no more than single core.
system.time({
    tab <- modelsummary(models, statistic = "conf.int", output = "data.frame")
})
#>    user  system elapsed 
#>   0.405   0.061   6.769

Omit GOF

modelsummary(mod, statistic = "conf.int", output = "markdown", gof_map = NA)
|                          | Model 1          |
|--------------------------|------------------|
| b_Intercept              | -2.478           |
|                          | [-3.027, -1.878] |
| b_phi_Intercept          | 2.383            |
|                          | [1.485, 3.365]   |
| b_party_autonomyTRUE     | 1.037            |
|                          | [0.659, 1.378]   |
| b_civil_liberties        | 3.490            |
|                          | [2.869, 4.173]   |
| sd_region__Intercept     | 0.438            |
|                          | [0.184, 0.941]   |
| sd_region__phi_Intercept | 0.881            |
|                          | [0.310, 1.878]   |

results <- bench::mark(
modelsummary(mod, statistic = "conf.int", output = "data.frame"),
modelsummary(mod, statistic = "conf.int", metrics = "none", output = "data.frame"),
modelsummary(mod, statistic = "conf.int", gof_map = NA, output = "data.frame"),
check = FALSE, iterations = 3)

results[c(1, 3, 5)]
#> # A tibble: 3 × 3
#>   expression                                                                           median mem_alloc
#>   <bch:expr>                                                                         <bch:tm> <bch:byt>
#> 1 modelsummary(mod, statistic = "conf.int", output = "data.frame")                      4.63s     996MB
#> 2 modelsummary(mod, statistic = "conf.int", metrics = "none", output = "data.frame")    1.09s      83MB
#> 3 modelsummary(mod, statistic = "conf.int", gof_map = NA, output = "data.frame")        1.08s      83MB
saudiwin commented 2 years ago

I can confirm the "none" option works! I had been trying to create this table with only "RMSE" specified and it was taking several hours. This time it took <1 min.

(Screenshot, 2022-04-22: the resulting table with only RMSE reported.)

I'll try out the parallel option; that would still be good to have for Bayesian models, given how time-intensive the metrics are. The metrics are definitely useful; it's just difficult to have them as defaults.

saudiwin commented 2 years ago

This is fantastic! I'm going to add you to the paper acknowledgements :)

vincentarelbundock commented 2 years ago

Nice!

If I wanted to simulate data and estimate a brms model that would take about 2 minutes to summarize with metrics, what do you think those data+model would look like?

saudiwin commented 2 years ago

You don't really need a simulation; you can just vary the number of draws from the sampler. Try this:

library(brms)
library(modelsummary)

# using the epilepsy dataset shipped with brms

ndraws <- seq(2, 1000, by = 100)

prior1 <- prior(normal(0, 10), class = b) +
  prior(cauchy(0, 2), class = sd)

all_mods <- lapply(ndraws, function(d) {
  brm(count ~ zAge + zBase * Trt + (1 | patient),
      data = epilepsy, family = poisson(), prior = prior1,
      iter = d,
      backend = "cmdstanr")
})

sum_mods <- lapply(all_mods, function(m)
  system.time(modelsummary(m, statistic = "conf.int"))
)

plot(1:length(ndraws), sapply(sum_mods, function(s) s[3]))

(Plot: elapsed modelsummary() time against the number of draws.)

saudiwin commented 2 years ago

Note the roughly linear scaling.
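The linear trend can be checked with a quick fit on the timings collected above (reusing `ndraws` and `sum_mods` from the earlier snippet):

```r
# Slope of elapsed summary time against requested iterations;
# a roughly constant cost per draw implies linear scaling.
elapsed <- sapply(sum_mods, function(s) s["elapsed"])
coef(lm(elapsed ~ ndraws))
```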

vincentarelbundock commented 2 years ago

So, it looks like parallelism works, even if there is a fair bit of overhead:

library(brms)
library(modelsummary)
library(future.apply)

ndraws <- seq(2, 10000, by = 100)
prior1 <- prior(normal(0, 10), class = b) +
    prior(cauchy(0, 2), class = sd)
mod <- brm(count ~ zAge + zBase * Trt + (1 | patient),
    data = epilepsy, family = poisson(), prior = prior1,
    iter = 5e4,
    cores = 4,
    backend = "cmdstanr")
models <- rep(list(mod), 4)

# single core
plan("sequential")
system.time({
    modelsummary(models, output = "dataframe", statistic = "conf.int")
})

##    user  system elapsed 
## 263.063  14.976 278.672

# multi core
plan("multisession")
system.time({
    modelsummary(models, output = "dataframe", statistic = "conf.int")
})

##    user  system elapsed 
##   4.137   2.263 118.189
vincentarelbundock commented 2 years ago

TODO before closing this Issue:

vincentarelbundock commented 2 years ago

I have added a note on parallelism in the docs and in the FAQ.

I am quite unlikely to write a glance_custom_internal.brmsfit() method myself, so I'll close this issue for now. If you want to add R-hat et al., all that is needed is a function with that name in the methods_brms.R file, which accepts a model and returns a glance-like one-row data.frame, where each column holds one distinct statistic.
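A hedged sketch of what such a method might look like, using brms::rhat() to pull a convergence diagnostic; the exact formatting conventions would need to follow the existing methods_fixest.R example:

```r
# Hypothetical glance_custom_internal.brmsfit(): returns a one-row
# data frame where each column is one statistic. brms::rhat() gives
# a named vector of per-parameter R-hat values.
glance_custom_internal.brmsfit <- function(x, ...) {
  rhats <- brms::rhat(x)
  data.frame(
    `Max R-hat` = round(max(rhats, na.rm = TRUE), 3),
    check.names = FALSE
  )
}
```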

If you want to write one, I'd be happy to review a Pull Request, but no pressure at all. I don't think it's a big deal, and with parallelism and gof_map=NA, I think we're pretty much set for this particular issue.