Multiple inits for multi-path Pathfinder?

colinorourke commented 7 months ago

Alluded to in #874, I've noticed that if you try to set initial values to the Pathfinder algorithm, in particular the multi-path variety of this algorithm, cmdstanr currently doesn't allow the paths to begin from different initial positions. The requirement for the Pathfinder algorithm seems to be that the inits are a list containing just a single initial value. While this makes sense for the other variational algorithms, for Pathfinder this means that each L-BFGS starts from the same place, which doesn't seem ideal. Maybe this is due to some underlying restriction of cmdstan itself?

Here's an example (I've simplified some of the output to remove warnings from Stan about gradients & Pareto k values):

# Load necessary packages
library(cmdstanr)
#> This is cmdstanr version 0.7.1
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: C:/Users/corour/Documents/.cmdstan/cmdstan-2.34.0
#> - CmdStan version: 2.34.0
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.20.4). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').

# Setup a model. This is just an example from brms.
prior1 = prior(normal(0, 10), class = b) +
  prior(cauchy(0, 2), class = sd)
stan_code = make_stancode(
  count ~ zAge + zBase * Trt + (1 | patient),
  data = epilepsy,
  family = poisson(),
  prior = prior1
)
stan_data = make_standata(
  count ~ zAge + zBase * Trt + (1 | patient),
  data = epilepsy,
  family = poisson(),
  prior = prior1
)
mod = cmdstan_model(write_stan_file(stan_code))

# Create a function that generates some initial values. It will generate
# different sets of initial values depending on the value of `chain_id`
init_fn = function(chain_id) {
  old_rng_state = get(".Random.seed", envir = globalenv())
  on.exit(assign(".Random.seed", value = old_rng_state, envir = globalenv()))
  new_seed = rngtools::RNGseq(n = chain_id + 1L, seed = 1313L)[[chain_id]]
  assign(".Random.seed", value = new_seed, envir = globalenv())
  b = runif(4L,-2, 2) #vector[Kc]
  Intercept = runif(1L,-2, 2)
  sd_1 = runif(1L, 0, 2) #vector<lower=0>[M_1]
  z_1 = array(runif(1 * 59,-2, 2), dim = c(1, 59)) #array[M_1] vector[N_1]
  list(
    b = b,
    Intercept = Intercept,
    sd_1 = sd_1,
    z_1 = z_1
  )
}

# Create initial values for cmdstanr in the "list-of-lists" format
init_list_4 = lapply(1:4, \(i) init_fn(i))
init_list_1 = list(init_fn(1))

# Try to run Pathfinder with num_paths=4 using the "list-of-lists" style
# of setting initial values. This just produces an error.
mod_w_init_list = mod$pathfinder(
  data = stan_data,
  seed = 123,
  init = init_list_4,
  num_paths = 4,
  refresh = 1000,
  history_size = 10
)
#> Error: 'init' has the wrong length. See documentation of 'init' argument.

# Run Pathfinder with num_paths=4 but using just a list with a single
# initialization value. This runs but re-uses the same init for each path
mod_w_init_list = mod$pathfinder(
  data = stan_data,
  seed = 123,
  init = init_list_1,
  num_paths = 4,
  refresh = 1000,
  history_size = 10
)
#> Path [1] :Initial log joint density = -16938.506748 
#> Path [1] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.110e+03 -1.110e+03                   
#> Path [1] :Best Iter: [76] ELBO (-785.130279) evaluations: (4851) 
#> Path [2] :Initial log joint density = -16938.506748 
#> Path [2] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.135e+03 -1.135e+03                   
#> Path [2] :Best Iter: [77] ELBO (-789.101802) evaluations: (4851) 
#> Path [3] :Initial log joint density = -16938.506748 
#> Path [3] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.245e+03 -1.245e+03                   
#> Path [3] :Best Iter: [76] ELBO (-787.813011) evaluations: (4851) 
#> Path [4] :Initial log joint density = -16938.506748 
#> Path [4] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.116e+03 -1.116e+03                   
#> Path [4] :Best Iter: [76] ELBO (-788.891651) evaluations: (4851) 
#> Total log probability function evaluations:23304 
#> Finished in  1.8 seconds.

# Run using the function-style of specifying inits. This seems to just repeat
# the single set of inits for for each of the 4 paths.
mod_w_inits = mod$pathfinder(
  data = stan_data,
  seed = 123,
  init = init_fn,
  num_paths = 4,
  refresh = 1000,
  history_size = 10
)
#> Path [1] :Initial log joint density = -16938.506748
#> Path [1] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.110e+03 -1.110e+03                   
#> Path [1] :Best Iter: [76] ELBO (-785.130279) evaluations: (4851) 
#> Path [2] :Initial log joint density = -16938.506748
#> Path [2] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.135e+03 -1.135e+03                   
#> Path [2] :Best Iter: [77] ELBO (-789.101802) evaluations: (4851) 
#> Path [3] :Initial log joint density = -16938.506748 
#> Path [3] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.245e+03 -1.245e+03                   
#> Path [3] :Best Iter: [76] ELBO (-787.813011) evaluations: (4851) 
#> Path [4] :Initial log joint density = -16938.506748 
#> Path [4] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.116e+03 -1.116e+03                   
#> Path [4] :Best Iter: [76] ELBO (-788.891651) evaluations: (4851) 
#> Total log probability function evaluations:23304 
#> Finished in  1.8 seconds.

# Run using the default random inits. Different inits are used for each
# Pathfinder path.
mod_wo_inits = mod$pathfinder(
  data = stan_data,
  seed = 123,
  num_paths = 4,
  refresh = 1000,
  history_size = 10
)
#> Path [1] :Initial log joint density = -5218.225137 
#> Path [1] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             399      -6.539e+02      8.264e-05   4.839e-02    9.763e-01  9.763e-01      9976 -1.649e+05 -1.649e+05                   
#> Path [1] :Best Iter: [54] ELBO (-735.976769) evaluations: (9976) 
#> Path [2] :Initial log joint density = -6622.413033 
#> Path [2] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             308      -6.539e+02      8.655e-05   7.312e-02    1.000e+00  1.000e+00      7701 -1.418e+10 -1.418e+10                   
#> Path [2] :Best Iter: [50] ELBO (-745.757922) evaluations: (7701) 
#> Path [3] :Initial log joint density = -12283.643776 
#> Path [3] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             310      -6.539e+02      1.200e-04   1.092e-01    8.474e-01  8.474e-01      7751 -6.558e+04 -6.558e+04                   
#> Path [3] :Best Iter: [58] ELBO (-743.025505) evaluations: (7751) 
#> Path [4] :Initial log joint density = -95375.955585 
#> Path [4] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             193      -6.539e+02      3.622e-04   8.454e-02    1.000e+00  1.000e+00      4826 -2.231e+03 -2.231e+03                   
#> Path [4] :Best Iter: [82] ELBO (-786.378632) evaluations: (4826) 
#> Total log probability function evaluations:34154 
#> Finished in  2.6 seconds.

^{Created on 2024-01-22 with reprex v2.0.2}

R version info:

 setting  value
 version  R version 4.3.2 (2023-10-31 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/Los_Angeles
 date     2024-01-22
 rstudio  2023.06.1+524 Mountain Hydrangea (desktop)
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

avehtari commented 7 months ago

I was also annoyed the Pathfinder expects a list and not list of lists, but I forgot to make an issue. I agree, it should be possible to give a list of lists for multi-pathfinder

SteveBronder commented 6 months ago

I'm fixing that in the PR to have inits as pathfinder fits

avehtari commented 6 months ago

Can you also allow list of lists, as for other inference methods?

SteveBronder commented 6 months ago

Ooof okay this is going to take me a second. So I wrote it originally to use some internals that thinks parallelization is only done at the R level and that each init goes to a separate process started at the R level . I'm going to chat with @jgabry tmrw about how that all works and also what we can do to stop doing the parallelization to happen at the cmdstan level for everything since we have that now

avehtari commented 3 months ago

Has this been fixed by #937 ?

andrjohns commented 3 months ago

@SteveBronder - just to check something with Pathfinder and inits - when multiple init files are present (one for each path), is it normal for the output to only show the first init file?

For example, with num_paths=4 and four distinct init files, the output shows:

# init = /var/folders/0h/9q9frlm906s6cwft_rp92dw80000gn/T/RtmpToGNxD/init-71038ec7ec4_1.json

SteveBronder commented 3 months ago

Yes we only print the first one. I do think we should change that so the actual result is more clear. We could also let users pass in a comma separated list of file names which I think would be nicer

stan-dev / cmdstanr

Multiple inits for multi-path Pathfinder? #907