stan-dev / stan


Underflow in psis_weights leads to uniform sampling from multi-pathfinder draws #3241

Closed avehtari closed 7 months ago

avehtari commented 8 months ago

Summary:

Underflow in the psis_weights computation produces all-zero weights, which are then normalized by dividing by 0; Boost's discrete_distribution happily accepts the result as equal weights.

Description:

Lines 278-280 in psis.hpp https://github.com/stan-dev/stan/blob/667d5b2d7c5ea1072011919a1acd76a9d3061d01/src/stan/services/pathfinder/psis.hpp#L278C1-L280C42 are

```c++
  auto max_adj = (llr_weights + max_log_ratio).eval();
  auto max_adj_exp = max_adj.exp();
  return max_adj_exp / max_adj_exp.sum();
```

Here "+ max_log_ratio" is unnecessary as the last line does the normalization, but that addition makes it likely that exp underflows and all values are 0. For some reason, the division by 0 (sum of 0's is 0) is not causing error, and Boost discrete_distribution is happy to consider them equal weights, and produces draws uniformly. "+ max_log_ratio" should be dropped.

Reproducible Steps:

I don't have time to make a minimal reproducible example, but I found this out by running the birthday example https://users.aalto.fi/~ave/casestudies/Birthdays/birthdays.html.

After the first model has been compiled and the data list created:

```r
model1 <- cmdstan_model(stan_file = root("Birthdays", "gpbf1.stan"),
                        include_paths = root("Birthdays"))
```

Data to be passed to Stan:

```r
standata1 <- list(x=data$id,
                  y=log(data$births_relative100),
                  N=length(data$id),
                  c_f1=1.5, # factor c of basis functions for GP for f1
                  M_f1=20)  # number of basis functions for GP for f1
```
Then run:

```r
pth1 <- model1$pathfinder(data = standata1, init=0.1, num_paths=10,
                          single_path_draws=40, history_size=50,
                          max_lbfgs_iters=100)
lw=sort(extract_variable(mutate_variables(pth1$draws(), lw=lp__-lp_approx__), variable=c('lw')))
w1=exp(lw)
w2=exp(lw-max(lw))
```



Now w1 is all zeros, while w2 is all non-zero. In w2, the 97.6% smallest weights have a cumulative probability of only 0.1%, which shows that the sampling has not happened according to the weights.
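
As a side check (again my own sketch, not part of the report), the degenerate weights described above can be handed straight to `boost::random::discrete_distribution`; if the behaviour reported in this issue holds, the draw counts come out roughly uniform rather than an error being raised:

```c++
#include <boost/random/discrete_distribution.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <iostream>
#include <vector>

int main() {
  // All-zero weights, as produced by the underflow, normalized by their
  // zero sum (0 / 0), mirroring what psis_weights returns in this case.
  std::vector<double> raw(10, 0.0);
  double total = 0.0;
  std::vector<double> weights(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i) weights[i] = raw[i] / total;

  boost::random::mt19937 rng(1234);
  boost::random::discrete_distribution<> dist(weights.begin(), weights.end());

  std::vector<int> counts(raw.size(), 0);
  for (int i = 0; i < 100000; ++i) ++counts[dist(rng)];

  // If the behaviour described in this issue holds, the counts come out
  // roughly equal, i.e. the draws ignore the degenerate weights entirely.
  for (std::size_t k = 0; k < counts.size(); ++k)
    std::cout << k << ": " << counts[k] << "\n";
}
```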

Additional Information:

It would be nice to have options to control 1) whether resampling is done at all, and 2) whether resampling is done with or without replacement. These would be useful for diagnosing the approximation and for further research on using Pathfinder to initialize MCMC. I'll make a separate issue about this.