stan-dev / loo

loo R package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS)
https://mc-stan.org/loo
Other
148 stars 34 forks source link

Improved mcse_elpd_loo #245

Open avehtari opened 5 months ago

avehtari commented 5 months ago

Currently, mcse_elpd_loo is computed by assuming the pointwise mcse's to be independent. As the same MC draws are used for each pointwise PSIS weights, the weights are correlated and this correlation can sometimes to be high enough that it would be useful to take into account. The extension of the formulas is straight forward, but as the current approach can compute pointwise mcse's parallely, extra care is needed to check that all different functions would work given the change, or at least information would be provided whether independence assumption has been used or not. I'll prepare a small case study to illustrate computation for improved mcse.

jgabry commented 5 months ago

Sounds good!

avehtari commented 5 months ago

I have now the equations, but... Currently, pointwise mcse_elpd's are computed in pointwise_loo_calcs() and stored as one column in pointwise array. mcse_loo() uses independence assumption to estimate the total mcse. The improved computation would take into account the correlations. Storing the correlations would take memory, so it would be better to store only the improved total mcse. As the pointwise mcse's stay the same in the improved approach, I propose that we keep the current pointwise_loo_calcs() as is, and for example, in loo.matrix() after calling pointwise <- pointwise_loo_calcs(x, psis_out), call a new function mcse_total_elpd() to compute the improved mcse using the correlations, too, and store it in importance_sampling_loo object. Right now I'm not sure what would be the best way to store that additional scalar value. Suggestions?

jgabry commented 5 months ago

There might be a better option, but what about calling mcse_total_elpd inside importance_sampling_loo_object and putting the resulting scalar in the diagnostics component? Currently I think diagnostics has vectors of pareto k, n_eff, and r_eff values but it could also have a scalar mcse included I guess.

avehtari commented 5 months ago

I have this working for loo.array and loo.matrix, but it's more complicated for loo.function. Currently llfun is called once per observation, but the computed ll is not stored. When computing the covariance needed for total mcse, we need to know ll for two observations for each off-diagonal element. There is no way to do this memory and computation efficiently at the same time. I can still include mcse_total_elpd function, which someone could call if they care, but for consistency I would keep the default behavior for loo.array and loo.matrix as now.

In the roaches example for the rstanarm negative binomial model, we get

> mcse_loo(loo(stan_glmnb))
[1] 0.127
> mcse_total_elpd(log_lik(stan_glmnb), loo(stan_glmnb, save_psis=TRUE)$psis_object)
[1] 0.135

It is possible that the difference between the current and improved is bigger epsecially if there are many khats between 0.5 and 0.7.

No need to get in the next release