Open beyondpie opened 4 years ago
importance_resampling and keep_every are part of post-processing that rstan does with the loo
package. See https://github.com/stan-dev/rstan/blob/c88a667015b987440668958d0dcddecdf8fd346c/rstan/rstan/R/stanmodel-class.R#L293
So this is not part of "core" Stan.
@rok-cesnovar Thank you so much for showing the original code in rstan!
If possible maybe what we should do is move most of the code for these features into the loo package itself. Then rstan and cmdstanr could both access it instead of only rstan. @avehtari @rok-cesnovar What do you think?
@jgabry That would be great! I'm thinking about using rstan to get the importance sampling feature ... From my current experiments, I notice that sometimes, even when VI converges, the results might not be good (maybe a local minimum), but I don't know how to check or revise the results when the model underfits. I think importance sampling might be a good way?
BTW, do you have any suggestions on using "meanfield" or "fullrank" for VI? Currently I just use "meanfield". Not sure if "fullrank" is better or may have issues, such as instability during optimization?
@rok-cesnovar
In order to do the importance sampling, I need the prior probabilities, the likelihoods, and the approximated probabilities (instead of the ELBO). Do you have some idea about how I can get the corresponding values? I'm not sure about the meaning of lp__ and lp_approx__; do they correspond to the likelihood and the approximated probabilities? If so, then I guess I only need to get the prior values by writing them in the generated quantities block in Stan. In this way, I can get the importance sampling purely under cmdstanr, right?
If possible maybe what we should do is move most of the code for these features into the loo package itself.
cmdstanr is using the posterior package, which has support for weighted draws. See weight_draws() and resample_draws(). The diagnostics (khat/ESS for IS) could also be included in posterior.
It seems that cmdstanr with the variational method is returning the needed lp__ and lp_approx__. So it would be possible to call weight_draws after reading the draws from the CSV, and then also update the summary output to show IS-based khat and ESS instead of MCMC-based Rhat and ESS. Whether resampling is done should be an option, so that it is possible to also investigate the non-resampled draws.
For additional information on why the current ADVI implementation in Stan has variability in performance, see https://arxiv.org/abs/2009.00666
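A minimal sketch of this workflow (mod and data_list are placeholder names, and the weight computation follows the description above rather than a built-in cmdstanr feature):

```r
library(cmdstanr)
library(posterior)

# Placeholders: a compiled cmdstanr model and its data list.
fit <- mod$variational(data = data_list)
draws <- fit$draws(format = "draws_df")

# Log importance ratios: target log density minus approximation log density.
# Both are known only up to additive constants, which cancel on normalization.
log_w <- draws$lp__ - draws$lp_approx__

# Attach the weights to the draws (log = TRUE keeps them on the log scale),
# then optionally resample to get approximately unweighted posterior draws.
draws_w <- weight_draws(draws, weights = log_w, log = TRUE)
draws_resampled <- resample_draws(draws_w)
```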
@avehtari
Thank you very much for your response! I'm reading the paper you recommended.
It seems that cmdstanr with the variational method is returning the needed lp__ and lp_approx__.
- In my Stan code, I use the y ~ distr(param) form. From the Stan manual, Stan will then only use the likelihood up to an additive constant. So lp__ would be the log of the total probability (including the prior) up to an additive constant, right?
- What's the meaning of lp_approx__ in variational inference? I didn't find the definition of lp_approx__. Is it just the log of the probabilities from the approximated distribution?
- If I use the generated quantities block in variational inference, are the values evaluated after the optimization stage (I mean after running the stochastic optimization on the ELBO), on the corresponding sample values I get from VI?

Whether resampling is done should be an option, so that it is possible to also investigate the non-resampled draws.
I don't understand this sentence. You mean I should also investigate the results from VI using MCMC-based Rhat and ESS when I choose not to use the IS-based khat and ESS for the resampled samples?
Thank you very much!
So lp__ would be the log of the total probability (including the prior) up to an additive constant, right?
Yes, and that is sufficient for the importance sampling used.
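To spell out why a constant is enough: with self-normalized importance sampling, the weights are

$$ w^{(s)} = \frac{\exp\big(\mathtt{lp\_\_}^{(s)} - \mathtt{lp\_approx\_\_}^{(s)}\big)}{\sum_{s'} \exp\big(\mathtt{lp\_\_}^{(s')} - \mathtt{lp\_approx\_\_}^{(s')}\big)}, $$

and any additive constant in lp__ (or lp_approx__) appears in both numerator and denominator, so it cancels.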
What's the meaning of lp_approx__ in variational inference? I didn't find the definition of lp_approx__. Is it just the log of the probabilities from the approximated distribution?
Log densities, yes. See http://proceedings.mlr.press/v80/yao18a.html
If I use the generated quantities block in variational inference, are the values evaluated after the optimization stage (I mean after running the stochastic optimization on the ELBO), on the corresponding sample values I get from VI?
The generated quantities are computed using the draws from the VI approximation.
I don't understand this sentence. You mean I should also investigate the results from VI using MCMC-based Rhat and ESS when I choose not to use the IS-based khat and ESS for the resampled samples?
That sentence was directed to the developers of cmdstanr, I hope they do understand it. You can ignore that sentence.
@jgabry A quick question: which parameter in VI controls the mini-batch size? From the documentation, VI in Stan is based on stochastic gradient ascent. Do you know the default mini-batch size during optimization?
Thanks!
A quick question: which parameter in VI controls the mini-batch size? From the documentation, VI in Stan is based on stochastic gradient ascent. Do you know the default mini-batch size during optimization?
No mini-batches. It is unfortunate that the stochasticity in stochastic gradient descent is so strongly associated with mini-batching, while the stochasticity can be due to other reasons, too. In Stan ADVI the stochasticity is due to Monte Carlo estimation of the gradient (with all the data). There are non-Stan ADVI implementations in which the stochasticity comes both from mini-batching and from Monte Carlo estimation of the gradient. Mini-batching assumes a factorizing likelihood, but Stan programs can have non-factorizing likelihoods, and thus it's non-trivial to implement mini-batching.
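Roughly, the gradient estimate in Stan's ADVI has the reparameterization form (notation following the ADVI paper, not this thread):

$$ \nabla_{\phi}\,\mathrm{ELBO}(\phi) \approx \frac{1}{S}\sum_{s=1}^{S} \nabla_{\phi}\,\log p\big(y,\, T_{\phi}(\eta^{(s)})\big) + \nabla_{\phi}\,\mathbb{H}\big[q_{\phi}\big], \qquad \eta^{(s)} \sim \mathrm{N}(0, I), $$

where the full data $y$ enters every term, so the noise comes from the $S$ Monte Carlo draws, not from data subsampling.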
If I am understanding this correctly, everything required for this is already available in cmdstanr and the only remaining issue is that posterior would show IS-based khat and ESS?
If I am understanding this correctly, everything required for this is already available in cmdstanr and the only remaining issue is that posterior would show IS-based khat and ESS?
cmdstanr has what is needed, but it would be useful if cmdstanr would add the metacolumn .log_weight to the posterior object (computed from lp__ and lp_approx__). Right now posterior has support for weighted draws and resampling, but it doesn't yet have IS diagnostics; we have discussed adding them to posterior so that it could support an appropriate summarize_draws. What to display can be copied from rstan. Ping @paul-buerkner @jgabry.
Thanks for the clarification!
@avehtari @rok-cesnovar Thanks for your follow-up comments. I've not implemented this part yet. But I think for me it seems straightforward:
- Compute log_weight <- lp__ - lp_approx__ (up to a constant).
- Use the loo package to do PSIS, and get khat and ESS.

Hope my understanding is right.
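A sketch of those two steps, assuming draws is a draws_df from a cmdstanr variational fit as in the earlier sketch:

```r
library(loo)

# Step 1: log importance ratios, up to a constant.
log_ratios <- draws$lp__ - draws$lp_approx__

# Step 2: Pareto-smoothed importance sampling. r_eff = NA because these are
# independent draws from the approximation, not autocorrelated MCMC draws.
ps <- psis(log_ratios, r_eff = NA)

ps$diagnostics$pareto_k  # khat: reliability of the importance weights
ps$diagnostics$n_eff     # IS effective sample size
log_w <- weights(ps)     # smoothed, normalized log weights
```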
I set up a hierarchical Bayesian model, which has thousands of parameters. The khat evaluation is based on the joint distribution, so all the parameters share the same khat value, which is helpful for detecting the overall model fit from variational inference, but not that helpful for evaluating each parameter (due to the difficulty of getting the marginal posterior distribution for each parameter). But I'm still glad to implement it to evaluate my model.
PSIS might be further helpful to correct the bias. I plan to try it even if khat shows a big number (> 0.7). From the paper (Yao et al., 2018), it seems OK.
I will use a much smaller threshold (like 0.0001) and a relatively small learning rate (eta around 0.2) as the stopping rule in my model for Stan VI, to approximate the better stopping rule defined in Dhaka et al., 2020.
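For reference, one way these settings could be passed in cmdstanr (a sketch; adapt_engaged = FALSE so the fixed eta is used rather than adapted away):

```r
# Placeholders: mod is a compiled cmdstanr model, data_list its data.
fit <- mod$variational(
  data = data_list,
  eta = 0.2,             # fixed step size
  adapt_engaged = FALSE, # don't adapt eta; keep the value above
  tol_rel_obj = 1e-4     # tighter relative-ELBO stopping tolerance (default 0.01)
)
```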
@avehtari do you have a plan to update the variational inference in Stan based on the Dhaka et al., 2020 paper? It's really great!
Under this link: https://mc-stan.org/cmdstanr/reference/model-method-variational.html , there are no parameters for "importance_resampling" or "keep_every" as in rstan (https://mc-stan.org/rstan/reference/stanmodel-method-vb.html)?