Closed: elbamos closed this issue 5 years ago.
Getting NaNs in SVI is a pretty common occurrence. Typical reasons why things explode include:
In your particular case, you might also try a different transformation to the positive real line. I believe constraints.positive uses the exponential by default; softplus might provide a more stable alternative.
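A minimal sketch of why softplus can be more stable, assuming a recent PyTorch. The map that torch.distributions uses for constraints.positive is exponential-based, so in float32 it overflows to inf once the unconstrained value passes roughly 88, and its gradient exp(u) grows just as fast. Softplus, log(1 + exp(u)), grows only linearly and has gradient sigmoid(u), which is at most 1. The input values below are arbitrary illustrations.

```python
import torch
import torch.nn.functional as F
from torch.distributions import biject_to, constraints

# Arbitrary unconstrained values, e.g. where an optimizer might drift after large steps.
u = torch.tensor([-5.0, 0.0, 5.0, 100.0], requires_grad=True)

# PyTorch's default bijection onto the positive reals (exponential-based).
exp_map = biject_to(constraints.positive)
sigma_exp = exp_map(u)

# Softplus alternative: log(1 + exp(u)), approximately u for large u.
sigma_softplus = F.softplus(u)

# Gradients w.r.t. u: exp(u) for the exponential map, sigmoid(u) <= 1 for softplus.
(grad_exp,) = torch.autograd.grad(sigma_exp.sum(), u)
(grad_softplus,) = torch.autograd.grad(sigma_softplus.sum(), u)

print(sigma_exp)       # last entry overflows to inf in float32
print(sigma_softplus)  # last entry stays at 100
print(grad_exp, grad_softplus)
```

In a guide, one way to use this is to keep an unconstrained pyro.param and pass it through F.softplus before using it as a scale; whether a particular Pyro or PyTorch release also ships a softplus-based constraint is not assumed here.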
I can try softplus.
Let me just suggest, however, that in a PPL, if getting NaNs in a positively constrained scale parameter is a common occurrence, I would classify that as a bug.
Variational inference turns Bayesian inference into a stochastic optimization problem. Unfortunately, no one has figured out how to robustly solve stochastic optimization problems in full generality. When Stan HMC gets stuck in one mode and doesn't explore other modes, is that a bug? General Bayesian inference is very difficult. Pyro is a tool, not a solution that claims to robustly do inference for all possible models. For certain problem domains, say certain types of convex optimization, the problem is well understood enough that one can write down more or less robust solutions that consistently work. (Approximate) Bayesian inference is not there yet; that's why it's an active area of research.
Hi @elbamos, this seems like an issue with constraints.positive, which is part of PyTorch distributions. I recommend opening a PyTorch issue with a minimal reproducible example that doesn't use Pyro so folks there can look into potential fixes. For that reason I'm going to close this issue.
As @martinjankowiak says, getting variational inference, especially black-box Monte Carlo variational inference, to work reliably can be an art rather than a science (and this difficulty highlights what remarkable achievements the NUTS algorithm and its implementation in Stan are!). VI is also less suitable statistically for some tasks than others, and may have an especially difficult time with time series models.
We have an open issue #1677 about adding a tutorial with miscellaneous Pyro/SVI tips and tricks, but in the meantime, I would recommend looking at some of the more advanced Pyro tutorials, especially the AIR and DMM tutorials, for examples of making VI work with large, complicated models.
@martinjankowiak
when stan hmc gets stuck in one mode and doesn't explore other modes is that a bug?
It's considered a bug in the model. Much of the discussion of the art of Bayesian inference with NUTS concerns how to reparameterize models to support proper exploration of the posterior.
If, on the other hand, I were to define a Stan parameter as real<lower=0> sigma;, give it the prior sigma ~ cauchy(0, 1);, and then use it in obs ~ normal(0, sigma), and during inference Stan failed because sigma was NaN, then yes, I'm pretty confident that would be considered a bug in Stan.
@eb8680 Thanks; if I'm able to construct an MRE without using Pyro, I'll try to do that.
Regarding the tutorials, I would encourage you to produce one on implementing a traditional hierarchical model with SVI. Perhaps one that converts a hierarchical model from MCMC to SVI would be helpful?
Hey @elbamos, did you ever get an MRE submitted to torch? I think I am having a similar error...
I did ultimately resolve this, but as it was more than a year ago, I don’t recall how. I do recall that I concluded there were, at the time, impediments to using Pyro for serious time series work related to the efficiency of zero-centered parameterizations.
(Reply sent by email to Dan Marthaler's comment of Mar 26, 2020, 8:33 AM.)
Issue Description
With a parameter set to have a positive constraint, I sometimes get NaNs in the guide.
Code Snippet
My guide includes the following code:
While debugging why the call to cholesky sometimes throws a LAPACK exception after many iterations when I use SVI with my model, I found that this happens when theta in the guide contains a NaN. I'm very confused about how theta could possibly get a NaN value. This is not something I would have expected to have to worry about, since I have a constraints.positive constraint on it. Is this another case of taking a very large gradient step and ending up with a NaN?
Let me add: I'm very new to SVI. I come from the MCMC world. I find the relationship among guides, parameters, and traditional hierarchical model priors to be rather mysterious. So I'm totally open to the possibility that this is happening because of mistakes on my part.