NaNs and constraints [bug, maybe?] [discussion, possibly?]

elbamos commented 5 years ago

Issue Description

With a parameter set to have a positive constraint, I sometimes get NaNs in the guide.

Code Snippet

My guide includes the following code:

theta = pyro.param("theta", mu.new_full((M,), scale/2.0).requires_grad_(), constraint=constraints.positive)
# Prior on theta
pyro.sample("theta", dist.HalfCauchy(mu.new_full((M,), scale)).to_event())
omega_temp = theta.diag().cholesky()

When I use SVI with my model, the call to cholesky will sometimes throw a LAPACK exception after many iterations. In trying to debug why, I found that this happens when theta in the guide contains a NaN.

I'm very confused about how theta could possibly get a nan value. This is not something I would have expected to have to worry about, since I have a constraints.positive on it.
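One way to localize a failure like this is to check theta before it reaches the factorization. The sketch below is only an illustration, not code from the model above: checked_diag_cholesky is a hypothetical helper and the example values are made up. It relies on the fact that the Cholesky factor of a diagonal matrix with positive entries is the diagonal matrix of their square roots, so no LAPACK call is needed for this particular case.

```python
import torch


def checked_diag_cholesky(theta: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: Cholesky factor of diag(theta), with explicit checks."""
    if not torch.isfinite(theta).all():
        raise ValueError(f"theta contains non-finite entries: {theta}")
    if (theta <= 0).any():
        raise ValueError(f"theta must be strictly positive, got: {theta}")
    # For a diagonal matrix the Cholesky factor is the diagonal of square roots.
    return torch.diag(theta.sqrt())


# Made-up example values, compared against the general-purpose routine.
theta = torch.tensor([0.5, 2.0, 4.0])
factor = checked_diag_cholesky(theta)
reference = torch.linalg.cholesky(torch.diag(theta))
```

Raising early with the offending values makes it easier to see at which iteration theta first becomes non-finite, instead of getting an opaque LAPACK error later.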

Is this another issue of taking a very large step size through a gradient and ending up with a NaN?

Let me add: I'm very new to SVI. I come from the MCMC world. I find the relationship among guides, parameters, and traditional hierarchical model priors to be rather mysterious. So I'm totally open to the possibility that this is happening because of mistakes on my part.

martinjankowiak commented 5 years ago

getting nans in SVI is a pretty common occurrence. typical reasons why things explode include:

- bad hyperparameters for optimizer (e.g. learning rate too high, too little momentum/smoothing of gradients, etc.)
- bad initialization of variational parameters (e.g. variances too high)
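As a rough illustration of the first point, the sketch below uses plain PyTorch rather than Pyro. It pairs a small learning rate with gradient-norm clipping so that no single step can move an unconstrained parameter very far. The toy objective, the learning rate of 1e-3, and the clipping threshold of 10.0 are all assumptions chosen for the example, not recommended settings.

```python
import torch

# Hypothetical unconstrained parameter; exp() maps it to a positive scale.
raw = torch.zeros(3, requires_grad=True)
lr, max_norm, num_steps = 1e-3, 10.0, 100  # assumed values for illustration
optimizer = torch.optim.SGD([raw], lr=lr)

for step in range(num_steps):
    optimizer.zero_grad()
    # Toy objective whose gradient is very large (about 1e6 per element).
    loss = (1e6 * raw.exp()).sum()
    loss.backward()
    # Rescale the gradient so that its total norm does not exceed max_norm.
    torch.nn.utils.clip_grad_norm_([raw], max_norm=max_norm)
    optimizer.step()
    # Fail fast if the parameter ever becomes NaN or infinite.
    if not torch.isfinite(raw).all():
        raise RuntimeError(f"non-finite parameter at step {step}")
```

With plain SGD and clipping, each update moves the parameter by at most lr * max_norm in norm, which bounds how far it can drift over a fixed number of steps.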

in your particular case you might also try a different transformation to the positive real line. i believe constraints.positive uses the exponential by default; softplus might provide a more stable alternative
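To make the stability difference concrete, here is a small sketch comparing the two maps on the same unconstrained value. It is one plausible route to a NaN, not a confirmed diagnosis of this issue, and the value 100.0 is a made-up stand-in for where an optimizer might push the unconstrained parameter. In float32, exp overflows to inf at that point while softplus stays finite, and an inf then turns into NaN in later arithmetic.

```python
import torch
import torch.nn.functional as F

# Hypothetical unconstrained value reached after a few large optimizer steps.
unconstrained = torch.tensor([100.0])  # float32 by default

# Exponential map to the positive reals: overflows float32 for inputs above ~88.7.
via_exp = torch.exp(unconstrained)

# Softplus map log(1 + exp(x)): approximately linear for large x, so it stays finite.
via_softplus = F.softplus(unconstrained)

# Once a value is inf, ordinary arithmetic such as inf - inf yields NaN.
downstream = via_exp - via_exp

print(via_exp, via_softplus, downstream)
```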

elbamos commented 5 years ago

I can try softplus.

Let me just suggest, however, that in a ppl, if getting nan's in a constrained-positive scale parameter is a common occurrence, I would classify that as a bug.

martinjankowiak commented 5 years ago

variational inference turns bayesian inference into a stochastic optimization problem. unfortunately, no one has figured out how to robustly solve stochastic optimization problems in full generality. when stan hmc gets stuck in one mode and doesn't explore other modes is that a bug? general bayesian inference is very difficult. pyro is a tool not a solution that claims to robustly do inference for all possible models. for certain problem domains, say certain types of convex optimization, the problem is well understood enough that one can write down more or less robust solutions that consistently work. (approximate) bayesian inference is not there yet. that's why it's an active area of research.

eb8680 commented 5 years ago

Hi @elbamos, this seems like an issue with constraints.positive, which is part of PyTorch distributions. I recommend opening a PyTorch issue with a minimal reproducible example that doesn't use Pyro so folks there can look into potential fixes. For that reason I'm going to close this issue.

As @martinjankowiak says, getting variational inference, especially black-box Monte Carlo variational inference, to work reliably can be an art rather than a science (and this difficulty highlights what remarkable achievements the NUTS algorithm and its implementation in Stan are!). VI is also less suitable statistically for some tasks than others, and may have an especially difficult time with time series models.

We have an open issue #1677 about adding a tutorial with miscellaneous Pyro/SVI tips and tricks, but in the meantime, I would recommend looking at some of the more advanced Pyro tutorials, especially the AIR and DMM tutorials, for examples of making VI work with large, complicated models.

elbamos commented 5 years ago

@martinjankowiak

when stan hmc gets stuck in one mode and doesn't explore other modes is that a bug?

It's considered a bug in the model. Much of the discussion of the art of Bayesian inference with NUTS concerns how to reparameterize models to support proper exploration of the posterior.

If, on the other hand, I were to define a Stan parameter as real<lower=0> sigma; give it a prior of sigma ~ cauchy(0, 1); and then use it in obs ~ normal(0, sigma), and Stan failed during inference because sigma was nan, then yes, I'm pretty confident that would be considered a bug in Stan.

@eb8680 Thanks; if I'm able to construct an MRE without using Pyro I'll try to do that.

Regarding the tutorials, I would encourage you to produce one on implementing a traditional hierarchical model with SVI. Perhaps one that converts a hierarchical model from MCMC to SVI would be helpful?

mathDR commented 4 years ago

Hey @elbamos did you ever get an MRE submitted to torch? I think I am having a similar error...

elbamos commented 4 years ago

I did ultimately resolve this, but as it was more than a year ago, I don’t recall how. I do recall that I concluded there were, at the time, impediments to using pyro for serious time series work related to the efficiency of zero-centered parameterizations.

On Mar 26, 2020, at 8:33 AM, Dan Marthaler notifications@github.com wrote:

Hey @elbamos did you ever get a MRE submitted to torch? I think I am having a similar error...

pyro-ppl / pyro

NaNs and constraints [bug, maybe?] [discussion, possibly?] #1755