tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Windowed HMC and CAR model instability #1330

Open chrism0dwk opened 3 years ago

chrism0dwk commented 3 years ago

Hi All, I've been testing out the new experimental windowed_adaptive_hmc function with a spatial conditional autoregressive model I'm working on. MRE is here with a comparison against a PyMC3/Theano implementation.

In the TFP version, I'm experiencing instability in the required Cholesky decomposition (InvalidArgumentError: Input matrix is not invertible.). Although the model is very basic and needs improvement, this issue doesn't occur in the PyMC3 version, where I get relatively sane inference.

I thought I'd open an issue, as this may well be related to the fixed num_adaptation_steps versus fixed total integration time argument. With the former strategy, bad choices of step size early on in the adaptation are likely to sling the leapfrog integrator into outer space, whereas all that happens with the latter strategy is that the number of steps increases and the algorithm slows.

Any thoughts?

Chris

davmre commented 3 years ago

Hi Chris,

A common source of Cholesky errors is that TensorFlow uses float32 by default, which is way less stable than float64 for linear-algebra-y computations like inverses and Cholesky decompositions. I'm not sure about PyMC's internal computations, but at least the results in your notebook are float64, so it's possible that the stability difference is just a result of double- vs single-precision computation.

It looks like we have a bug in windowed_adaptive_hmc's dtype handling: I get errors when I pass in a float64 model (built by adding dtype=tf.float64 to all of your tf.constant and tf.convert_to_tensor calls), so I can't test this hypothesis directly, but it's pretty plausible IMHO.
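
(For concreteness, by a float64 model I mean something like the sketch below; the array and parameter names are placeholders rather than the ones in your notebook:)

import numpy as np
import tensorflow as tf

# Every tf.constant / tf.convert_to_tensor call gets an explicit float64 dtype,
# so the whole joint distribution (and hence the Cholesky) runs in double precision.
adjacency = tf.convert_to_tensor(np.eye(4), dtype=tf.float64)  # placeholder data array
sigma0 = tf.constant(1.0, dtype=tf.float64)                    # placeholder hyperparameter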

From a 'getting the model to run' perspective, it looks like it helps a lot to (1) rewrite without the matrix inverse (we've recently added a MultivariateNormalPrecisionFactorLinearOperator distribution to express MVNs directly in terms of the precision, so you don't have to explicitly invert the covariance), and (2) add some tiny padding to the scale factor for numerical stability.

If I replace your lines:

cov = tf.linalg.inv(precision)
scale = tf.math.sqrt(sigmasq) * tf.linalg.cholesky(cov, name="cov_cholesky")
return tfd.MultivariateNormalTriL(
  loc=tf.constant(0.0),
  scale_tril=scale,
)

by the equivalent (up to the sigmasq padding) lines

# Parameterise directly by the precision factor (no explicit matrix inverse),
# with a small constant added to sigmasq for numerical stability.
return tfp.experimental.distributions.MultivariateNormalPrecisionFactorLinearOperator(
    loc=tf.constant(0.0),
    precision_factor=tf.linalg.LinearOperatorLowerTriangular(
        tf.linalg.cholesky(precision / (sigmasq + 1e-6))))

then this seems to be sufficient to prevent the immediate Cholesky decomposition error, even in float32.

(I'll defer to others on the specifics of windowed_adaptive_hmc vs PyMC's adaptation scheme; I don't know enough to really comment)

Dave

chrism0dwk commented 3 years ago

Hi Dave,

Thanks for picking this up. I tried out your solution above, but for me on (a new) Colab (https://colab.research.google.com/drive/1LYmCDFwgKR_Z6kJ31V5nYBlde77s0Qaz?usp=sharing) I still get the failed matrix inversion, again from the tf.linalg.cholesky op...

The MRE above came out of a much more complex model using float64 throughout. In that model I noticed that

scale = tf.linalg.cholesky(sigmasq * cov)

failed, whereas

scale = tf.sqrt(sigmasq) * tf.linalg.cholesky(cov)

worked fine. I guessed that was because, in the second form, the large swings in sigmasq during the early stages of the HMC warmup can't foul up the Cholesky factorisation itself. However, as you've seen, this isn't a universal solution and didn't work in the MRE.
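
(For what it's worth, the two forms agree mathematically: for scalar sigmasq > 0, cholesky(sigmasq * cov) == sqrt(sigmasq) * cholesky(cov), so the difference really does look purely numerical. A standalone toy check, not code from the model:)

import numpy as np
import tensorflow as tf

cov = tf.constant([[2.0, 0.5], [0.5, 1.0]], dtype=tf.float64)
sigmasq = tf.constant(4.0, dtype=tf.float64)

a = tf.linalg.cholesky(sigmasq * cov)
b = tf.sqrt(sigmasq) * tf.linalg.cholesky(cov)
print(np.allclose(a.numpy(), b.numpy()))  # True: both forms give the same factor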

Will keep digging...

Chris

brianwa84 commented 3 years ago

Could you get away with a cast to f64 before the cholesky, and cast back afterward? Also note there is a new tfp.experimental.linalg.no_pivot_ldl which might be able to drop in for the cholesky somehow. (It's been helpful in another context for more robust GPs.)
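
Something along these lines, say (a rough, untested sketch; stable_cholesky is just a name I'm making up here):

import tensorflow as tf

def stable_cholesky(matrix):
  # Factorise in float64 for numerical stability, then cast back to the
  # input dtype so the rest of the (float32) model is untouched.
  chol64 = tf.linalg.cholesky(tf.cast(matrix, tf.float64))
  return tf.cast(chol64, matrix.dtype)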

ColCarroll commented 3 years ago

I've got a fix coming in for the float64 handling in windowed sampling, and it looks as though I get the same instability in the cholesky.

ColCarroll commented 3 years ago

Actually, this works with float64 NUTS, which is a pretty close analogue to PyMC3. The top plot is from your MRE, the bottom is running NUTS with 1 chain (note it has half as many draws). The fix should be in tfp-nightly in a few days. Still curious about whether there's an HMC fix, but maybe this unblocks you?
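
Roughly what I ran, in case it's useful (a sketch with a toy stand-in model rather than your CAR model; the exact windowed_adaptive_nuts signature may differ slightly between nightlies):

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Toy float64 joint distribution standing in for the CAR model; the variable
# names and the pinned observation below are made up for illustration.
model = tfd.JointDistributionNamed(dict(
    scale=tfd.HalfNormal(tf.constant(1.0, tf.float64)),
    y=lambda scale: tfd.Normal(tf.constant(0.0, tf.float64), scale),
))

# One chain of NUTS via windowed adaptation, pinning the observed variable y.
result = tfp.experimental.mcmc.windowed_adaptive_nuts(
    500, model, n_chains=1, y=tf.constant(0.5, tf.float64))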

[Two attached images: plots from the MRE (top) and from float64 NUTS with one chain (bottom).]