sarahboufelja opened this issue 3 years ago
Hi Sarah,
These are great questions, which don't necessarily have great answers, but there are some things you can do. Of course the ELBO optimization is subject to local optima in general, though the details of the loss surface (and thus the extent to which this is a problem) are model- and data-specific. A few tricks that can help:

- Passing a `batch_shape` to `tfp.sts.build_factored_surrogate_posterior` allows you to run multiple optimizations in parallel, from different initializations. You can then pick the surrogate posterior with the lowest loss (highest ELBO) to generate posterior samples for forecasting, etc.
- Passing explicit `seed`s to `tfp.sts.build_factored_surrogate_posterior` and `tfp.vi.fit_surrogate_posterior` should yield reproducible behavior, subject to some caveats about TF's underlying PRNGs.
- Using a more expressive surrogate posterior can also help; upcoming TFP releases (currently available in `tfp-nightly` releases) will likely include a method `tfp.experimental.vi.build_affine_surrogate_posterior` to help automate the process.

Cholesky decomposition errors can crop up in STS models if the optimizer considers parameter values that make one of the transition or observation covariance matrices singular or near-singular. There's no totally foolproof way to prevent this, but a few approaches that can help are:

- Casting the `observed_time_series` and any other parameters passed into the model to `tf.float64`, so that the optimization runs in double precision instead of the default single precision.
- For a `Seasonal` component, setting `constrain_mean_effect_to_zero=False` can improve the conditioning.
- Specifying a more informative prior on the observation noise, e.g., `observation_noise_scale_prior=tfd.LogNormal(loc=0., scale=1.)` (noting that the choice of prior will, naturally, also affect the results).

Best,
Dave
Hi, I have noticed that predictions with the sts package are sometimes inconsistent over multiple runs, especially when the model is fit with a variational inference surrogate. Running the same code twice can produce a large difference in accuracy. I know that VI finds a "local" optimum of the ELBO, so I suppose running the code twice will naturally lead to slightly different results. However, the observed differences are troubling when you think of putting a model in production based on these figures. I was wondering whether this is a known issue or whether it has to do with my model's specification. Also, for the same model, I sometimes run into the following error:
InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid. [Op:Cholesky]
Running the same code a second time apparently resolves the issue, but I still don't understand why it works sometimes and not others. Furthermore, may I ask how and when the Cholesky decomposition is used in a VI pipeline?
Thank you for your help,