sarahboufelja opened this issue 3 years ago
Hi Sarah,
These are great questions, which don't necessarily have great answers, but there are some things you can do. Of course the ELBO optimization is subject to local optima in general, though the details of the loss surface (and thus the extent to which this is a problem) are model- and data-specific. A few tricks that can help:

- Passing a `batch_shape` to `tfp.sts.build_factored_surrogate_posterior` allows you to run multiple optimizations in parallel, from different initializations. You can then pick the surrogate posterior with the lowest loss (highest ELBO) to generate posterior samples for forecasting, etc.
- Passing explicit `seed`s to `tfp.sts.build_factored_surrogate_posterior` and `tfp.vi.fit_surrogate_posterior` should yield reproducible behavior, subject to some caveats about TF's underlying PRNGs.
- Using a more expressive surrogate posterior can also help; upcoming TFP releases (currently available in `tfp-nightly` releases) will likely include a method `tfp.experimental.vi.build_affine_surrogate_posterior` to help automate the process.

Cholesky decomposition errors can crop up in STS models if the optimizer considers parameter values that make one of the transition or observation covariance matrices singular or near-singular. There's no totally foolproof way to prevent this, but a few approaches that can help are:

- Casting the `observed_time_series` and any other parameters passed into the model to `tf.float64`, so that the optimization runs in double precision instead of the default single precision.
- For a `Seasonal` component, setting `constrain_mean_effect_to_zero=False` can improve the conditioning.
- Specifying a more informative prior on the observation noise, e.g., `observation_noise_scale_prior=tfd.LogNormal(loc=0., scale=1.)` (noting that the choice of prior will, naturally, also affect the results).

Best,
Dave
Hi, I have noticed that predictions with the sts package are sometimes inconsistent over multiple runs, especially when the model is fit with a variational inference surrogate. Running the same code twice can produce a large difference in accuracy. I know that VI finds a "local" optimum of the ELBO, so I suppose running the code twice will naturally lead to slightly different results. However, the observed differences are troubling when you think of putting a model in production based on these figures. I was wondering whether this is a known issue or whether it has to do with my model's specification. Also, for the same model, I sometimes run into the following error:
InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid. [Op:Cholesky]
Running the same code a second time apparently resolves the issue, but I still don't understand why it works sometimes and not others. Furthermore, may I ask how and when the Cholesky decomposition is used in a VI pipeline?
Thank you for your help,