nanograv / enterprise

ENTERPRISE (Enhanced Numerical Toolbox Enabling a Robust PulsaR Inference SuitE) is a pulsar timing analysis code, aimed at noise analysis, gravitational-wave searches, and timing model analysis.
https://enterprise.readthedocs.io
MIT License

inf/nan in deterministic signal waveform causes crash #321

Closed paulthebaker closed 2 years ago

paulthebaker commented 2 years ago

the problem

An inf or nan in the output of a deterministic signal causes a ValueError in PTA.get_lnlikelihood and crashes. It should probably cause PTA.get_lnlikelihood to return -np.inf and continue.

The bad value in the waveform propagates into TNr. Then the Cholesky solve raises a ValueError during its finiteness check. We currently catch the LinAlgError that occurs during the Cholesky factorization when Sigma = TNT + phiinv is bad; in those cases we return -np.inf and continue. This ValueError is not caught.
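To make the failure concrete, here is a minimal reproduction of the uncaught path, using stand-in values for Sigma and TNr (scipy's cho_solve checks its inputs for inf/nan by default and raises ValueError, which falls outside the existing LinAlgError handler):

```python
import numpy as np
import scipy.linalg as sl

# Stand-ins for the quantities in the likelihood:
# Sigma = TNT + phiinv is well conditioned here, but TNr contains a NaN.
Sigma = np.eye(3) + 0.1 * np.ones((3, 3))
TNr = np.array([1.0, np.nan, 0.5])

try:
    cf = sl.cho_factor(Sigma)       # a LinAlgError here would be caught...
    expval = sl.cho_solve(cf, TNr)  # ...but this raises ValueError instead
except sl.LinAlgError:
    lnlike = -np.inf                # the existing except clause
except ValueError as e:
    print("uncaught in the current code:", e)
```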

when does it happen?

Waveforms can have bad values when parameters like cos_theta or cos_inc are out of bounds. Then a calculation like np.arccos(cos_theta) returns nan.
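For example (a minimal sketch; the 1e-9 offset is just illustrative of a proposal stepping slightly past the boundary):

```python
import numpy as np

# np.arccos is only defined on [-1, 1]; a value just past the boundary
# silently produces NaN (with a RuntimeWarning, suppressed here).
with np.errstate(invalid="ignore"):
    theta = np.arccos(1.0 + 1e-9)

print(np.isnan(theta))  # True
```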

AM and SCAM proposals happily move out of bounds. If the chain is near an edge, e.g. cos_theta ~ 1, it's pretty easy to end up on the other side. Additionally, these moves occasionally propose "big" steps scaled by 10*sigma of the local covariance matrix, which can also land out of bounds.

why had no one noticed until now?

In PTMCMCSampler the prior is always evaluated before the likelihood. If the prior returns -np.inf, the likelihood evaluation is skipped. In almost all cases we set cos_theta = Uniform(-1, 1), so the prior catches out-of-bounds values. If we use a Normal prior, the prior won't forbid out-of-bounds values.
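Schematically, the posterior evaluation looks like this (a sketch, not PTMCMCSampler's actual code; `ln_post`, `ln_prior`, and `ln_like` are illustrative names):

```python
import numpy as np

# The prior is evaluated first, and the likelihood is skipped when the
# prior is -inf -- so a Uniform(-1, 1) prior shields the waveform from
# out-of-bounds cos_theta, while an unbounded Normal prior does not.
def ln_post(params, ln_prior, ln_like):
    lp = ln_prior(params)
    if not np.isfinite(lp):
        return -np.inf              # likelihood never sees the bad params
    return lp + ln_like(params)
```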

some solutions

  1. Add ValueError to the try/except wrapper for the cholesky call.
  2. Raise a new error in the waveform code, maybe WaveformError, and catch that when we call get_TNr
  3. Run our own finiteness check on TNr.

ValueError is a pretty broad error, so (1) may catch things that really should crash. Options (2) and (3) have the advantage of not wasting cycles on the Cholesky factorization just for the solver to fail. I think (2) is the most efficient, but my vote is for (3): it seems simpler to set up.
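Option (3) could be sketched like this (`safe_lnlikelihood` and `compute_lnlike` are hypothetical names, not the actual enterprise API):

```python
import numpy as np

# Sketch of option (3): check TNr for inf/nan before any linear algebra
# and return -inf instead of crashing.
def safe_lnlikelihood(TNr, compute_lnlike):
    if not np.all(np.isfinite(TNr)):
        return -np.inf              # bad waveform: reject, keep sampling
    return compute_lnlike(TNr)      # otherwise proceed to the cholesky
```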

vallis commented 2 years ago

Thanks, interesting... but I think it's a good rule in numerical computing that nans should never be neutralized.

Otherwise (in this context) we may have situations where a bug is introduced in a waveform and it's never caught because sampling proceeds fine. Even the LinAlgError -> -np.inf equivalence is a bit problematic, but there what we're saying is that a covariance matrix that can't be inverted probably lies in a region of parameter space that's not good for us.

Here I think that the waveform generator itself should throw an exception for out-of-bounds parameters (so we don't have to track the problem back from the linear algebra), and the user should use priors that do not allow them: in this case, a truncated Normal, which we can implement.

paulthebaker commented 2 years ago

There's a truncnorm RV object in scipy.stats. It would be pretty easy to implement a truncated normal parameter using it (I've already put together a minimal working example), although that implementation would only work for 1D parameters. There's no truncated multivariate normal in scipy.
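One gotcha with scipy.stats.truncnorm: its a and b bounds are specified in units of standard deviations from loc, not in raw coordinates. A minimal sketch for a cos_theta-style parameter (the mu and sigma values are just illustrative):

```python
import numpy as np
from scipy.stats import truncnorm

# A Normal(mu=0.5, sigma=0.3) shape truncated to cos_theta in [-1, 1].
mu, sigma, lo, hi = 0.5, 0.3, -1.0, 1.0
a, b = (lo - mu) / sigma, (hi - mu) / sigma   # standardized bounds
rv = truncnorm(a, b, loc=mu, scale=sigma)

print(rv.pdf(2.0))     # 0.0  -- out-of-bounds values are forbidden
print(rv.logpdf(2.0))  # -inf
```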

@vallis, do we think this belongs in enterprise base? Would a truncated normal parameter be better suited to enterprise_extensions, where the t-process parameter lives? Do we need a multivariate option?

vallis commented 2 years ago

A truncnormal parameter would work in either enterprise or e_e. A few PRs ago, I did remove all scipy.stats from parameter.py because it was so slow... the prior was slower than the likelihood in some cases. It should not be too much work to implement truncnormal with basic numpy functions, perhaps with some caching for the normalization factor. As long as it works for scalars and perhaps numpy vectors, it will be fine. Truncated multivariate seems exotic!
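A numpy-only version along those lines might look like the following sketch (`make_truncnorm_logpdf` is a hypothetical name; the normalization is computed once when the closure is built, so repeated prior calls stay cheap):

```python
import math
import numpy as np

# Truncated-normal log-pdf with basic numpy, caching the normalization.
def make_truncnorm_logpdf(mu, sigma, lo, hi):
    # Phi((hi - mu)/sigma) - Phi((lo - mu)/sigma), via the error function
    norm = 0.5 * (math.erf((hi - mu) / (sigma * math.sqrt(2)))
                  - math.erf((lo - mu) / (sigma * math.sqrt(2))))
    lognorm = np.log(norm) + 0.5 * np.log(2 * np.pi) + np.log(sigma)

    def logpdf(x):
        # works for scalars and numpy vectors alike
        x = np.asarray(x, dtype=float)
        out = -0.5 * ((x - mu) / sigma) ** 2 - lognorm
        return np.where((x >= lo) & (x <= hi), out, -np.inf)

    return logpdf
```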