joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Discrepancy: paxes vs axes #193

Closed: mileslucas closed this issue 4 years ago

mileslucas commented 4 years ago

Hello, I'm trying to learn more about this code and I keep scratching my head over how axes and paxes are used within the bounding algorithms and the sampling algorithms.

For example, in the ellipsoidal code, the axes are labeled as the "transformation axes" of the ellipsoid and are the Cholesky decomposition of the covariance matrix. Also here, paxes are labeled as the principal axes, which are described as being used to transform between the unit hyper-cube and the prior volume.

Given this, I assume that methods like the random-walk (rwalk) implementation would use paxes to generate proposals, yet the code actually feeds in axes (many snippets of the form "if method in ['rwalk', ...]: return ell.axes"). From inspecting the nestle codebase and some of my own code, it seems the appropriate choice is to use paxes, as they are described in the bounding constraint constructors.

Is this a bug, or is this a lapse in my understanding?

joshspeagle commented 4 years ago

No, you’re not wrong; the two types of axes are a result of a somewhat tortured development history trying to get the bounding ellipsoids to be both stable and straightforward to use internally. The way I’ve set things up, the Cholesky decomposition C^1/2 stored in axes is the correct way to transform a standard normal random draw z via x ~ C^1/2 * z, which is what rwalk and unif use. However, when sampling with slice, the code rotates through the principal axes of the distribution, so I believe it uses paxes there.
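
For concreteness, a minimal sketch of the transformation being described (illustrative NumPy only, not dynesty's actual internals; the covariance C is just an example):

```python
import numpy as np

# Illustrative sketch only: transform a standard normal draw z into a
# draw with covariance C using the Cholesky factor (the "axes" role).
rng = np.random.default_rng(0)

C = np.array([[2.0, 0.6],
              [0.6, 1.0]])        # example covariance defining an ellipsoid
axes = np.linalg.cholesky(C)      # lower-triangular factor, C = axes @ axes.T

z = rng.standard_normal(2)        # standard normal draw
x = axes @ z                      # x has covariance axes @ axes.T = C
```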

mileslucas commented 4 years ago

Hmm, the standard rejection sampling and MH walk methods in nestle seem to use the axes derived here: https://github.com/kbarbary/nestle/blob/581a1c3517ac9589d1f2c6b05679a0f38894a291/nestle.py#L259-L262 (an eigen-decomposition of the precision matrix), which matches your paxes, so I'm confused about why your rejection sampling code uses axes.

The actual sampling in nestle uses u_prop = u + scale * du, where du = dot(axes, dr) (for the MH walk), so my confusion is how nestle can use the eigen-decomposition while you use the Cholesky for the same thing and both seem to work.
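
Concretely, the step I mean is something like the following rough sketch (my own paraphrase, not nestle's exact code; I'm assuming dr is a standard normal draw for illustration):

```python
import numpy as np

# Rough sketch of the MH-style proposal described above; the choice of
# `scale` and the normal draw for `dr` are assumptions for illustration.
rng = np.random.default_rng(1)

ndim = 2
C = np.array([[2.0, 0.6],
              [0.6, 1.0]])
axes = np.linalg.cholesky(C)     # or an eigen-decomposition square root

u = rng.uniform(size=ndim)       # current point in the unit cube
scale = 0.5

dr = rng.standard_normal(ndim)   # random step in the "axes" frame (assumed)
du = axes @ dr                   # map the step through the axes
u_prop = u + scale * du          # proposal (still needs unit-cube and
                                 # likelihood-constraint checks)
```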

joshspeagle commented 4 years ago

I'd have to go back through the math (dynesty was initially built off of nestle, so I started from the code snippet you shared), but I'm pretty sure that while the actual transformation for a given z is different, the statistical distribution of points after transforming an ensemble of random z draws is the same.
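
A quick way to see this (a hedged sketch, nothing taken from dynesty or nestle): any matrix A with A A^T = C maps z ~ N(0, I) to x = A z with covariance C, and both the Cholesky factor and the eigen-decomposition square root satisfy that, which a short Monte Carlo check confirms:

```python
import numpy as np

# Monte Carlo sanity check (illustrative only): both square roots of C
# reproduce the same covariance, even though individual draws differ.
rng = np.random.default_rng(42)

C = np.array([[2.0, 0.6],
              [0.6, 1.0]])

A_chol = np.linalg.cholesky(C)               # Cholesky factor
eigvals, eigvecs = np.linalg.eigh(C)
A_eig = eigvecs @ np.diag(np.sqrt(eigvals))  # eigen-decomposition square root

z = rng.standard_normal((100_000, 2))
cov_chol = np.cov((z @ A_chol.T).T)
cov_eig = np.cov((z @ A_eig.T).T)

print(np.allclose(cov_chol, C, atol=0.05))   # True (up to Monte Carlo noise)
print(np.allclose(cov_eig, C, atol=0.05))    # True
```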

mileslucas commented 4 years ago

That seems reasonable (and I could check it with Monte Carlo easily enough), though it raises the question: if you can slice and sample using paxes, why carry around axes at all?

Thanks for having this commentary, too, I really appreciate it!

joshspeagle commented 4 years ago

If I'm remembering correctly, there was some point during development where doing the Cholesky decomposition was useful for checking stability. Since you get the matrices for free, I think I just started using them as a result. I'm sure there were other reasons that I could back out by looking through old commits, but that's all I've got off the top of my head.

And no problem! Happy to try and help. If you have any other questions, happy to answer them here. Feel free to close this issue whenever you're satisfied.