joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

using complex priors with object that returns True/False? #214

Closed kstoreyf closed 3 years ago

kstoreyf commented 3 years ago

hi! i'm wondering if it's possible to incorporate a complex prior that can't be written in the ways described here. i have an object that takes in the parameters and returns True if in the prior, else False; can I construct a prior with this in the format of the prior transform?

there might be a way i'm not seeing to write it without this object, in a way that works with the prior transform format. my prior involves an ellipsoid in the 7-dimensional space of some of my parameters (the other 11 are uniform; i know 18 total is a lot!), which is pre-computed by calculating the covariance of my training set; then i determine whether the input parameters are some chi^2 away from the mean of the training set, given this covariance. but i don't see how to write this sort of multivariate gaussian in the format of these examples. (i tried to follow your bivariate normal example, but i can't figure out how to write this with a general covariance matrix.)

finally, i've tried incorporating the True/False prior in the likelihood function by returning -inf if False (after using a uniform cuboid prior transform that encloses the ellipsoid). but this makes the chains take a prohibitively long time for the large number of parameters I need (it actually seems to get stuck); maybe this is messing with how the nested sampling is working, though i'm not very familiar with the details.

thanks so much!

joshspeagle commented 3 years ago

my prior involves an ellipsoid in the 7-dimensional space of some of my parameters (the other 11 are uniform; i know 18 total is a lot!), which is pre-computed by calculating the covariance of my training set; then i determine whether the input parameters are some chi^2 away from the mean of the training set, given this covariance. but i don't see how to write this sort of multivariate gaussian in the format of these examples.

Actually, this is totally doable! For an N-D Gaussian, you can generate samples via the following formula:

X = C^(1/2) Z + M,    Z ~ iid N(0, 1)

where the Z's are a bunch of iid standard normal random variables, M is the mean, and C^1/2 is the square root of the covariance matrix. So, all you have to do is:

  1. Convert U -> Z using standard normal PPF (as in the bivariate example)
  2. Dot product C^1/2 with Z.
  3. Add the mean M.
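The three steps above can be sketched as a prior transform (a minimal sketch — the mean and covariance here are illustrative placeholders for the values you'd compute from your training set, and `ndtri` is scipy's standard normal PPF):

```python
import numpy as np
from scipy.special import ndtri  # standard normal PPF (inverse CDF)
from scipy.linalg import sqrtm

# illustrative values: in practice M and C come from your training set
M = np.array([1.0, -2.0])
C = np.array([[5.0, -1.0],
              [-1.0, 3.0]])
Csqrt = sqrtm(C)  # matrix square root of the covariance

def prior_transform(u):
    """Map unit-cube samples u ~ Uniform(0, 1) to x ~ N(M, C)."""
    z = ndtri(u)          # 1. convert U -> Z via the standard normal PPF
    return Csqrt @ z + M  # 2. dot C^(1/2) with Z, 3. add the mean M
```

Since `ndtri(0.5) = 0`, passing the center of the unit cube should return the mean `M` exactly, which is a quick sanity check.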

I tried to show the steps in the bivariate case, but I guess the example was unclear. I could add a more detailed explanation like this into the documentation if you think it would be helpful.

As for the chi2 -> True/False component, could you elaborate on that a bit? It's true that depending on how that's implemented you can end up with Weird Things Happening, so if you can share more details that'd be super.

kstoreyf commented 3 years ago

thank you for the quick response!

i think i hit an issue with the square root of the covariance matrix - mine has negative values, so the sqrt is not defined. maybe i'm missing something simple here?

for the chi2, it's technically a t-test - what i currently do is take the difference between the input parameters and the mean of the training set, delta_p, and compute: t = delta_p (dot) C^-1 (dot) delta_p. then i check if t is less than some cut-off (i'm using cut=12); if it's not, i return -inf, else i continue and return the lnlike. but i think a multivariate gaussian might be fine if i can get that to work?
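For reference, the hard-edged check described above could be sketched like this (the names, the illustrative mean/covariance, and the wrapper are assumptions based on the description; `cut=12` is from the comment):

```python
import numpy as np

# illustrative training-set summary statistics
mean = np.array([0.0, 0.0])
C = np.array([[5.0, -1.0],
              [-1.0, 3.0]])
Cinv = np.linalg.inv(C)
cut = 12.0  # threshold on t from the description

def inside_prior(params):
    """True if params lie within the hard-edged ellipsoid, else False."""
    delta_p = params - mean
    t = delta_p @ Cinv @ delta_p  # t = delta_p (dot) C^-1 (dot) delta_p
    return t < cut

def lnlike_with_cut(params, lnlike):
    """Return -inf outside the ellipsoid, else the usual lnlike."""
    if not inside_prior(params):
        return -np.inf
    return lnlike(params)
```

This is exactly the kind of hard rejection that can stall nested sampling, which is why replacing it with the smooth multivariate gaussian prior transform is attractive.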

joshspeagle commented 3 years ago

The matrix square root can still exist when the matrix has negative entries. For example,

import numpy as np
from scipy import linalg as slalg

C = np.array([[5, -1], [-1, 3]])
Csqrt = slalg.sqrtm(C)
print(Csqrt)

>>> [[ 2.22157925 -0.25413708]
     [-0.25413708  1.71330509]]

And you can check that doing

samps1 = np.random.multivariate_normal([0, 0], C, size=10000).T
samps2 = np.dot(Csqrt, np.random.normal(0, 1, size=(2, 10000)))

should give ~identical distributions.

Note that there are multiple definitions for the matrix square root, so other versions (like the Cholesky decomposition) can also work depending on the context.
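For instance, the Cholesky factor (a lower-triangular "square root") yields the same target distribution, assuming the covariance matrix is positive definite:

```python
import numpy as np

C = np.array([[5.0, -1.0],
              [-1.0, 3.0]])
L = np.linalg.cholesky(C)  # lower-triangular factor with L @ L.T == C

rng = np.random.default_rng(42)
z = rng.standard_normal((2, 10000))
samps = L @ z  # samples from N(0, C), same distribution as with sqrtm(C)

print(np.cov(samps))  # should be close to C
```

The Cholesky factor is not symmetric like `sqrtm`'s output, but since any matrix A with A A^T = C works for generating samples, either choice is fine here.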

kstoreyf commented 3 years ago

oh of course, thank you i'll try this!

joshspeagle commented 3 years ago

Good luck! If you have any other issues (e.g., with the True/False rejections making things inefficient) just let me know.

kstoreyf commented 3 years ago

hi josh, i have gotten these multivariate gaussian priors to work, thanks for your help - they are not exactly the same as the original hard-edged ellipsoidal prior, but very close, and they work for my use case.

i'm now back to my original issue, which i was hoping these priors would help with, but alas. i am getting multimodal posteriors, when i am quite sure these extra modes are not real.

here is a demo (sorry this is a bit in the weeds): when i just use 7 free (cosmological) parameters, i get nice unimodal posteriors. when i add in the 11 other (halo occupation) parameters, for 18 total, i get extra modes. if i use fewer observables for my likelihood, i get less accurate and less precise posteriors, but still unimodal. (the figure shows just the 7 cosmological params that are free for all cases; the others are kept fixed in the first case). i have tried combinations of 3 observables, and these are all (mostly) unimodal, so I don't think it's an issue with a particular observable.

[image: posterior comparison for the 7 free cosmological parameters across the cases]

i have tried varying the number of live points, and looking at different combinations of observables.

from what i've seen in other issues, perhaps i should play with the sampling or bounding method, but i'm not sure where to start here. any advice appreciated!

joshspeagle commented 3 years ago

What's the typical difference in delta-log-likelihood over the range of values that encompasses the majority of the posterior mass? In other words, at the dlogX values that roughly bracket the posterior mass in the runplot, what is the typical range in log-likelihoods for each of the three cases? A sign that these modes are fake would be that the range is either much larger (points getting trapped) or much smaller (samples are very correlated) compared to the "good" 7-parameter benchmark.

Do you also have a traceplot you could post, just to show what the samples look like over the full run?

Another common diagnostic would be to grab the multi-ellipsoid bounding distributions from the results object and see how many ellipsoids they contain over time (especially closer to the end). For the multi case this should be stored in the nells attribute, I think. If there's a bunch, that could be evidence that it's just not doing a good job approximating the posterior and giving you poor behavior.

As for other options, you could try imposing single on the bounding distribution to force it to only use a single ellipsoid, since it sounds like that's not a crazy approximation. Increasing the number of walks when using the rwalk sampling option could also help. And changing to rslice with a larger number of slices could also be worth a try.

Sorry you've been having so much trouble!

kstoreyf commented 3 years ago

thanks for all this! here's a traceplot for the multimodal 18-parameter run:

and the associated runplot:

as well as the number of ellipsoids as a function of iteration (i tried to grab this info from results but let me know if it looks funny):

compared to the good 7-parameter traceplot:

and corresponding runplot:

n_ellipsoids:

so it looks like the quantity you described, the range of log-likelihood values for the posterior mass, is much smaller for the bad run, meaning the samples are very correlated. given what you said, this seems like support for the modes-are-fake hypothesis? if this is indeed the case, does knowing it help with a fix?

it also looks like the bad run has significantly more ellipsoids, again indicating poor behavior. i'll try imposing the single ellipsoid bound and see how it goes (makes me a bit nervous to force this but i think it would be fine if these tests show the other modes are fake!). is there a way to limit the number of ellipsoids, but >1?

i'll also try increasing the walks and using rslice, thanks!

one other thing i forgot to mention, i played around with the dlogz convergence criterion a good deal, and it seemed that the bad run had poor convergence compared to the better runs (posteriors fluctuated a lot when i changed dlogz). but i was getting down to very tiny dlogz values and it still wasn't converging, so i figured this wasn't the main issue. for all of these i've been using dlogz=0.01.

joshspeagle commented 3 years ago

While there's no way currently to limit the total number of ellipsoids, you can control the overall aggressiveness/conservativeness of the decomposition algorithm for multi using the vol_dec and vol_check arguments. This will make it less likely that more ellipsoids will be used.

The posteriors fluctuating a lot is probably an indication that the sampling isn't going well, rather than strictly a dlogz thing. 0.01 should be plenty good for most problems.

joshspeagle commented 3 years ago

Closing this for now. Hopefully some of the improvements in constructing the bounding distributions also will help resolve similar issues down the line.