Closed kstoreyf closed 3 years ago
my prior involves an ellipsoid in the 7-dimensional space of some of my parameters (the other 11 are uniform; i know 18 total is a lot!), which is pre-computed by calculating the covariance of my training set; then i determine whether the input parameters are some chi^2 away from the mean of the training set, given this covariance. but i don't see how to write this sort of multivariate gaussian in the format of these examples.
Actually, this is totally doable! For an N-D Gaussian, you can generate samples via the following formula:

X = C^(1/2) Z + M,    Z_i ~ iid N(0, 1)

where the Z's are a bunch of iid standard normal random variables, M is the mean, and C^(1/2) is the square root of the covariance matrix. So all you have to do is transform your unit-cube samples to iid standard normals and apply this formula inside the prior transform.
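In code, a prior transform built from this formula might look like the following (a minimal sketch; `mean` and `cov` here are made-up placeholders standing in for your training-set statistics):

```python
import numpy as np
from scipy import linalg, stats

# Placeholder training-set statistics (swap in your own mean/covariance).
mean = np.array([0.5, -1.0, 2.0])
cov = np.array([[1.0, 0.2, 0.0],
                [0.2, 2.0, 0.3],
                [0.0, 0.3, 1.5]])
cov_sqrt = linalg.sqrtm(cov).real  # C^(1/2); .real guards against round-off

def prior_transform(u):
    """Map unit-cube samples u ~ Uniform(0, 1)^N to X ~ N(mean, cov)."""
    z = stats.norm.ppf(u)        # unit cube -> iid standard normals Z
    return mean + cov_sqrt @ z   # X = C^(1/2) Z + M
```

At `u = 0.5` in every dimension the standard normals are all zero, so this returns the mean, as expected.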
I tried to show the steps in the bivariate case, but I guess the example was unclear. I could add a more detailed explanation like this into the documentation if you think it would be helpful.
As for the chi2 -> True/False component, could you elaborate on that a bit? It's true that depending on how that's implemented you can end up with Weird Things Happening, so if you can share more details that'd be super.
thank you for the quick response!
i think i hit an issue with the square root of the covariance matrix - mine has negative values, so the sqrt is not defined. maybe i'm missing something simple here?
for the chi2, it's technically a t-test - what i currently do is take the difference between the input parameters and the mean of the training set, delta_p, and compute: t = delta_p (dot) C^-1 (dot) delta_p. then i check if t is less than some cut-off (i'm using cut=12); if it's not, i return -inf, else i continue and return the lnlike. but i think a multivariate gaussian might be fine if i can get that to work?
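in code, the check i'm doing looks roughly like this (a sketch with made-up placeholder numbers, not my real training-set values):

```python
import numpy as np

# placeholder training-set statistics
mean = np.array([0.0, 0.0])
cov = np.array([[5.0, -1.0],
                [-1.0, 3.0]])
cov_inv = np.linalg.inv(cov)
CUT = 12.0  # the cut-off i'm using

def lnprior_cut(params):
    """return 0 inside the ellipsoid t = dp . C^-1 . dp <= CUT, else -inf."""
    dp = params - mean
    t = dp @ cov_inv @ dp
    return 0.0 if t <= CUT else -np.inf
```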
The matrix square root can still exist when the matrix has negative entries. For example,
import numpy as np
from scipy import linalg as slalg
C = np.array([[5, -1], [-1, 3]])
Csqrt = slalg.sqrtm(C)
print(Csqrt)
>>> [[ 2.22157925 -0.25413708]
[-0.25413708 1.71330509]]
And you can check that doing
samps1 = np.random.multivariate_normal([0, 0], C, size=10000).T
samps2 = np.dot(Csqrt, np.random.normal(0, 1, size=(2, 10000)))
should give ~identical distributions.
Note that there are multiple definitions for the matrix square root, so other versions (like the Cholesky decomposition) can also work depending on the context.
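For example, both conventions give back the same covariance, so either factor works in the sampling formula above (a quick check reusing the same `C`):

```python
import numpy as np
from scipy import linalg

C = np.array([[5.0, -1.0],
              [-1.0, 3.0]])

S = linalg.sqrtm(C).real   # symmetric square root: S @ S = C
L = np.linalg.cholesky(C)  # lower-triangular Cholesky factor: L @ L.T = C

# Either factor transforms iid standard normals into N(0, C) samples.
assert np.allclose(S @ S, C)
assert np.allclose(L @ L.T, C)
```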
oh of course, thank you i'll try this!
Good luck! If you have any other issues (e.g., with the True/False rejections making things inefficient) just let me know.
hi josh, i have gotten these multivariate gaussian priors to work, thanks for your help - they are not exactly the same as the original hard-edged ellipsoidal prior, but very close, and work for my use case.
i'm now back to my original issue, which i was hoping these priors would help with, but alas. i am getting multimodal posteriors, when i am quite sure these extra modes are not real.
here is a demo (sorry this is a bit in the weeds): when i just use 7 free (cosmological) parameters, i get nice unimodal posteriors. when i add in the 11 other (halo occupation) parameters, for 18 total, i get extra modes. if i use fewer observables for my likelihood, i get less accurate and less precise posteriors, but still unimodal. (the figure shows just the 7 cosmological params that are free for all cases; the others are kept fixed in the first case). i have tried combinations of 3 observables, and these are all (mostly) unimodal, so I don't think it's an issue with a particular observable.
i have tried varying the number of live points, and looking at different combinations of observables.
from what i've seen in other issues, perhaps i should play with the sampling or bounding method, but i'm not sure where to start here. any advice appreciated!
What's the typical difference in delta-log-likelihood over the range of values that encompasses the majority of the posterior mass? In other words, at the dlogX values that roughly bracket the posterior mass in the `runplot`, what is the typical range in log-likelihoods between each of the three cases? A sign that these modes are fake would be that the range is either much larger (points getting trapped) or much smaller (samples are very correlated) compared to the "good" 7-parameter benchmark. Do you also have a `traceplot` you could post, just to show what the samples look like over the full run? Another common diagnostic would be to grab the multi-ellipsoid bounding distributions from the `results` object and see how many ellipsoids they contain over time (especially closer to the end). By default, for the `multi` case this should be with the `nells` argument I think. If there's a bunch, that could be evidence that it's just not doing a good job approximating the posterior and giving you poor behavior.
As for other options, you could try imposing `single` on the bounding distribution to force it to only use a single ellipsoid, since it sounds like that's not a crazy approximation. Increasing the number of `walks` to be larger could also help when using the `rwalk` sampling option. And changing to `rslice` with a larger number of `slices` could also be worth a try.
Sorry you've been having so much trouble!
thanks for all this! here's a traceplot for the multimodal 18-parameter run:
and the associated runplot:
as well as the number of ellipsoids as a function of iteration (i tried to grab this info from `results` but let me know if it looks funny):
compared to the good 7-parameter traceplot:
and corresponding runplot:
n_ellipsoids:
so it looks like the quantity you described, the range of log-likelihood values for the posterior mass, is much smaller for the bad run, meaning the samples are very correlated. given what you said, this seems like support for the modes-are-fake hypothesis? if this is indeed the case, does knowing it help with a fix?
it also looks like the bad run has significantly more ellipsoids, again indicating poor behavior. i'll try imposing the `single` ellipsoid bound and see how it goes (makes me a bit nervous to force this but i think it would be fine if these tests show the other modes are fake!). is there a way to limit the number of ellipsoids, but >1?

i'll also try increasing the `walks` and using `rslice`, thanks!
one other thing i forgot to mention: i played around with the `dlogz` convergence criterion a good deal, and it seemed that the bad run had poor convergence compared to the better runs (posteriors fluctuated a lot when i changed `dlogz`). but i was getting down to very tiny `dlogz` values and it still wasn't converging, so i figured this wasn't the main issue. for all of these i've been using `dlogz=0.01`.
While there's currently no way to limit the total number of ellipsoids, you can control the overall aggressiveness/conservativeness of the decomposition algorithm for `multi` using the `vol_dec` and `vol_check` arguments. This will make it less likely that more ellipsoids will be used.

The posteriors fluctuating a lot is probably an indication that the sampling isn't going well, rather than strictly a `dlogz` thing. 0.01 should be plenty good for most problems.
Closing this for now. Hopefully some of the improvements in constructing the bounding distributions will also help resolve similar issues down the line.
hi! i'm wondering if it's possible to incorporate a complex prior that can't be written in the ways described here. i have an object that takes in the parameters and returns True if in the prior, else False; can I construct a prior with this in the format of the prior transform?
there might be a way i'm not seeing to write it without this object, in a way that works with the prior transform format. my prior involves an ellipsoid in the 7-dimensional space of some of my parameters (the other 11 are uniform; i know 18 total is a lot!), which is pre-computed by calculating the covariance of my training set; then i determine whether the input parameters are some chi^2 away from the mean of the training set, given this covariance. but i don't see how to write this sort of multivariate gaussian in the format of these examples. (i tried to follow your bivariate normal example, but i can't figure out how to write this with a general covariance matrix.)
finally, i've tried incorporating the True/False prior in the likelihood function by returning -inf if False (after using a uniform cuboid prior transform that encloses the ellipsoid). but this makes the chains take a prohibitively long time for the large number of parameters I need (it actually seems to get stuck); maybe this is messing with how the nested sampling is working, though i'm not very familiar with the details.
thanks so much!