joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Robustness of posterior estimation #422

Closed. MikhailBeznogov closed this issue 1 year ago.

MikhailBeznogov commented 1 year ago

Hello,

This is not exactly an issue with dynesty, but rather a question of how to properly use it to obtain reliable and robust results, and how to verify their correctness. First, some brief background.

I am using dynesty for Bayesian inference of some parameters of the dense-matter equation of state. For my paper (https://arxiv.org/abs/2212.07168) I employed the affine-invariant Markov chain Monte Carlo ensemble sampler emcee, and also dynesty to independently cross-check the results. Dynesty worked with the default settings without any problems; the posterior distributions obtained with emcee and dynesty were very close.

However, when I attempted the same type of inference in another theoretical framework, I ran into serious convergence issues with the MCMC-based methods (including tempered chains and ensemble slice sampling): the autocorrelation times were too long and the acceptance fraction too small. I therefore switched to dynesty, but this time I am having severe problems with it as well.

The first four figures show the posterior distributions of the nine model parameters for six runs with different dynesty settings, together with the corresponding log-likelihoods (colors match across figures). Both the posteriors and the likelihoods were plotted from the weighted samples, taking the weights into account.
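For reference, the weighted marginals were plotted along these lines (a minimal sketch assuming a finished results object from sampler.results; the parameter index and binning are illustrative):

import numpy as np
import matplotlib.pyplot as plt

# results = sampler.results from a finished dynesty run.
weights = np.exp(results.logwt - results.logz[-1])  # importance weights

# Weighted marginal posterior of, e.g., the first parameter.
plt.hist(results.samples[:, 0], bins=100, weights=weights, density=True)
plt.xlabel('parameter 1')
plt.show()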

The crucial part of the dynesty call was (ndim=9, npoints=1000; a self-contained sketch of the first configuration follows the settings list below):

sampler = dynesty.DynamicNestedSampler(Chi2, Prior, ndim, bound='none', sample='rslice',
                                       pool=MP_pool, queue_size=800)
sampler.run_nested(n_effective=10000, dlogz_init=0.05, nlive_init=npoints, nlive_batch=500,
                   wt_kwargs={'pfrac': 0.9})

for the blue and green curves (i.e., these two runs have absolutely identical settings!),

sampler = dynesty.DynamicNestedSampler(Chi2, Prior, ndim, bound='single', sample='rwalk',
                                       walks=500, first_update={'min_ncall': int(3.5e5), 'min_eff': 2.},
                                       update_interval=100.5, bootstrap=50, enlarge=1.0,
                                       pool=MP_pool, queue_size=800)

for the black curves,

sampler = dynesty.DynamicNestedSampler(Chi2, Prior, ndim, bound='single', sample='rwalk',
                                       walks=1000, first_update={'min_ncall': int(5e5), 'min_eff': 2.},
                                       update_interval=200.5, bootstrap=50, enlarge=1.0,
                                       pool=MP_pool, queue_size=800)

for the cyan curves,

sampler = dynesty.DynamicNestedSampler(Chi2, Prior, ndim, bound='single', sample='rwalk',
                                       walks=100, first_update={'min_ncall': int(5e5), 'min_eff': 2.},
                                       update_interval=100.5, bootstrap=100, enlarge=1.0,
                                       pool=MP_pool, queue_size=800)

for the purple curve, and

sampler = dynesty.DynamicNestedSampler(Chi2, Prior, ndim, bound='single', sample='rwalk',
                                       walks=100, facc=0.2, first_update={'min_ncall': int(5e5), 'min_eff': 2.},
                                       update_interval=100.5, bootstrap=100, enlarge=1.0,
                                       pool=MP_pool, queue_size=800)

for the magenta curve.
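For completeness, here is a self-contained sketch of the first configuration above (the Chi2 and Prior here are toy stand-ins for my actual functions, and the multiprocessing setup is only illustrative):

import multiprocessing
import numpy as np
import dynesty

ndim = 9
npoints = 1000

def Prior(u):
    # Placeholder prior transform: maps the unit cube onto [-5, 5]^9.
    return 10.0 * u - 5.0

def Chi2(x):
    # Toy stand-in log-likelihood; the real one wraps compiled C code.
    return -0.5 * np.sum(x**2)

if __name__ == '__main__':
    with multiprocessing.Pool() as MP_pool:
        sampler = dynesty.DynamicNestedSampler(
            Chi2, Prior, ndim, bound='none', sample='rslice',
            pool=MP_pool, queue_size=800)
        sampler.run_nested(n_effective=10000, dlogz_init=0.05,
                           nlive_init=npoints, nlive_batch=500,
                           wt_kwargs={'pfrac': 0.9})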

Figures: Dynesty_Posteriors_1 Dynesty_Posteriors_2 Dynesty_LogLike_1 Dynesty_LogLike_2

As one can see from the figures, the posterior distributions are noticeably different even for the runs with identical settings. Of course, I do not expect them to be identical, since I have not fixed the random number generator seed, but I do expect them to show very similar statistics. That is obviously not the case. I have tried delaying the boundary construction to avoid "shredding" the posterior (as mentioned in the documentation), but even with the bound='none' setting the marginalized posterior distributions of some parameters look "shredded".
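(For reference, the seed could be fixed by passing a NumPy generator via dynesty's rstate argument; a sketch reusing Chi2, Prior, and ndim from the stand-in script above:)

import numpy as np
import dynesty

rng = np.random.default_rng(42)  # fixed seed for reproducible run-to-run comparison
sampler = dynesty.DynamicNestedSampler(Chi2, Prior, ndim, bound='none',
                                       sample='rslice', rstate=rng)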

I am reasonably sure that the log-likelihood function is not pathological, at least for log L > -100; problems appear only when log L < -1000. I am also reasonably sure that the posterior is mono-modal. If it is important: the model parameters may be correlated in a complicated way and span very different ranges; the prior is uniform within the corresponding ranges, and "Prior" maps the unit cube onto them.
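Concretely, the mapping has this shape (the bounds below are placeholders, not my actual ranges; the point is only that each coordinate is stretched onto its own, possibly very different, scale):

import numpy as np

# Hypothetical bounds; the real ones span very different scales.
lo = np.array([-3000.0, 0.0, 1e-3, -5.0, 10.0, 0.1, -1.0, 0.0, 100.0])
hi = np.array([-1000.0, 1.0, 1e-1,  5.0, 50.0, 2.0,  1.0, 0.5, 900.0])

def Prior(u):
    # Map each unit-cube coordinate onto its uniform prior range.
    return lo + (hi - lo) * u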

The run plots and trace plots also demonstrate that something is very wrong (the colors on the run plots correspond to the figures above; the first trace plot is for the "blue" model, the second for the "black" one).

Run plots: Dynesty_runplot_1 Dynesty_runplot_2

Trace plots: Dynesty_traceplot_1 Dynesty_traceplot_2
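(For reference, these plots are produced with dynesty's plotting module; a sketch assuming a finished results object:)

from dynesty import plotting as dyplot

# results = sampler.results from a finished run.
rfig, raxes = dyplot.runplot(results)    # run plot: live points, likelihood, weights, evidence
tfig, taxes = dyplot.traceplot(results)  # trace plot: particle traces and 1-D marginals
rfig.savefig('Dynesty_runplot.png')
tfig.savefig('Dynesty_traceplot.png')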

Any recommendations on how I could obtain robust estimates of the posterior distributions? I can provide additional details if needed. Uniform sampling would be best but, unfortunately, it quickly becomes very inefficient and the calculation takes too long, so I am limited to other sampling methods such as rwalk or rslice.
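(For reference, one single-run check is dynesty's KL-divergence error utility, which estimates how much the posterior varies from sampling noise alone; a sketch assuming a finished results object:)

import numpy as np
from dynesty import utils as dyfunc

# results = sampler.results from a finished run.
weights = np.exp(results.logwt - results.logz[-1])
samples = dyfunc.resample_equal(results.samples, weights)  # equally weighted posterior draws

# KL divergence between the run and a resampled realization of it; large
# scatter across repeated calls signals an unreliable posterior.
kld = dyfunc.kld_error(results, error='resample')
print('KL divergence estimate:', kld[-1])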

segasai commented 1 year ago

Hi,

Thanks for providing enough details!

I don't have much time right now, so I can't investigate in detail, but a few questions/comments.

MikhailBeznogov commented 1 year ago

Hello,

Thank you for the fast reply. Here are some answers to your questions.

Yes, I am using the latest version, 2.0.3.

It might be that the posterior has a strange shape; some parameters likely have strong nonlinear correlations between them. As I mentioned in the background description, I have also used emcee: I initialized 10,000 walkers uniformly across the whole parameter range and let the sampler run for 6,000 steps. The posteriors based on the last 1,000 steps are plotted below in red. Since I started with uniformly distributed walkers, I would expect different walkers to get stuck at different modes of a multi-modal distribution, with visible signs of that in the posterior. As one can see, there are no signs of any multi-modality. Because the chain has not converged this conclusion is not fool-proof, but it is a clear indication that the posterior is mono-modal.
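The emcee setup was essentially the following (a sketch with a toy log-probability and hypothetical uniform ranges in place of the real ones; note that a run of this size is expensive):

import numpy as np
import emcee

ndim, nwalkers, nsteps = 9, 10000, 6000

def log_prob(x):
    # Toy log-posterior standing in for the real one.
    return -0.5 * np.sum(x**2)

# Initialize walkers uniformly across (hypothetical) prior ranges.
rng = np.random.default_rng(0)
p0 = rng.uniform(-5.0, 5.0, size=(nwalkers, ndim))

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, nsteps, progress=True)

# Posterior built from the last 1,000 steps, flattened across walkers.
flat = sampler.get_chain(discard=nsteps - 1000, flat=True)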

The green curves are computed with dynesty with the following settings (nlive_init=2000; run_nested is called as before):

sampler = dynesty.DynamicNestedSampler(Chi2, Prior, ndim, bound='single', sample='rwalk',
                                       walks=1000, facc=0.05, first_update={'min_ncall': int(5e5), 'min_eff': 2.},
                                       update_interval=100.5, bootstrap=150, enlarge=1.0,
                                       pool=MP_pool, queue_size=800)

Dynesty_Posteriors_3 Dynesty_LogLike_3

I would also like to provide the run and trace plots for the purple curves from my first post; perhaps they will help. Run plot: Dynesty_runplot_3

Trace plot for the green curves: Dynesty_traceplot_3

And the trace plot for the purple curves: Dynesty_traceplot_4

Regarding the "noise": the log-likelihood is a simple analytic function. However, for some parameter sets the underlying model may fail or produce unphysical results. In those cases I artificially set log L to -1e100 and add a tilt towards the MAP value by multiplying -1e100 by the Euclidean distance to the MAP point. I am reasonably sure that this happens only where log L < -100, so I do not expect such a modification to affect the posterior distributions. From the log L plot above one can see that values below roughly -20 should not matter for the posterior.
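In code, the scheme is roughly this (a self-contained sketch; model, the validity criterion, and x_map are toy stand-ins for my actual functions and MAP point):

import numpy as np

ndim = 9
x_map = np.zeros(ndim)  # stand-in for the MAP point
PENALTY = -1e100

def model(x):
    # Stand-in for the call into the C library: returns (status, outputs).
    ok = np.all(np.abs(x) < 5.0)  # toy validity criterion
    return (0, x) if ok else (1, None)

def loglike(x):
    status, out = model(x)
    if status != 0:
        # Failure: huge negative value, scaled by the distance to the MAP
        # point so that points closer to it are penalized slightly less
        # (the "tilt").
        return PENALTY * np.linalg.norm(x - x_map)
    return -0.5 * np.sum(out**2)  # stand-in for the analytic log L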

As for parameter x1: values around -2000 are fine and correspond to good log L; values around -2500 are also fine, but they correspond to much worse log L.

segasai commented 1 year ago

Thanks.

I still have an (unfounded) suspicion that something is not quite right with the function. Do you think you could share the samples and logl values, i.e., results.samples and results.logl?
Ideally from static runs, just because there are fewer moving parts there.
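Dumping those arrays is straightforward, e.g. (a sketch):

import pickle

# results = sampler.results from a finished static run.
with open('dynesty_results.pkl', 'wb') as f:
    pickle.dump({'samples': results.samples, 'logl': results.logl}, f)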

Alternatively, can you share (maybe privately) the likelihood function (since you say it's analytical)?

Regarding the comparison to emcee: my general view is that MCMC samples are useful only if the MCMC has converged; otherwise they are really impossible to interpret, so I'm not too reassured by the emcee test.
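(For reference, the standard convergence check with emcee is the integrated autocorrelation time together with the acceptance fraction; a sketch assuming an existing emcee sampler:)

import numpy as np

# Integrated autocorrelation time per parameter; emcee raises an error
# if the chain is too short to estimate it reliably.
tau = sampler.get_autocorr_time()
nsteps = sampler.get_chain().shape[0]
print('chain length in units of tau:', nsteps / np.max(tau))
print('mean acceptance fraction:', np.mean(sampler.acceptance_fraction))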

MikhailBeznogov commented 1 year ago

Hello,

Sorry, I was not explicit enough when describing the likelihood function. It is indeed analytic as a function of the output model parameters. However, computing the output parameters from the input parameters involves complicated calculations, carried out by our C code, which is compiled into a library and called from Python. The C code is "research grade": it is complicated and has almost no comments, so it is difficult to follow. I will send you privately the compiled library, the Python wrappers for it, and the pickled results of a static run.
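The Python side is a thin wrapper of roughly this shape (heavily simplified; the library and function names below are placeholders, not the actual code):

import ctypes
import numpy as np

# Load the compiled library (placeholder name).
lib = ctypes.CDLL('./libeos.so')

# int compute_outputs(const double *inp, double *out, int n) -- placeholder signature.
lib.compute_outputs.restype = ctypes.c_int
lib.compute_outputs.argtypes = [
    np.ctypeslib.ndpointer(dtype=np.float64, flags='C_CONTIGUOUS'),
    np.ctypeslib.ndpointer(dtype=np.float64, flags='C_CONTIGUOUS'),
    ctypes.c_int,
]

def compute_outputs(params):
    # Returns (status, outputs); a nonzero status flags a failed or unphysical model.
    inp = np.ascontiguousarray(params, dtype=np.float64)
    out = np.empty_like(inp)
    status = lib.compute_outputs(inp, out, inp.size)
    return status, out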

You are right that the likelihood function is somewhat "pathological", as can be seen from the figure below. It shows nine cross-sections of log L through the MAP point (log L = -0.09, blue curves) and through a somewhat worse point (log L = -0.4, red curves), parallel to the parameter axes. Figure: LogLike_slices
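Slices like these can be generated along the following lines (a self-contained sketch; the MAP point, scan ranges, and likelihood are toy stand-ins):

import numpy as np
import matplotlib.pyplot as plt

ndim = 9
x_map = np.zeros(ndim)             # stand-in for the MAP point
lo, hi = x_map - 1.0, x_map + 1.0  # stand-in scan ranges around it

def loglike(x):
    # Toy likelihood in place of the real wrapper.
    return -0.5 * np.sum(x**2)

fig, axes = plt.subplots(3, 3, figsize=(9, 9))
for i, ax in enumerate(axes.ravel()):
    grid = np.linspace(lo[i], hi[i], 200)
    # Vary the i-th parameter while keeping the others at the MAP point.
    vals = [loglike(np.where(np.arange(ndim) == i, g, x_map)) for g in grid]
    ax.plot(grid, vals)
    ax.set_xlabel(f'x{i + 1}')
    ax.set_ylabel('log L')
plt.tight_layout()
plt.show()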

This "pathological" behavior is somewhat artificial. When the C code is unable to compute output parameters from the input ones or the results are unphysical (say, negative pressure for some densities), the code returns a warning flag and I set log L to -1e100 multiplied by some tilt towards the MAP value as I described in the previous comment. It is my understanding that such values are just not accepted as new live points during the sampling. This procedure worked fine both for emcee and dynesty in the case of a different model mentioned in my opening post (although I admit that that model was, overall, much "smoother" than the current one that I have problems with).

segasai commented 1 year ago

The underlying cause was the abnormal likelihood function rather than a dynesty issue (confirmed through private communication).