joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

dynesty missed second mode (peak) #179

Closed · minaskar closed this issue 4 years ago

minaskar commented 4 years ago

I'm trying to sample from a 15-dimensional pdf. One of the 2D marginal posterior distributions is bimodal with one peak much smaller than the other. I know this is true because I first sampled this distribution using MCMC.

However, dynesty only finds one of the modes (peaks), the larger one.

Is there something I can do to fix this?

joshspeagle commented 4 years ago

I need a bit more information about the issue here. What's the overall distribution in question? And what were your MCMC and dynesty settings when sampling it?

minaskar commented 4 years ago

The issue: Simply put, given a (particular) bimodal distribution, dynesty produces samples that reside in only one of the two peaks (the larger one).

Details about the distribution: The pdf is 15-dimensional and unfortunately it is part of a complicated pipeline, so it's not easy for me to present a reproducible toy example.

However, using two distinct MCMC algorithms (zeus & emcee) I'm able to sample correctly from that bimodal distribution.

Details about the dynesty configuration: I tried many different dynesty configurations, mainly using dynamic nested sampling. Nothing seems to work, though.

joshspeagle commented 4 years ago

That helps. Have you tried just using NestedSampler rather than DynamicNestedSampler and bumping the live points up to 5000 or something? The resolution is directly proportional to the number of live points, so that'd be a good first check. You could probably stick with rwalk and the other default settings.
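
For concreteness, a minimal sketch of that configuration (with loglike, ptform, and ndim as defined in your pipeline):

import dynesty

# Static sampler with many more live points for finer resolution,
# using the suggested 'rwalk' sampling.
sampler = dynesty.NestedSampler(loglike, ptform, ndim,
                                nlive=5000, sample='rwalk')
sampler.run_nested()
results = sampler.results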

Also, a good sanity check is making sure your prior transforms all span the parameter space in question. I've had a few instances where problems turned out to be just typos (essentially) where the prior transforms weren't specified quite right.

If you could post a trace plot of the results using the built-in plotting utilities and indicate where the secondary mode is supposed to be, that'd also be helpful.

minaskar commented 4 years ago

I tried using NestedSampler with 5000 live points, without success. I also verified that the prior transform is correct; that was the first thing I checked.

I also noticed that for this particular likelihood, rwalk and slice give quantitatively different results, though neither of them agrees with emcee & zeus.

Another thing is that I'm currently "resampling" the samples & weights to get something that I can plot with getdist instead of dyplot. Any chance that the resampling is affecting the marginal posteriors?
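
For reference, a minimal sketch of that resampling step using dynesty's own utilities (assuming a results object dresults like the one produced later in this thread):

import numpy as np
from dynesty import utils as dyfunc

samples = dresults.samples                              # weighted posterior samples
weights = np.exp(dresults.logwt - dresults.logz[-1])    # normalized importance weights

# Draw an equal-weight set of samples that getdist can consume directly.
equal_samples = dyfunc.resample_equal(samples, weights)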

I'd be happy to send you some plots via email if you want since this concerns a currently unpublished paper.

Thank you

joshspeagle commented 4 years ago

Sure, send me an email and we can continue this discussion there. The trace plots will be the most useful things to look at here (and in general for most problems), although if you have the full results dictionary saved, it contains a whole trove of additional information to dig into.
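
As an aside, saving the full results for later inspection can be as simple as pickling them (the filename, and the dsampler handle from your run, are just examples):

import pickle

# Persist the results object so the run doesn't have to be repeated.
with open('dynesty_results.pkl', 'wb') as f:
    pickle.dump(dsampler.results, f)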

minaskar commented 4 years ago

I need to run the pipeline again to produce the results dictionary, and that would probably take a few hours, but I'm happy to do this if you think it would be helpful. Should I run it with the default configuration (rwalk, 500 live points, etc.)?

joshspeagle commented 4 years ago

> if you think this would be helpful

Yes, in general being able to make trace plots and dig into the actual samples is helpful for investigating problems. If you could do the default configuration that would be a great place to start. Thanks.

minaskar commented 4 years ago

Hi Josh,

I managed to reproduce the problem in a toy example (attached below). I created a bimodal distribution in which one of the modes is significantly smaller (~10%) than the other, but still prominent enough that any MCMC algorithm can pick it up (I tested this using zeus and emcee). The problem is that dynesty identifies only the larger mode/peak when the number of dimensions is larger than 10.

import numpy as np
import dynesty

ndim = 15

def gaussian_pdf(x, mean=0.0, var=1.0):
    # Isotropic n-dimensional Gaussian density; the normalization is
    # sqrt((2*pi)^n * var^n).
    n = len(x)
    return np.exp(-0.5 * np.sum((x - mean) ** 2 / var)) / np.sqrt((2.0 * np.pi) ** n * var ** n)

def loglike(x):
    # Mixture of a small, narrow mode at 0 and a dominant, broader mode at 6.
    return np.log(0.2 * gaussian_pdf(x, mean=0.0, var=0.2) + 1.0 * gaussian_pdf(x, mean=6.0, var=1.0))

def ptform(u):
    # Map the unit cube to uniform priors over [-10, 10] in every dimension.
    return 10 * (2. * u - 1.)

# Run dynamic nested sampling with the default configuration.
dsampler = dynesty.DynamicNestedSampler(loglike, ptform, ndim)
dsampler.run_nested()
dresults = dsampler.results

from dynesty import plotting as dyplot

# Plot a summary of the run.
rfig, raxes = dyplot.runplot(dresults)

# Plot traces and 1-D marginalized posteriors.
tfig, taxes = dyplot.traceplot(dresults)

# Plot the 2-D marginalized posteriors.
cfig, caxes = dyplot.cornerplot(dresults)

joshspeagle commented 4 years ago

Great example! This illustrates really well one of the central problems/drawbacks of Nested Sampling in terms of how most approaches actually locate modes and how posteriors behave in high dimensions. As a side note, I found the math in this example behaved poorly in the tails. The rewrite below

def loggaussian_pdf(x, mean=0.0, var=1.0):
    # Log-density of the same isotropic Gaussian, computed entirely in log
    # space; note the normalization sqrt((2*pi*var)^n).
    n = len(x)
    return -0.5 * np.sum((x - mean) ** 2 / var) - 0.5 * n * np.log(2.0 * np.pi * var)

def loglike(x):
    # logaddexp combines the two components without underflowing in the tails.
    return np.logaddexp(loggaussian_pdf(x, mean=0.0, var=0.2) + np.log(0.2),
                        loggaussian_pdf(x, mean=6.0, var=1.0))

is much more stable.
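
For instance (a quick check using the definitions above), a point deep in the tails underflows to np.log(0) = -inf with the original density but stays finite here:

x = np.full(ndim, -9.0)   # deep in the tails of both modes
print(loglike(x))         # finite in log space; -inf (with a divide-by-zero
                          # RuntimeWarning) with the original version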

In dynesty, sampling proceeds in two phases. In phase one, samples are drawn uniformly from the prior until the overall efficiency drops below 10%. In this problem, that occurs after ~2000 samples. In phase two, bounding distributions are constructed (in this case via ellipsoids) and samples are subsequently drawn conditioned on those. This initial bounding update is extremely important, because it sets the approximate locations from which all subsequent samples will be drawn. If a secondary mode is both far away and well-localized, as in this toy problem, it has a high probability of being missed entirely because of how sampling is performed. As you've discovered, these modes can be found (although their relative amplitudes can be harder to characterize) by MCMC runs initialized randomly from the prior, due to the different sampling strategies.

As for how likely you might be to find these modes, we can do some rough calculations. The linear scale of each mode goes as ~sqrt(var), so its "volume" goes as ~var^(D/2), and the ratio of the two volumes here is 5^(15/2) ~ 2e5. The probability that none of those ~2000 uniform samples lands in the secondary mode's volume then goes as (1 - 1/2e5)^2000 ~ 0.99. So it's not surprising we miss it here almost every time. (The volume technically increases as the relative "sphere of influence" of each Gaussian changes when we adjust their amplitudes, but that's not critical here since the volume term dominates.)
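
A quick numerical check of this estimate (dimensionality, sample count, and variance ratio as above):

import numpy as np

D, nsamp = 15, 2000
scale_ratio = np.sqrt(1.0 / 0.2)             # per-dimension sigma ratio, sqrt(5)
vol_ratio = scale_ratio ** D                 # relative volume of the small mode, ~2e5
p_miss = (1.0 - 1.0 / vol_ratio) ** nsamp    # chance that no prior draw hits it, ~0.99
print(vol_ratio, p_miss)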

As expected, I find that dynesty indeed misses the secondary peak quite handily given your default configuration. To explore whether I'd be able to find it in general, I increase its amplitude to be equal to that of the original mode, then 10x, 100x, and even 1000x larger. In all these cases I still miss the mode.

I do find it, however, if I just make the mode itself larger (wider) by comparison. The probability of finding it roughly goes as P ~ 1 - (1 - ratio^-D)^2000, where ratio is the per-dimension scale ratio between the two modes; for D=15 this gives P > 0.5 for ratio < 1.7. The heavy scaling of volume with dimensionality makes this a really steep function of ratio, falling from ~1 at ratio ~ 1.5 to close to zero at ratio ~ 2. Indeed, I confirm that I can find the 20%-amplitude mode for var=1.0 and that I stop consistently finding the secondary mode as I decrease var, which shows it's the size (rather than the amplitude) that matters here. Here's an example for the secondary mode with var = 0.5, which is just marginally identified (note that I switched from DynamicNestedSampler to NestedSampler here):

[image: trace plot for the run with var = 0.5, showing the secondary mode just marginally identified]
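
To make the steepness of that P(ratio) relation concrete, here's a tiny helper evaluating the formula above (purely illustrative):

def p_find(ratio, D=15, nsamp=2000):
    # Rough probability that at least one of `nsamp` uniform prior draws lands
    # inside a mode whose linear size is `ratio` times smaller per dimension.
    return 1.0 - (1.0 - ratio ** -D) ** nsamp

print(p_find(1.5), p_find(1.7), p_find(2.0))   # ~0.99, ~0.50, ~0.06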

So I think you've identified some of the challenges associated with Nested Sampling in general (and dynesty in particular) in terms of being able to locate and/or characterize these types of modes.

minaskar commented 4 years ago

Thanks, that was very helpful.

Is there any way that I can configure dynesty so that it is more sensitive and doesn't miss such small modes?

joshspeagle commented 4 years ago

This type of multi-modal distribution (widely separated modes whose relative sizes span orders of magnitude in parameter-space volume) is the bane of all high-dimensional inference, especially methods that rely on random sampling from the prior (rather than trying to do some type of gradient ascent). You could make the first_update argument more conservative or add more live points, but this will in general be a Very Hard Problem for dynesty to handle "out of the box". Sorry...
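
For completeness, a sketch of what "more conservative" might look like (the specific numbers are illustrative, not recommendations):

import dynesty

sampler = dynesty.NestedSampler(
    loglike, ptform, ndim,
    nlive=2000,                        # more live points for finer resolution
    first_update={'min_ncall': 50000,  # delay the first bounding update until
                  'min_eff': 5.0})     # more calls / lower efficiency is reached
sampler.run_nested()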