handley-lab / anesthetic

Nested Sampling post-processing and plotting
https://anesthetic.readthedocs.io
MIT License

Function to compute iso_probability_contours for given MCMCSamples or NestedSamples object #178

Open · Stefan-Heimersheim opened this issue 3 years ago

Stefan-Heimersheim commented 3 years ago

Is your feature request related to a problem? Please describe. It is not trivial to compute confidence intervals from MCMCSamples or NestedSamples objects, even though this functionality exists in the 1d plots (with facecolor=True).

There is a function iso_probability_contours_from_samples which sounds like it should do this (but does not seem to); the documentation does not explain much, and the source code looks mostly identical to iso_probability_contours.

Describe the solution you'd like A function analogous to plot_1d which returns confidence intervals, e.g. MCMCSamples.confidence_1d(key, type='fastkde', confidence_level=[0.68, 0.95]) could return [[(1.2, 1.4), (1.53, 1.55)], [(1.1, 1.6)]]. Possible types would be hist, kde, and fastkde. Edit: We need to return a list of tuples in case there are multiple peaks; see the hypothetical sketch below.
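For concreteness, a hypothetical usage sketch of the proposed API (nothing here is implemented yet; the names and return shape are just the suggestion above, applied to an assumed samples object):

# hypothetical, proposed API -- not currently in anesthetic
intervals = samples.confidence_1d('x0', type='fastkde',
                                  confidence_level=[0.68, 0.95])
# possible return value for a bimodal posterior:
# [[(1.2, 1.4), (1.53, 1.55)],  # 68%: two disjoint intervals
#  [(1.1, 1.6)]]                # 95%: a single interval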

Describe alternatives you've considered In the past I just used the samples to compute a KDE manually and derived the intervals from that, but a built-in functionality would surely be beneficial for many users and would also encourage using a sensible definition of confidence levels.

Stefan-Heimersheim commented 3 years ago

I just copied and adjusted some code from kde_plot_1d to return confidence intervals in #179. I am not sure I understand everything that is going on there, though, or whether this is the optimal way. (Ideally there should be a standalone function providing these intervals that is in turn called by kde_plot_1d.)

lukashergt commented 3 years ago

Hi Stefan,

for completeness it's probably good to mention that there is of course the option of calculating the standard deviation with sample.x0.std(). That is probably the best thing to use for approximately Gaussian posteriors.
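For example (a minimal sketch, assuming `sample` is an anesthetic samples object with a column x0; anesthetic's mean and std should take the sample weights into account):

mu = sample.x0.mean()
sigma = sample.x0.std()
print(f"x0 = {mu:.3f} +/- {sigma:.3f}")  # mu +/- sigma covers ~68% if roughly Gaussian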

Beyond that we need to specify exactly what we mean by confidence intervals (let's go with 68% confidence intervals in my example plots with a skewed Gaussian), where I see two possible interpretations:

1) In terms of percentiles/quantiles: for this there already is the option of using e.g. sample.x0.quantile(0.16), which operates directly on the samples and therefore does not require any KDE computation. This places the boundaries where the posterior mass of the lower and upper tails makes up 16% each individually (see the snippet after this list). [figure: equal-tailed 68% quantile interval on a skewed Gaussian]

2) In terms of iso-probability: this places the boundaries at equal posterior values, such that the posterior mass of the lower and upper tails makes up 32% together (but the mass in the lower tail might end up different from that in the upper tail). [figure: iso-probability 68% interval on a skewed Gaussian] Note how reporting this alongside the mean or median might counterintuitively suggest that the lower tail is the longer one. Should this therefore better be reported alongside the posterior mode? This second option is what you are addressing, @Stefan-Heimersheim, right? This will indeed require a KDE and hence be bandwidth-dependent, which in turn will depend a lot on anesthetic's ncompress value for complicated posterior distributions. As a hacky work-around, you can get iso-probability values by feeding anesthetic's plotted data to scipy.interpolate.interp1d.
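And a minimal sketch of option 1 (same assumed `sample` object as above; quantile operates directly on the weighted samples, no KDE involved):

lower = sample.x0.quantile(0.16)
upper = sample.x0.quantile(0.84)
print(f"68% quantile interval: ({lower:.3f}, {upper:.3f})")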

Stefan-Heimersheim commented 3 years ago

Hi Lukas,

Thanks for the nice explanations! I actually neither knew that there is a .quantile function in anesthetic, nor that one can get interpolation data from plots (line.get_{x,y}data()); this is very useful!

Yes, I meant iso-probability intervals. I talked about that with @williamjameshandley recently and realized that quantiles are not very appropriate for the skewed distributions I was working with. Actually I thought that there are three common options:

  1. Quantiles 16% on each side (0.16, 0.68, 0.16)
  2. Iso-probability contours containing 68% probability
  3. Smallest possible interval containing 68% of the PDF's probability mass (i.e. minimizing the prior volume of the confidence region)

But looking at your plot now I realize that the 2nd and 3rd options should be identical. I think about it like this: the 68% interval can only be moved by shifting both bounds to the left, or both to the right. As the PDF has the same height at both bounds, when shifting them, say, to the right, the left bound moves into a higher-PDF region and the right bound into a lower-PDF region; to keep the enclosed mass at 68%, the right bound must therefore move a greater x-distance than the left one, so the distance between the bounds grows.
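More formally (a sketch, assuming a smooth unimodal PDF $f$): holding the enclosed mass fixed, $\int_A^{B(A)} f(x)\,\mathrm{d}x = q$, and differentiating with respect to $A$ gives $f(B)\,B'(A) - f(A) = 0$, i.e. $B'(A) = f(A)/f(B)$. The width $W(A) = B(A) - A$ is then stationary when $W'(A) = f(A)/f(B) - 1 = 0$, i.e. when $f(A) = f(B)$: the smallest interval enclosing the mass $q$ has equal PDF values at both ends, which is exactly the iso-probability condition.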

I'm not 100% sure if I am overlooking something, but if not, would this be a way to compute iso-probability confidence levels (for non-multimodal distributions*) without relying on KDEs or histograms? We could simply solve the optimization problem of finding the A that minimizes B(A) - A, where B(A) is defined such that the interval [A, B(A)] contains 68% of the samples/weights.

Edit: I'll test this later and see if it works :)

*For multi-modal distributions, 2nd==3rd should still hold based on the KDE, but the method I proposed works on (noisy) samples, which makes multi-modal peaks hard to define in general.

Stefan-Heimersheim commented 3 years ago

Here is an example showing the prior-volume-minimizing contour, and it indeed looks like the iso-probability contour for this skewed distribution:

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as sst
import scipy.integrate as sin
import scipy.optimize as sop

def b_of_a(a, f, level):
    # find b such that the interval [a, b] contains `level` of the
    # probability mass of the pdf f (bisection within [-4, 4])
    opt = lambda b: level - sin.quad(f, a, b)[0]
    try:
        b = sop.bisect(opt, -4, 4)
        return b
    except ValueError:
        # no valid b within the search range
        return np.inf

def confidence_interval(pdf, level=0.68):
    # minimize the interval width b(a) - a over the left edge a
    mini = lambda a: b_of_a(a, pdf, level) - a
    a = sop.minimize(mini, -4)['x'][0]
    b = b_of_a(a, pdf, level)
    return [a, b]

f = lambda x: sst.skewnorm.pdf(x, a=5)

cl = confidence_interval(f)

interval = np.linspace(*cl, 100)
plt.fill_between(interval, f(interval))

x = np.linspace(-4,4,1000)
plt.plot(x, f(x))

plt.show()

[Figure 1: the minimal 68% interval (shaded) coincides with the iso-probability interval of the skew-normal PDF]

And here is the same method based on samples (without using the PDF):

samples = np.random.uniform(-4, 4, size=100000)
weights = f(samples)  # importance weights relative to the uniform draw

def b_of_a_samples(a, samples, weights, level, bmax):
    # find b such that the interval [a, b] contains `level` of the total weight
    norm = np.sum(weights)
    posterior_volume_target = lambda b: level - np.sum(weights[np.logical_and(samples<b, a<samples)])/norm
    try:
        b = sop.bisect(posterior_volume_target, a, bmax)
        return b
    except ValueError:
        # no valid b within the search range
        return np.inf

def CL(samples, weights, level=0.68, xmin=-4, xmax=4):
    # brute-force grid search for the left edge a minimizing the interval width
    prior_volume = lambda a: b_of_a_samples(a, samples, weights, level, xmax) - a
    xtest = np.linspace(xmin, xmax, 1000)
    arg = np.argmin([prior_volume(x) for x in xtest])
    return xtest[arg], b_of_a_samples(xtest[arg], samples, weights, level, xmax)

a,b = CL(samples, weights)

plt.hist(samples, weights=weights, density=True, bins=1000, range=(-4,4), label='Samples')
plt.axvline(a, color="red", label='68% CL')
plt.axvline(b, color="red")
plt.plot(x, f(x), label='True PDF')
plt.legend()
plt.show()

[Figure 2: 68% CL bounds from the sample-based method, overlaid on the weighted histogram and the true PDF]

Both are horribly inefficient of course (computing and interpolating a CDF from the samples would be much more efficient); and judging by the plot, a simple KDE or a fit to the histogram would have worked better. But with this method you don't have to choose any bandwidth or bin size, just count samples, so I might try it out on some real-world data later.

Edit: Much faster and nicer version:

import scipy.interpolate as sip
def fastCL(samples, weights, level=0.68):
    # Sort and normalize
    order = np.argsort(samples)
    samples = samples[order]
    weights = weights[order]/np.sum(weights)
    # Compute inverse cumulative distribution function
    CDF = np.append(np.insert(np.cumsum(weights), 0, 0), 1)
    S = np.array([-np.inf, *samples, np.inf])
    #cdf = sip.interp1d(S, CDF)
    invcdf = sip.interp1d(CDF, S)
    # Find smallest interval
    distance = lambda a, level=level: invcdf(a+level)-invcdf(a)
    a = sop.minimize(distance, (1-level)/2, bounds=[(0,1-level)], method="Nelder-Mead").x[0]
    interval = np.array([invcdf(a), invcdf(a+level)])
    return interval

fastCL(samples, weights)

Edit: Small problem here: invcdf(0) goes to (negative) infinity due to the interpolation bounds; invcdf(0) should be min(samples):

import scipy.interpolate as sip
def fastCL(samples, weights, level=0.68):
    # Sort and normalize
    order = np.argsort(samples)
    samples = samples[order]
    weights = weights[order]/np.sum(weights)
    # Compute inverse cumulative distribution function
    CDF = np.append(np.insert(np.cumsum(weights), 0, 0), 1)
    S = np.array([-np.min(samples), *samples, np.max(samples)])
    #cdf = sip.interp1d(S, CDF)
    invcdf = sip.interp1d(CDF, S)
    # Find smallest interval
    distance = lambda a, level=level: invcdf(a+level)-invcdf(a)
    a = sop.minimize(distance, (1-level)/2, bounds=[(0,1-level)], method="Nelder-Mead").x[0]
    interval = np.array([invcdf(a), invcdf(a+level)])
    return interval

fastCL(samples, weights)

lukashergt commented 3 years ago

Hi Stefan,

the fastCL function is a really neat approach, avoiding a KDE and the choice of a bandwidth. It seems to work well for large quantiles (e.g. 0.95); for smaller quantiles (e.g. 0.68) the accuracy appears to vary a bit more.

EDIT: The perceived lower accuracy for smaller quantiles appears to come down to insufficient samples in the high-probability region. E.g. when using samples = stats.skewnorm.rvs(a=5, size=100000) (all with equal weights), the accuracy of the smaller quantiles gets better.
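A quick check along these lines (a sketch reusing the fastCL function from above; the exact numbers vary with the random draw):

import numpy as np
from scipy import stats

# equally weighted samples drawn directly from the skew-normal
samples = stats.skewnorm.rvs(a=5, size=100000, random_state=0)
weights = np.ones(len(samples))

print(fastCL(samples, weights, level=0.68))
print(fastCL(samples, weights, level=0.95))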


Anyhow, for future lookup and completeness I wanted to add this:

Approach making use of the KDE plotting data

The bottleneck in this is the computation of the high-accuracy KDE (i.e. high ncompress in anesthetic). Once you have that, the rest should compute very fast.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy import optimize
from scipy import interpolate
from scipy import integrate
from anesthetic import MCMCSamples

# using your example data
# -----------------------
f = lambda x: stats.skewnorm.pdf(x, a=5)
x = np.linspace(-4, 4, 1000)
samples = np.random.uniform(-4, 4, size=100000)
weights = f(samples)
mcmc = MCMCSamples(data=samples, weights=weights, columns=['f'])

# generate the KDE data with anesthetic's `plot_1d` function
# ----------------------------------------------------------
# some remarks:
#   * `q=1` ensures that the full data-range is being used
#   * `density=True` normalises the posterior (default would be that maximum is 1)
#   * `ncompress=100000`: set to something lower to save time, then increase for accuracy
fig, axes = mcmc.plot_1d(['f'], q=1, density=True, ncompress=100000, label='anesthetic')

# interpolate the KDE data to get an effective PDF
# ------------------------------------------------
pdf_kde = interpolate.interp1d(axes.iloc[0].lines[0].get_xdata(), 
                               axes.iloc[0].lines[0].get_ydata(), 
                               kind='cubic', bounds_error=False, fill_value=0)
# Compute iso-probability bounds
# ------------------------------
def root_func_for_lower_credibility_bound(z, pdf, ppf, q=0.68):
    return pdf(ppf(z)) - pdf(ppf(z + q))

def get_iso_proba_bounds(pdf, q):
    """
    Compute the lower and upper iso-probability bounds that enclose the quantile `q`.

    Parameters
    ----------
        pdf: scipy.interpolate.interpolate.interp1d
            Probability density function of the posterior that you can get from the KDE.
        q: float
            Quantile that should be enclosed by the returned iso-probability bounds.

    Returns
    -------
        lower, upper: (float, float)
            Lower and upper iso-probability bounds given a `pdf` and a quantile `q`.
    """
    if q < 0.5:
        q = 1 - q

    # integrate the PDF to get the CDF and interpolate to invert and get the PPF
    sol = integrate.solve_ivp(lambda t, y: pdf(t), 
                              t_span=(pdf.x[0], pdf.x[-1]), 
                              y0=[0], 
                              t_eval=pdf.x)
    cdf = sol.y[0] / sol.y[0][-1]
    ppf = interpolate.interp1d(cdf, sol.t, kind='cubic')

    # run root finder
    o = optimize.root_scalar(f=root_func_for_lower_credibility_bound, 
                             args=(pdf, ppf, q), 
                             bracket=[1e-5, 1-q-1e-5])

    if o.converged:
        return ppf(o.root), ppf(o.root + q)
    else:
        return o

b68 = get_iso_proba_bounds(pdf_kde, 0.68)
b95 = get_iso_proba_bounds(pdf_kde, 0.95)
# Plot results
# ------------
plt.figure()
plt.hist(samples, weights=weights, density=True, bins=1000, range=(-4, 4), 
         color='k', alpha=0.3, label='Samples')
plt.plot(x, f(x),       c='k',  label='true PDF')
plt.plot(x, pdf_kde(x), c='C2', label='anesthetic KDE')

# 68% iso-probability bounds
plt.axvline(b68[0],          color='C1', ls='-', label='68% lower bound')
plt.axvline(b68[1],          color='C4', ls=':', label='68% upper bound')
plt.axhline(f(b68[0]), color='C1', ls='-')
plt.axhline(f(b68[1]), color='C4', ls=':')

# 95% iso-probability bounds
plt.axvline(b95[0],          color='C3', ls='-', label='95% lower bound')
plt.axvline(b95[1],          color='C0', ls=':', label='95% upper bound')
plt.axhline(f(b95[0]), color='C3', ls='-')
plt.axhline(f(b95[1]), color='C0', ls=':')

plt.legend(loc='upper left')
plt.show()

[figure: KDE-based iso-probability bounds at 68% and 95%, overlaid on the samples and the true PDF]

Stefan-Heimersheim commented 3 years ago

Thanks for this, @lukashergt; good to note these, as I wasn't familiar with q and density here!

#   * `q=1` ensures that the full x-range is being used
#   * `density=True` normalises the posterior (default would be that maximum is 1)
#   * `ncompress=100000`: set to something lower to save time, then increase for accuracy

Edit: Based on this I wrote a bestFit function which might be of interest:

def bestFit(data, key, ncompress=10000, plot=False):
    # NB: relies on the earlier imports (sip, sop, plt) and on `fid`, which
    # appears to be an external dict of fiducial starting points for the optimizer
    fig, axes = data.plot_1d([key], q=1, density=True, label='anesthetic', ncompress=ncompress)
    # interpolate the plotted KDE data to get an effective PDF
    pdf_kde = sip.interp1d(axes.iloc[0].lines[0].get_xdata(),
                           axes.iloc[0].lines[0].get_ydata(),
                           kind='cubic', bounds_error=False, fill_value=0)
    if plot:
        axes[key].hist(data[key], weights=data.weights, density=True, label='anesthetic', range=(0,0.1), bins=30)
        plt.show()
    else:
        fig.clear()
        plt.close(fig)
    # maximize the interpolated KDE to find the best-fit (mode)
    return sop.minimize(lambda x: -pdf_kde(x), fid[key], method="Nelder-Mead").x[0]

When I have time I'll make a PR for a confidence_level function, and possibly also the bestFit function; I think they could be useful.

Edit: Updated version to deal with duplicated points in line data (seen with Planck chains):

def bestFit(data, key, ncompress=10000, plot=False, verbose=True):
    # NB: as above, relies on the earlier imports (np, sip, sop, plt) and on
    # `fid`, an external dict of fiducial starting points for the optimizer
    fig, axes = data.plot_1d([key], q=1, density=True, label='anesthetic', ncompress=ncompress)
    x = axes.iloc[0].lines[0].get_xdata()
    y = axes.iloc[0].lines[0].get_ydata()
    # drop duplicated x-values, which break the cubic interpolation
    mask = np.array([*(np.diff(x)==0), False])
    if np.sum(mask) > 0:
        if verbose:
            print("bestFit: Removing", np.sum(mask), "duplicates from kde interpolation function")
        indices = np.where(mask)[0]
        assert np.all(y[indices] == y[indices+1])
        x = x[~mask]
        y = y[~mask]
    pdf_kde = sip.interp1d(x, y, kind='cubic', bounds_error=False, fill_value=0)
    if plot:
        axes[key].hist(data[key], weights=data.weights, density=True, label='anesthetic', range=data.limits[key], bins=50)
    else:
        fig.clear()
        plt.close(fig)
    return sop.minimize(lambda x: -pdf_kde(x), fid[key], method="Nelder-Mead").x[0]

Stefan-Heimersheim commented 3 years ago

I will make a new PR replacing the complicated implementation in #179 by the fastCL one above (renamed to {MCMC,Nested}Samples.credibility_interval()).

Do we also want my hacky bestFit code from above in anesthetic? I don't think it's good enough to be a built-in function; we should probably wait until we have a better way than internally making a plot.

Edit: Changelog:

  - Added import scipy.optimize as sop
  - Changed method to "SLSQP" as that one actually respects the bounds
  - Removed the stray - in front of np.min(samples)

import scipy.optimize as sop
import scipy.interpolate as sip
def fastCL(samples, weights, level=0.68):
    # Sort and normalize
    order = np.argsort(samples)
    samples = samples[order]
    weights = weights[order]/np.sum(weights)
    # Compute inverse cumulative distribution function
    CDF = np.append(np.insert(np.cumsum(weights), 0, 0), 1)
    S = np.array([np.min(samples), *samples, np.max(samples)])
    #cdf = sip.interp1d(S, CDF)
    invcdf = sip.interp1d(CDF, S)
    # Find smallest interval
    distance = lambda a, level=level: invcdf(a+level)-invcdf(a)
    a = sop.minimize(distance, (1-level)/2, bounds=[(0,1-level)], method="SLSQP").x[0]
    interval = np.array([invcdf(a), invcdf(a+level)])
    return interval

fastCL(samples, weights)

Edit: Slightly neater version, planning to put this into a PR

import scipy.optimize as sop
import scipy.interpolate as sip

def fastCL(samples, weights=None, level=0.68, method="iso-probability"):
    assert level < 1, "level must be < 1"
    weights = np.ones(len(samples)) if weights is None else weights
    # Sort and normalize
    order = np.argsort(samples)
    samples = samples[order]
    weights = weights[order]/np.sum(weights)
    # Compute inverse cumulative distribution function
    CDF = np.append(np.insert(np.cumsum(weights), 0, 0), 1)
    S = np.array([np.min(samples), *samples, np.max(samples)])
    invcdf = sip.interp1d(CDF, S)
    if method=="iso-probability":
        # Find smallest interval
        distance = lambda a, level=level: invcdf(a+level)-invcdf(a)
        a = sop.minimize_scalar(distance, bounds=(0,1-level), method="Bounded")
        interval = np.array([invcdf(a.x), invcdf(a.x+level)])
    elif method=="lower-limit":
        # Get value from which we reach the desired level
        interval = invcdf(1-level)
    elif method=="upper-limit":
        # Get value to which we reach the desired level
        interval = invcdf(level)
    else:
        raise ValueError(f"unknown method: {method}")
    return interval

fastCL(samples, weights)
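For reference, example calls for the three modes (a sketch; the one-sided limits return a single number rather than a pair):

fastCL(samples, weights, level=0.68, method="iso-probability")  # array([lower, upper])
fastCL(samples, weights, level=0.95, method="lower-limit")      # single lower bound
fastCL(samples, weights, level=0.95, method="upper-limit")      # single upper bound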

williamjameshandley commented 2 years ago

  2. Iso-probability contours containing 68% probability
  3. Smallest possible interval containing 68% of the PDF's probability mass (i.e. minimizing the prior volume of the confidence region)

But looking at your plot now I realize that the 2nd and 3rd options should be identical. I think about it like this: the 68% interval can only be moved by shifting both bounds to the left, or both to the right. As the PDF has the same height at both bounds, when shifting them, say, to the right, the left bound moves into a higher-PDF region and the right bound into a lower-PDF region; to keep the enclosed mass at 68%, the right bound must therefore move a greater x-distance than the left one, so the distance between the bounds grows.

This is an incredibly neat result. If you want the d-dimensional differential-geometric version, I have (finally) got round to writing it up:

isoprob-crop.pdf

This makes me wonder whether we could use this for the 2d plots, and/or whether this would allow us to do away with (or equivalently construct an alternative approach for) kernel density estimation (in either the 2d or the 1d case).

Something to chew on. I'll get on and review this now.