Stefan-Heimersheim opened this issue 3 years ago
I just copied and adjusted some code from kde_plot_1d to return confidence intervals in #179. I'm not sure I understand everything that is going on there, though, or whether this is the optimal way. (Ideally there should of course be a function providing these intervals that is then called by kde_plot_1d.)
Hi Stefan,
for completeness it's probably good to mention that there is of course the option of calculating the standard deviation with sample.x0.std(), which is probably the best thing to use for approximately Gaussian posteriors.
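For an approximately Gaussian posterior that would boil down to something like this (just a sketch; x0 is a placeholder column name):

mu, sigma = sample.x0.mean(), sample.x0.std()
interval_68 = (mu - sigma, mu + sigma)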
Beyond that we need to specify exactly what we mean by confidence intervals (let's go with 68% confidence intervals in my example plots with a skewed Gaussian), where I see two possible interpretations:
1) In terms of percentiles/quantiles:
For this there already is the option of using e.g. sample.x0.quantile(0.16), which can be performed directly on the sample and therefore does not require any KDE computation. This places the boundaries where the posterior mass of the lower and upper tails makes up 16% each individually (a small stand-alone sketch of this is below the two options).
2) In terms of iso-probability:
This places the boundaries at points of equal posterior density, chosen such that the posterior mass of the lower and upper tails makes up 32% together (but the posterior mass in the lower tail might end up different from the one in the upper tail).
Note how reporting this alongside the mean or median might counter-intuitively suggest that the lower tail is the longer one. Should it therefore rather be reported alongside the posterior mode?
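Here is a small stand-alone sketch of interpretation 1) for weighted samples (a rough numpy version, not anesthetic's actual implementation; all names are placeholders):

import numpy as np

def weighted_quantile(samples, weights, q):
    # sort the samples and build the weighted empirical CDF
    order = np.argsort(samples)
    s, w = samples[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    # return the first sample at which the CDF reaches q
    return s[np.searchsorted(cdf, q)]

# 68% interval in the percentile sense: 16% of the posterior mass in each tail
# lower, upper = weighted_quantile(x, w, 0.16), weighted_quantile(x, w, 0.84)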
This second option is what you are addressing @Stefan-Heimersheim, right? This will indeed require a KDE estimator and hence be dependent on the bandwidth, which in turn will depend a lot on anesthetic's ncompress
value for complicated posterior distributions.
As a hacky work-around, to get iso-probability values you can feed anesthetic's plotted data to scipy.interpolate.interp1d.
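In code, that workaround looks roughly like this (a sketch; 'x0' is a placeholder column name and the interpolation settings are just one choice):

from scipy.interpolate import interp1d
fig, axes = samples.plot_1d(['x0'])
line = axes.iloc[0].lines[0]
pdf = interp1d(line.get_xdata(), line.get_ydata(), bounds_error=False, fill_value=0)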
Hi Lukas,
Thanks for the nice explanations! I actually neither knew that there is a .quantile function in anesthetic, nor that one can get interpolation data from plots (line.get_{x,y}data()); this is very useful!
Yes, I meant iso-probability intervals. I talked about that with @williamjameshandley recently and realized that quantiles are not very appropriate for the skewed distributions I was working with. Actually I thought that there are three common options:

- Quantiles/percentiles (your option 1 above)
- Iso-probability contours containing 68% probability
- Smallest possible interval containing 68% confidence of PDF. (i.e. minimize prior-volume of confidence region)
But looking at your plot now I realize that the 2nd and 3rd option should be identical. I think about it like this: The 68% interval can only be moved by moving both bounds to the left, or both to the right. However, as the PDF has the same height at both points, when moving the lines, say, to the right, the left line will move to a higher and the right one to a lower PDF region, and therefore move less x-distance. Therefore the x-distance between the lines grows.
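Sketching the same argument in equations (with $f$ the normalised 1d posterior density): holding $\int_A^{B(A)} f(x)\,\mathrm{d}x = 0.68$ fixed and differentiating with respect to $A$ gives $f(B)\,B'(A) - f(A) = 0$, i.e. $B'(A) = f(A)/f(B)$. The interval width $W(A) = B(A) - A$ then satisfies $W'(A) = f(A)/f(B) - 1$, which vanishes exactly when $f(A) = f(B)$: for a unimodal posterior, the smallest interval containing 68% of the mass is the iso-probability one.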
I'm not 100% sure if I am overlooking something, but if not, would this be a way to compute iso-probability confidence levels (for non-multimodal distributions*) without relying on KDEs or histograms? We could simply solve the optimization problem of finding A such that B(A)-A is minimal, where B(A) is defined such that the interval [A, B(A)] contains 68% of the samples/weights.
Edit: I'll test this later and see if it works :)
*For multi-modal distributions, 2nd==3rd should still work similarly based on the KDE, but the method I proposed is based on (noisy) samples, which makes multi-modal peaks hard to define in general.
Here is an example showing the prior-volume minimizing contour, and it looks like the iso-probability contour for this skewed distribution:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as sst
import scipy.integrate as sin
import scipy.optimize as sop
def b_of_a(a, f, level):
    # find b such that the PDF f integrates to `level` over [a, b]
    opt = lambda b: level - sin.quad(f, a, b)[0]
    try:
        b = sop.bisect(opt, -4, 4)
        return b
    except ValueError:
        return np.inf

def confidence_interval(pdf, level=0.68):
    # minimize the interval width b(a) - a over the lower bound a
    mini = lambda a: b_of_a(a, pdf, level) - a
    a = sop.minimize(mini, -4)['x'][0]
    b = b_of_a(a, pdf, level)
    return [a, b]
f = lambda x: sst.skewnorm.pdf(x, a=5)
cl = confidence_interval(f)
interval = np.linspace(*cl, 100)
plt.fill_between(interval, f(interval))
x = np.linspace(-4,4,1000)
plt.plot(x, f(x))
plt.show()
And here is the same method based on samples (without using the PDF):
samples = np.random.uniform(-4,4,size=100000)
weights = f(samples)
def b_of_a_samples(a, samples, weights, level, bmax):
    # find b such that the fraction of posterior weight in (a, b) equals `level`
    norm = np.sum(weights)
    posterior_volume_target = lambda b: level - np.sum(weights[np.logical_and(samples < b, a < samples)])/norm
    try:
        b = sop.bisect(posterior_volume_target, a, bmax)
        return b
    except ValueError:
        return np.inf

def CL(samples, weights, level=0.68, xmin=-4, xmax=4):
    # brute-force scan over the lower bound a, minimizing the interval width
    prior_volume = lambda a: b_of_a_samples(a, samples, weights, level, xmax) - a
    xtest = np.linspace(-4, 4, 1000)
    arg = np.argmin([prior_volume(x) for x in xtest])
    return xtest[arg], b_of_a_samples(xtest[arg], samples, weights, level, xmax)
a,b = CL(samples, weights)
plt.hist(samples, weights=weights, density=True, bins=1000, range=(-4,4), label='Samples')
plt.axvline(a, color="red", label='68% CL')
plt.axvline(b, color="red")
plt.plot(x, f(x), label='True PDF')
plt.legend()
plt.show()
Both are horribly inefficient of course (I think computing and interpolating a CDF from the samples would be much more efficient), and for this PDF it looks like a simple KDE or a fit to the histogram would have worked better. But with this method you don't have to choose any bandwidth or bin size, just count samples, so I might try it out with some real-world data later.
Edit: Much faster and nicer version:
import scipy.interpolate as sip
def fastCL(samples, weights, level=0.68):
    # Sort and normalize
    order = np.argsort(samples)
    samples = samples[order]
    weights = weights[order]/np.sum(weights)
    # Compute inverse cumulative distribution function
    CDF = np.append(np.insert(np.cumsum(weights), 0, 0), 1)
    S = np.array([-np.inf, *samples, np.inf])
    invcdf = sip.interp1d(CDF, S)
    # Find smallest interval
    distance = lambda a, level=level: invcdf(a+level)-invcdf(a)
    a = sop.minimize(distance, (1-level)/2, bounds=[(0, 1-level)], method="Nelder-Mead").x[0]
    interval = np.array([invcdf(a), invcdf(a+level)])
    return interval
fastCL(samples, weights)
Edit: Small problem here: invcdf(0) goes to infinity due to the interpolation bounds; invcdf(0) should be min(samples).
import scipy.interpolate as sip
def fastCL(samples, weights, level=0.68):
    # Sort and normalize
    order = np.argsort(samples)
    samples = samples[order]
    weights = weights[order]/np.sum(weights)
    # Compute inverse cumulative distribution function
    CDF = np.append(np.insert(np.cumsum(weights), 0, 0), 1)
    S = np.array([-np.min(samples), *samples, np.max(samples)])
    invcdf = sip.interp1d(CDF, S)
    # Find smallest interval
    distance = lambda a, level=level: invcdf(a+level)-invcdf(a)
    a = sop.minimize(distance, (1-level)/2, bounds=[(0, 1-level)], method="Nelder-Mead").x[0]
    interval = np.array([invcdf(a), invcdf(a+level)])
    return interval
fastCL(samples, weights)
Hi Stefan,
the fastCL function is a really neat approach that avoids a KDE and the choice of a bandwidth. It seems to work well for large quantiles (e.g. 0.95); for smaller quantiles (e.g. 0.68) the accuracy appears to vary a bit more.
EDIT: The perceived lower accuracy for smaller quantiles appears to come down to insufficient samples in the high-probability region. E.g. when using samples = stats.skewnorm.rvs(a=5, size=100000) (all with equal weights), the accuracy of the smaller quantiles gets better.
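A quick check along those lines (assuming the fastCL definition from the previous comment):

from scipy import stats
import numpy as np
direct = stats.skewnorm.rvs(a=5, size=100000, random_state=0)
# samples drawn directly from the skewed Gaussian, all with equal weights
print(fastCL(direct, np.ones(len(direct)), level=0.68))
print(fastCL(direct, np.ones(len(direct)), level=0.95))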
Anyhow, for future lookup and completeness I wanted to add this:
The bottleneck in this is the computation of the high-accuracy KDE (i.e. high ncompress in anesthetic). Once you have that, the rest should compute very fast.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy import optimize
from scipy import interpolate
from scipy import integrate
from anesthetic import MCMCSamples
# using your example data
# -----------------------
f = lambda x: stats.skewnorm.pdf(x, a=5)
x = np.linspace(-4, 4, 1000)
samples = np.random.uniform(-4, 4, size=100000)
weights = f(samples)
mcmc = MCMCSamples(data=samples, weights=weights, columns=['f'])
# generate the KDE data with anesthetic's `plot_1d` function
# ----------------------------------------------------------
# some remarks:
# * `q=1` ensures that the full data-range is being used
# * `density=True` normalises the posterior (default would be that maximum is 1)
# * `ncompress=100000`: set to sth lower to save time, then increase for accuracy
fig, axes = mcmc.plot_1d(['f'], q=1, density=True, ncompress=100000, label='anesthetic')
# interpolate the KDE data to get an effective PDF
# ------------------------------------------------
pdf_kde = interpolate.interp1d(axes.iloc[0].lines[0].get_xdata(),
                               axes.iloc[0].lines[0].get_ydata(),
                               kind='cubic', bounds_error=False, fill_value=0)
# Compute iso-probability bounds
# ------------------------------
def root_func_for_lower_credibility_bound(z, pdf, ppf, q=0.68):
    return pdf(ppf(z)) - pdf(ppf(z + q))
def get_iso_proba_bounds(pdf, q):
    """
    Compute the lower and upper iso-probability bounds that enclose the quantile `q`.

    Parameters
    ----------
    pdf: scipy.interpolate.interpolate.interp1d
        Probability density function of the posterior that you can get from the KDE.
    q: float
        Quantile that should be enclosed by the returned iso-probability bounds.

    Returns
    -------
    lower, upper: (float, float)
        Lower and upper iso-probability bounds given a `pdf` and a quantile `q`.
    """
    if q < 0.5:
        q = 1 - q
    # integrate the PDF to get the CDF and interpolate to invert and get the PPF
    sol = integrate.solve_ivp(lambda t, y: pdf(t),
                              t_span=(pdf.x[0], pdf.x[-1]),
                              y0=[0],
                              t_eval=pdf.x)
    cdf = sol.y[0] / sol.y[0][-1]
    ppf = interpolate.interp1d(cdf, sol.t, kind='cubic')
    # run root finder
    o = optimize.root_scalar(f=root_func_for_lower_credibility_bound,
                             args=(pdf, ppf, q),
                             bracket=[1e-5, 1 - q - 1e-5])
    if o.converged:
        return ppf(o.root), ppf(o.root + q)
    else:
        return o
b68 = get_iso_proba_bounds(pdf_kde, 0.68)
b95 = get_iso_proba_bounds(pdf_kde, 0.95)
# Plot results
# ------------
plt.figure()
plt.hist(samples, weights=weights, density=True, bins=1000, range=(-4, 4),
         color='k', alpha=0.3, label='Samples')
plt.plot(x, f(x), c='k', label='true PDF')
plt.plot(x, pdf_kde(x), c='C2', label='anesthetic KDE')
# 68% iso-probability bounds
plt.axvline(b68[0], color='C1', ls='-', label='68% lower bound')
plt.axvline(b68[1], color='C4', ls=':', label='68% upper bound')
plt.axhline(f(b68[0]), color='C1', ls='-')
plt.axhline(f(b68[1]), color='C4', ls=':')
# 95% iso-probability bounds
plt.axvline(b95[0], color='C3', ls='-', label='95% lower bound')
plt.axvline(b95[1], color='C0', ls=':', label='95% upper bound')
plt.axhline(f(b95[0]), color='C3', ls='-')
plt.axhline(f(b95[1]), color='C0', ls=':')
plt.legend(loc='upper left')
plt.show()
Thanks for this @lukashergt, good to note these; I wasn't familiar with q and density here!
# * `q=1` ensures that the full x-range is being used
# * `density=True` normalises the posterior (default would be that maximum is 1)
# * `ncompress=100000`: set to sth lower to save time, then increase for accuracy
Edit: Based on this I wrote a bestfit function which might be of interest:
def bestFit(data, key, ncompress=10000, plot=False):
    fig, axes = data.plot_1d([key], q=1, density=True, label='anesthetic', ncompress=ncompress)
    pdf_kde = sip.interp1d(axes.iloc[0].lines[0].get_xdata(),
                           axes.iloc[0].lines[0].get_ydata(),
                           kind='cubic', bounds_error=False, fill_value=0)
    if plot:
        axes[key].hist(data[key], weights=data.weights, density=True, label='anesthetic', range=(0, 0.1), bins=30)
        plt.show()
    else:
        fig.clear()
        plt.close(fig)
    # fid[key]: starting point for the optimizer, defined elsewhere (presumably a dict of fiducial values)
    return sop.minimize(lambda x: -pdf_kde(x), fid[key], method="Nelder-Mead").x[0]
When I have time I'll make a PR for a confidence_level function, and possibly also the bestFit function; I think they could be useful.
Edit: Updated version to deal with duplicated points in line data (seen with Planck chains):
def bestFit(data, key, ncompress=10000, plot=False, verbose=True):
    fig, axes = data.plot_1d([key], q=1, density=True, label='anesthetic', ncompress=ncompress)
    x = axes.iloc[0].lines[0].get_xdata()
    y = axes.iloc[0].lines[0].get_ydata()
    mask = [*(np.diff(x) == 0), False]
    if np.sum(mask) > 0:
        if verbose:
            print("bestFit: Removing", np.sum(mask), "duplicates from kde interpolation function")
        indices = np.where([*(np.diff(x) == 0), False])[0]
        assert np.all(y[indices] == y[indices+1])
        x = x[np.logical_not(mask)]
        y = y[np.logical_not(mask)]
    pdf_kde = sip.interp1d(x, y, kind='cubic', bounds_error=False, fill_value=0)
    if plot:
        axes[key].hist(data[key], weights=data.weights, density=True, label='anesthetic', range=data.limits[key], bins=50)
    else:
        fig.clear()
        plt.close(fig)
    return sop.minimize(lambda x: -pdf_kde(x), fid[key], method="Nelder-Mead").x[0]
I will make a new PR replacing the complicated implementation in #179 with the fastCL one above (renamed to {MCMC,Nested}Samples.credibility_interval()).
Do we also want my hacky bestFit code from above in anesthetic? I don't think it's good enough to be a built-in function; we should probably wait until we have a better way than internally making a plot.
Edit: Changelog

- Added import scipy.optimize as sop
- Changed method to "SLSQP" as that one actually respects the bounds.
- Removed the stray - in front of np.min(samples)
import scipy.optimize as sop
import scipy.interpolate as sip
def fastCL(samples, weights, level=0.68):
    # Sort and normalize
    order = np.argsort(samples)
    samples = samples[order]
    weights = weights[order]/np.sum(weights)
    # Compute inverse cumulative distribution function
    CDF = np.append(np.insert(np.cumsum(weights), 0, 0), 1)
    S = np.array([np.min(samples), *samples, np.max(samples)])
    invcdf = sip.interp1d(CDF, S)
    # Find smallest interval
    distance = lambda a, level=level: invcdf(a+level)-invcdf(a)
    a = sop.minimize(distance, (1-level)/2, bounds=[(0, 1-level)], method="SLSQP").x[0]
    interval = np.array([invcdf(a), invcdf(a+level)])
    return interval
fastCL(samples, weights)
Edit: Slightly neater version, planning to put this into a PR
import scipy.optimize as sop
import scipy.interpolate as sip
def fastCL(samples, weights=None, level=0.68, method="iso-probability"):
    assert level < 1, "Level >= 1!"
    weights = np.ones(len(samples)) if weights is None else weights
    # Sort and normalize
    order = np.argsort(samples)
    samples = samples[order]
    weights = weights[order]/np.sum(weights)
    # Compute inverse cumulative distribution function
    CDF = np.append(np.insert(np.cumsum(weights), 0, 0), 1)
    S = np.array([np.min(samples), *samples, np.max(samples)])
    invcdf = sip.interp1d(CDF, S)
    if method == "iso-probability":
        # Find smallest interval
        distance = lambda a, level=level: invcdf(a+level)-invcdf(a)
        a = sop.minimize_scalar(distance, bounds=(0, 1-level), method="Bounded")
        interval = np.array([invcdf(a.x), invcdf(a.x+level)])
    elif method == "lower-limit":
        # Get value from which we reach the desired level
        interval = invcdf(1-level)
    elif method == "upper-limit":
        # Get value to which we reach the desired level
        interval = invcdf(level)
    else:
        assert False, method
    return interval
fastCL(samples, weights)
> - Iso-probability contours containing 68% probability
> - Smallest possible interval containing 68% confidence of PDF. (i.e. minimize prior-volume of confidence region)

> But looking at your plot now I realize that the 2nd and 3rd option should be identical. I think about it like this: The 68% interval can only be moved by moving both bounds to the left, or both to the right. However, as the PDF has the same height at both points, when moving the lines, say, to the right, the left line will move to a higher and the right one to a lower PDF region, and therefore move less x-distance. Therefore the x-distance between the lines grows.
This is an incredibly neat result. If you want the d-dimensional differential geometric version I have (finally) got round to writing it up:
This makes me wonder whether we could use this for the 2d plots, and/or whether this would allow us to do away with (or equivalently construct an alternative approach for) kernel density estimation (in either the 2d or the 1d case).
Something to chew on. I'll get on and review this now.
Is your feature request related to a problem? Please describe.
It is not trivial how to compute confidence intervals from MCMCSamples or NestedSamples, even though we have this functionality in the 1d plots (with facecolor=True). There is a function iso_probability_contours_from_samples which sounds like it should do that (but does not seem to), the documentation does not explain much, and the source code looks mostly like iso_probability_contours.

Describe the solution you'd like
A function analogous to plot_1d which returns confidence intervals, e.g. MCMCSamples.confidence_1d(key, type='fastkde', confidence_level=[0.68, 0.95]) could return [[(1.2, 1.4), (1.53, 1.55)], [(1.1, 1.6)]]. Possible types would be hist, kde, and fastkde. Edit: We need to return a list of tuples in case there are multiple peaks.

Describe alternatives you've considered
In the past I just used the samples to compute the KDE manually and compute this, but a built-in functionality would surely be beneficial for many users and also encourage using a sensible definition of confidence levels.