As the author of this code, I have to say: please investigate, as I do not remember the exact details. For the 2D case, I followed a physics PhD thesis. I am rather sure the 1D case is correct, but less so for the 2D part.
Thanks, @Tillsten!
Do you have a reference for the PhD thesis?
@ThiagoReschutzegger @Tillsten Yes, thanks.
I think there is pretty good evidence that conf_interval() matches the values from inverting the curvature matrix for those "simple cases" where the error bars are symmetric and the correlations are elliptical. So, if you have time to investigate this, I might suggest also doing a "simple case" like a polynomial or Gaussian model, where the "normal, easy" results are known to be a pretty good start.
We can see that they diverge for a linear model:
import matplotlib.pyplot as plt
import numpy as np
from scipy.special import erf

import lmfit

# definition of the problem
x = np.linspace(1, 10, 250)
np.random.seed(0)
p = lmfit.Parameters()
p.add_many(('a', 4.), ('b', 4.))
y = 10 * x + 10 + 0.1 * np.random.randn(x.size)

def residual(p):
    return p['a']*x + p['b'] - y

# fit the data
mini = lmfit.Minimizer(residual, p, nan_policy='propagate')
out1 = mini.minimize(method='Nelder')
out = mini.minimize(method='leastsq', params=out1.params)

sigmas = [1, 2, 3]
probs = [erf(s / np.sqrt(2)) for s in sigmas]

# using conf_interval()
bound_a = np.zeros((len(sigmas), 2))
bound_b = np.zeros((len(sigmas), 2))
for i, sig in enumerate(sigmas):
    ci = lmfit.conf_interval(mini, out, sigmas=[sig], trace=False)
    bound_a[i] = (ci['a'][0][1], ci['a'][-1][1])
    bound_b[i] = (ci['b'][0][1], ci['b'][-1][1])

colors = ['red', 'blue', 'green']
fig, ax = plt.subplots(figsize=(8, 8))

# cross marks the best-fit values
plt.plot(out.params['a'].value, out.params['b'].value, 'x', color='gray')

# contour lines from conf_interval2d()
cx, cy, grid = lmfit.conf_interval2d(mini, out, 'a', 'b', 100, 100)
ax.contour(cx, cy, grid, levels=probs, colors=colors, linestyles='-')

# dashed boxes from conf_interval()
for i, c in zip(range(3), colors):
    ax.plot([bound_a[i][0], bound_a[i][1], bound_a[i][1], bound_a[i][0], bound_a[i][0]],
            [bound_b[i][0], bound_b[i][0], bound_b[i][1], bound_b[i][1], bound_b[i][0]],
            color=c, linestyle='--', alpha=0.5)

# axis names
ax.set_xlabel('a')
ax.set_ylabel('b')
plt.show()
Here's the result
After further thinking: Could you check if the projection of the 2D-plot gives you the same bounds?
I suspect the problem is with the use of the F-test and the function f_compare() to assign "probabilities" here. In fact, I was recently reviewing a paper talking about using F-tests with chi-square from fits. I know people do that all the time, and I know that we do it in confidence.py -- but I am still suspicious that the ideas of ANOVA actually apply to chi-square statistics from "best fits", especially when the scaling of the uncertainties is not perfect (i.e., chi-square is kinda far from N_data - N_varys). In fact, I sort of think that @ThiagoReschutzegger might have found a demonstration that they do not.
OK, if I change conf_interval2d() to be

out = minimizer.leastsq()
# prob = prob_func(result, out)
prob = out.chisqr  # just give chi-square, we'll work out meaning later

and then think of sigma levels as "increase chi-square from chisqr to chisqr + sigma**2 * redchi", I get pretty good results. This uses a simple, common use case of Gaussian + Line, where the simplest estimate of standard error is clearly both "not bad" and "not trivially perfect":
import matplotlib.pyplot as plt
import numpy as np

from lmfit import conf_interval, conf_interval2d
from lmfit.lineshapes import gaussian
from lmfit.models import GaussianModel, LinearModel
from lmfit.printfuncs import report_ci

np.random.seed(12)
x = np.linspace(1, 100, num=501)
y = (gaussian(x, amplitude=83, center=47., sigma=11.)
     + 0.03*x + 4.0
     + np.random.normal(size=len(x), scale=0.5))

mod = GaussianModel() + LinearModel()
params = mod.make_params(amplitude=100, center=50, sigma=20,
                         slope=0, intercept=2)

out = mod.fit(y, params, x=x)
ci = conf_interval(out, out, sigmas=[1, 2, 3])

print(out.fit_report())
print("values from conf_interval():")
report_ci(ci)

nsamples = 100
parx = 'amplitude'
pary = 'sigma'  # or 'intercept', ....

c_x, c_y, c2mat = conf_interval2d(out, out, parx, pary, nsamples, nsamples)

# make contours of sigma-levels by hand from chisqr and redchi
sigma_mat = 5 * np.ones((nsamples, nsamples))
sigma_mat[np.where(c2mat < out.chisqr + 4**2*out.redchi)] = 4.0
sigma_mat[np.where(c2mat < out.chisqr + 3**2*out.redchi)] = 3.0
sigma_mat[np.where(c2mat < out.chisqr + 2**2*out.redchi)] = 2.0
sigma_mat[np.where(c2mat < out.chisqr + 1**2*out.redchi)] = 1.0

# put X/Y axes in "(value - best)/stderr" using the simple values from leastsq
scaled_x = (c_x - out.params[parx].value) / out.params[parx].stderr
scaled_y = (c_y - out.params[pary].value) / out.params[pary].stderr

sigma_levels = [1, 2, 3, 4]
colors = ['black', 'red', 'blue', 'green']

fig, ax = plt.subplots(figsize=(8, 8))
ax.contour(scaled_x, scaled_y, sigma_mat,
           levels=sigma_levels, colors=colors, linestyles='-')
ax.set_xlabel(f'{parx}: (val - best) / stderr')
ax.set_ylabel(f'{pary}: (val - best) / stderr')
ax.grid(True, alpha=0.2, color='grey')
plt.show()
This prints out:

[[Model]]
    (Model(gaussian) + Model(linear))
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 43
    # data points      = 501
    # variables        = 5
    chi-square         = 124.467285
    reduced chi-square = 0.25094211
    Akaike info crit   = -687.674157
    Bayesian info crit = -666.591127
    R-squared          = 0.86967537
[[Variables]]
    amplitude:  83.2520506 +/- 2.79650929 (3.36%) (init = 100)
    center:     47.4795135 +/- 0.29261352 (0.62%) (init = 50)
    sigma:      11.3599111 +/- 0.33813825 (2.98%) (init = 20)
    slope:      0.03091732 +/- 8.4513e-04 (2.73%) (init = 0)
    intercept:  3.91775813 +/- 0.05877072 (1.50%) (init = 0)
    fwhm:       26.7505458 +/- 0.79625472 (2.98%) == '2.3548200*sigma'
    height:     2.92368174 +/- 0.06605574 (2.26%) == '0.3989423*amplitude/max(1e-15, sigma)'
[[Correlations]] (unreported correlations are < 0.100)
    C(slope, intercept)     = -0.793
    C(amplitude, sigma)     = 0.752
    C(amplitude, intercept) = -0.581
    C(sigma, intercept)     = -0.437
    C(center, slope)        = -0.360
    C(center, intercept)    = 0.286
    C(amplitude, slope)     = 0.140
    C(sigma, slope)         = 0.105

values from conf_interval():
            99.73%   95.45%   68.27%   _BEST_   68.27%   95.45%   99.73%
 amplitude: -8.10658 -5.46166 -2.76378 83.25205 +2.84444 +5.78597 +8.84405
 center   : -0.88389 -0.58706 -0.29289 47.47951 +0.29288 +0.58696 +0.88362
 sigma    : -0.97796 -0.65958 -0.33416 11.35991 +0.34484 +0.70251 +1.07548
 slope    : -0.00254 -0.00169 -0.00085  0.03092 +0.00085 +0.00170 +0.00255
 intercept: -0.17979 -0.11891 -0.05906  3.91776 +0.05852 +0.11672 +0.17481
Reduced chi-square is within an order of magnitude of 1, but not exactly 1. conf_interval() shows the uncertainties are not perfectly symmetric, but also shows that these simple estimates of stderr are good to 2 decimal places (anyone believing more precision on that does not really understand what that number is meant to convey anyway).
The contour plot of the "set-by-hand" sigma-levels looks like this:
which shows pretty good agreement of the 4 contour levels compared to the stderr from leastsq. The contour at sigma=4 is noticeably off a little bit for amplitude, as conf_interval also finds. But that is a "refinement of the simple value" - it is hard to say any of these are "wrong".
Doing that for a couple of other variable pairs gives similar plots (figures omitted), again showing that getting the sigma-levels from "increase chisqr by N**2 * redchi" seems pretty reliable. I think that is not circular, and I think that the use of the F-test for chi-square is a little weak, but I am not trained as a statistician ("How did I get here?").
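To restate the rule used above as a formula (my paraphrase, not from the original code): a grid point lies inside the n-sigma contour when

$$\chi^2 \le \chi^2_{\mathrm{best}} + n^2\,\chi^2_{\mathrm{red}}, \qquad p = \operatorname{erf}\!\left(\frac{n}{\sqrt{2}}\right)$$

so the 1, 2, 3-sigma contours correspond to chi-square increases of 1, 4, and 9 times the reduced chi-square, and to probabilities of roughly 0.683, 0.954, and 0.997.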
I don't have a strong opinion on how to fix this:
a) fix the use of f_compare (not sure to what...!)
b) change conf_interval2d() to return a matrix of chi-square values and let the user deal with that....
Any thoughts or suggestions?
I also do not know how I ended up here. Hopefully I will have time this weekend to find the references. IIRC the F-test is valid for nested models, and in the 1D conf_interval case it is actually identical to a chi-square distribution, since the difference of the dofs = 1. But there is a reason why the probability function is easily changeable, and I put the definitions used at the top of the documentation...
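For reference, the probability assignment in question is, as far as I remember, essentially the following (a paraphrase of f_compare() from lmfit/confidence.py, not the exact source):

from scipy.stats import f

def f_compare(best_fit, new_fit):
    # best_fit: result with all parameters free
    # new_fit: result with one parameter fixed at a trial value
    nfree = best_fit.nfree                        # N_data - N_varys
    nfix = 1                                      # one parameter was fixed
    dchi = new_fit.chisqr / best_fit.chisqr - 1.0
    return f.cdf(dchi * nfree / nfix, nfix, nfree)

With nfix = 1, this is exactly the nested-model F-test mentioned above.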
After further thinking: Could you check if the projection of the 2D-plot gives you the same bounds?
Sure! But how would that help? The contours of the 2D plot are calculated using the built-in function from matplotlib, which uses ContourPy's 'mpl2014' algorithm.
I am not trained as a statistician
Me neither. So everything I say can be wrong somehow.
Honestly, I am also not sure how to fix the use of f_compare. However, running some more tests, I found a special case where conf_interval matches the elliptical approximation for $\sigma = 1$, assuming a Normal distribution. Here $V$ is the covariance matrix and $\mu$ is the array of means:

$$\sigma = (x - \mu)^T\, V^{-1}\, (x - \mu), \qquad p = \operatorname{erf}\!\left(\frac{\sigma}{\sqrt{2}}\right)$$
import matplotlib.pyplot as plt
import numpy as np
from numpy.linalg import inv
from scipy.special import erf

import lmfit

def calc_prob_approx(x, mu, V):
    s = (x - mu).T @ inv(V) @ (x - mu)
    return erf(s / np.sqrt(2))

# definition of the problem
x = np.linspace(1, 10, 250)
np.random.seed(0)
p = lmfit.Parameters()
p.add_many(('a', 4.), ('b', 4.))

# add gaussian noise
y = 10 * x + 10 + np.random.normal(0, 1, x.shape)

def residual(p):
    return p['a']*x + p['b'] - y

# fit the data
mini = lmfit.Minimizer(residual, p, nan_policy='propagate')
out1 = mini.minimize(method='Nelder')
out = mini.minimize(method='leastsq', params=out1.params)

sigmas = [1, 2, 3]
probs = [erf(s / np.sqrt(2)) for s in sigmas]

# using conf_interval()
bound_a = np.zeros((len(sigmas), 2))
bound_b = np.zeros((len(sigmas), 2))
for i, sig in enumerate(sigmas):
    ci = lmfit.conf_interval(mini, out, sigmas=[sig], trace=False)
    bound_a[i] = (ci['a'][0][1], ci['a'][-1][1])
    bound_b[i] = (ci['b'][0][1], ci['b'][-1][1])

# using the covariance matrix approximation
V = out.covar
mu = np.array([out.params['a'].value, out.params['b'].value])
a_stderr = out.params['a'].stderr
b_stderr = out.params['b'].stderr
f = 2

# create a grid of points
a = np.linspace(mu[0] - f * a_stderr, mu[0] + f * a_stderr, 100)
b = np.linspace(mu[1] - f * b_stderr, mu[1] + f * b_stderr, 100)
A, B = np.meshgrid(a, b)
X = np.dstack([A, B])   # X[i, j] = [A[i, j], B[i, j]], aligned with contour(A, B, ...)

# calculate the probability at each point
prob = np.zeros(X.shape[:2])
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        prob[i, j] = calc_prob_approx(X[i, j], mu, V)

colors = ['red', 'blue', 'green']
fig, ax = plt.subplots(figsize=(8, 8))

# cross marks the best-fit values
plt.plot(out.params['a'].value, out.params['b'].value, 'x', color='gray')

# contour lines from the covariance matrix approximation
ax.contour(A, B, prob, levels=probs, colors=colors, linestyles='-.')

# contour lines from conf_interval2d()
nsamples = 100
cx, cy, grid = lmfit.conf_interval2d(mini, out, 'a', 'b', nsamples, nsamples)
ax.contour(cx, cy, grid, levels=probs, colors=colors, linestyles='-')

# dashed boxes from conf_interval()
for i, c in zip(range(3), colors):
    ax.plot([bound_a[i][0], bound_a[i][1], bound_a[i][1], bound_a[i][0], bound_a[i][0]],
            [bound_b[i][0], bound_b[i][0], bound_b[i][1], bound_b[i][1], bound_b[i][0]],
            color=c, linestyle='--', alpha=0.5)

# axis names
ax.set_xlabel('a')
ax.set_ylabel('b')
plt.show()
We get this result, which shows what I said. I'm not sure how this information can be helpful; sending it here just in case it is.
Oh, what I meant is summing over one variable to marginalize it out. In this case the 1D confidence intervals fit rather well.
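A minimal sketch of that projection, assuming c2mat is a grid of chi-square values like the one built in the Gaussian + Line example above (the axis convention of the grid is assumed): treat exp(-chi2/2) as an unnormalized likelihood, sum out one parameter, and read off the interval containing 68.27% of the probability.

import numpy as np

def marginal_bounds(axis_values, chi2_grid, sum_axis=0, level=0.6827):
    # unnormalized likelihood from the chi-square surface
    like = np.exp(-0.5 * (chi2_grid - chi2_grid.min()))
    marg = like.sum(axis=sum_axis)    # marginalize out the other parameter
    marg /= marg.sum()
    # smallest set of grid points containing `level` of the probability
    order = np.argsort(marg)[::-1]
    total, mask = 0.0, np.zeros(marg.size, dtype=bool)
    for idx in order:
        mask[idx] = True
        total += marg[idx]
        if total >= level:
            break
    return axis_values[mask].min(), axis_values[mask].max()

# e.g. bounds on parx, summing over the pary axis
print(marginal_bounds(c_x, c2mat, sum_axis=0))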
@Tillsten Yes, I agree that conf_interval() does seem to be consistent with scaling stderr values, or as consistent as can be expected. I am not sure why conf_interval2d() is not.

But I would be willing to either modify conf_interval2d or make a new, nearly-identical function (maybe chi_square_map()) that returns a 2-D array of chi-square values for any pair of parameter values.
@ThiagoReschutzegger I'm not sure I follow the latest code (or maybe "interpret the plots"): could that difference just be the re-scaling of the covariance? As a general statement, I would say that this is all confusing enough that I would definitely recommend printing out numerical values (and including those here) and clearly commenting on what you see, how it differs from what you expect, and what you think is the problem. Thanks.
The example added with #852 might help clarify how a map of chi-square values can be used to show the n-sigma contour plots.
I suspect I might be missing something here, and in that case I apologize, but how exactly is this behavior different from the known property that 1d confidence intervals for a single variable are not equal to the projection of 2d confidence intervals for the same confidence level? I'm thinking of this classical figure below (from Numerical Recipes).
Again, it's entirely possible that I misunderstood the original issue, in which case please just ignore this comment.
@mdaeron I believe that Numerical Recipes section is meant to caution the reader to take into account the correlations between parameters when assigning confidence levels -- we do that (or "try to do that") here.
If I understand the figure correctly (without actually having re-read the text!), 68% of all the values (the dots) fall between the two horizontal lines bracketed by the "68% confidence interval on a1" label, and similarly for a0. One might be tempted to interpret that range as the 68% confidence level for the value of a1. Doing so ignores the bias (correlation) of a1 with a0, and is universally cautioned against.
The 68% of (a0, a1) values with the lowest fit residual (chi-square or some other figure of merit) lie within the ellipse, and the full extent of that curve for a1 (and a0) should be used as the 68% / 1-sigma confidence level. All of the estimates of uncertainties in lmfit (including the simplest and automatically estimated values) do take such correlations into account.
The issue here is that the boxes in @ThiagoReschutzegger's example, calculated as the n-sigma levels by conf_interval(), do not fully encompass their corresponding contour line - they should. It is not 100% clear to me why they don't, but it is not because the correlations are not considered -- they are. I suspect that the F test on chi-square values, which is meant to measure changes in fit quality when changing the number of variables, is not exactly what is needed here.
Using the code at #852 to simply return the matrix of chi-square values, and modifying @ThiagoReschutzegger's example to use
# contour lines by conf_interval2d()
nsamples = 100
## cx, cy, grid = lmfit.conf_interval2d(mini, out, 'a', 'b', nsamples, nsamples)
## ax.contour(cx, cy, grid, levels=probs, colors=colors, linestyles='-')
# here grid is a matrix of "chi2_out - out.chisqr", the change in chi-square
# away from the best fit
cx, cy, grid = lmfit.conf_interval2d(mini, out, 'a', 'b', nsamples, nsamples,
                                     chi2_out=True)
sigma_grid = np.sqrt(abs(grid) / out.redchi)             # matrix of sigma values
prob_grid = scipy.special.erf(sigma_grid / np.sqrt(2))   # matrix of probabilities (needs scipy.special)
ax.contour(cx, cy, prob_grid, levels=probs, colors=colors, linestyles='-')
then the boxes (found with conf_interval(), which does use an F test!) do line up with the elliptical contours that express levels of constant probability:
@mdaeron I believe that Numerical Recipes section is meant to caution the reader to take into account the correlations between parameters when assigning confidence levels -- we do that (or "try to do that") here.
If I understand the figure correctly (without actually having re-read the text!), 68% of all the values (the dots) fall between the two horizontal lines bracketed by the "68% confidence interval on a1" label, and similarly for a0. One might be tempted to interpret that range as the 68% confidence level for the value of a1. Doing so ignores the bias (correlation) of a1 with a0, and is universally cautioned against.
OK, thanks for the explanation. I suspect there is an interesting (and valid) “cultural” difference here, as by contrast I was taught (my background being physics/chemistry/applied maths) that there are many ways of arbitrarily defining X % confidence regions in the parameter space. Some of them are silly, of course, but iso-χ2 (the ellipses shown here) and unidimensional “bands” in the parameter space (the vertical or horizontal bands above) all appear to be potentially useful. Sometimes you're really only interested in estimating one parameter and the iso-χ2 approach is not relevant.
As far as I can tell this is consistent with the message in that chapter of NumRecipes, which IIRC views the iso-χ2 approach as natural, but not the only valid option. My views above are also heavily influenced by Inverse Problem Theory and Model Parameter Estimation (A. Tarantola, SIAM, 2005). But I realize that this is not a universal point of view.
The 68% of (a0, a1) values with the lowest fit residual (chi-square or some other figure of merit) lie within the ellipse, and the full extent of that curve for a1 (and a0) should be used as the 68% / 1-sigma confidence level. All of the estimates of uncertainties in lmfit (including the simplest and automatically estimated values) do take such correlations into account.
Thanks, I learned something new! Because I don't use conf_interval it was not obvious to me that this is what lmfit is doing.
@mdaeron Most of the people writing and using lmfit have backgrounds in physics or chemistry or related fields. The iso-chi-square view may not be universal, but it is the only view that is relevant here. Anyone who wants to do something different will need to use other tools that give incorrect results.
Thanks, I learned something new! Because I don't use conf_interval it was not obvious to me that this is what lmfit is doing.
Not only does conf_interval() take correlations into account, but the uncertainties calculated from a plain "invert the curvature matrix", as tried above (and usually done successfully for all fits), also take correlations into account. Needing to take correlations into account should be obvious to everyone measuring confidence intervals.

I can certainly believe there are people who want to not do that. Some of those people may write textbooks. I have no idea what the reference you cite says, but if it says that there is ever a justification for ignoring correlations between variables, I am sorry you had to study from that book - you were misinformed.
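To make that concrete with the linear example from earlier in the thread (assuming out from that fit): the reported stderr values are the square roots of the diagonal of out.covar, and the off-diagonal elements carry exactly the correlations being discussed.

import numpy as np

stderr = np.sqrt(np.diag(out.covar))          # matches out.params['a'].stderr, etc.
corr = out.covar / np.outer(stderr, stderr)   # correlation matrix
print(stderr)
print(corr)                                   # off-diagonal: correlation of 'a' and 'b'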
Hey again,
Using the code at #852 applied to the first example, we get the result (figure omitted), which is perfect. I'm closing the issue.
Thanks all!
Thanks for the awesome work, @newville!
@newville, to address the factual part of your answer (because obligatory xkcd) in a civil manner:
It's not correlation, it's dimensionality. Consider the iso-χ2 confidence regions for 1,2,3,4... independent Gaussian variables. This will yield greater and greater 1-D confidence intervals as dimensionality increases.
(You may argue that is The Right Way, but if variables are independent, what is the rational basis for considering them jointly? Why not add yet another, unrelated variable?)
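For concreteness, a quick scipy check (my addition, not from the figure) of how the iso-χ2 threshold enclosing 68.27% joint probability grows with the number of jointly-considered parameters:

from scipy.stats import chi2

for ndim in (1, 2, 3, 4):
    print(ndim, chi2.ppf(0.6827, df=ndim))
# roughly 1.00, 2.30, 3.53, 4.72 -- the classic Delta-chi-square table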
It's not like there is no literature on the subject.
That'll be all from me now, I don't want to turn this into a game of ping-pong.
@mdaeron
It's not correlation, it's dimensionality. Consider the iso-χ2 confidence regions for 1,2,3,4... independent Gaussian variables. This will yield greater and greater 1-D confidence intervals as dimensionality increases.
The uncertainties and confidence levels for every variable must take into account the potential correlations between all variables. To do this "by hand" (which conf_interval() and conf_interval2d() both do), if you move a away from its best-fit value, you must reoptimize all other variables to find the best chi-square for that value of a. The 1-sigma / 68% uncertainty in a is given by those values of a where the chi-square values that responded to the changing value of a are increased by reduced chi-square. That is also very consistent (in theory and in actual numerical results) with what the "invert covariance matrix" approach does. This is how it has always been done in lmfit.
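A minimal sketch of that procedure, reusing mini and out from the linear example earlier in this thread (the refit at each fixed value of a is what carries the correlation with b):

import numpy as np

best = out.params['a'].value
step = out.params['a'].stderr
target = out.chisqr + out.redchi            # 1-sigma: chi-square up by redchi
avals = np.linspace(best - 3 * step, best + 3 * step, 201)
chi2_at_a = []
for aval in avals:
    pars = out.params.copy()
    pars['a'].set(value=aval, vary=False)   # fix 'a', reoptimize 'b'
    chi2_at_a.append(mini.minimize(params=pars, method='leastsq').chisqr)
inside = avals[np.array(chi2_at_a) <= target]
print('1-sigma interval for a: [%.5f, %.5f]' % (inside.min(), inside.max()))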
(You may argue that is The Right Way, but if variables are independent, what is the rational basis for considering them jointly? Why not add yet another, unrelated variable?)
I am not arguing, there is no argument to be had, it is the Right Way. If some variables are actually independent of one another, there is no problem accounting for correlations. Indeed the process will show what that correlation is.
Again, all fits done with lmfit (which is, like a decade old at this point) do calculate uncertainties that account for correlations and report these values -- the fit report sorts correlations and reports those.
It's not like there is no literature on the subject.
I don't know of anyone here who is not aware of that. Indeed, the question was why the different ways to measure confidence intervals were giving different results, not how to account for correlations. That was not changed.
That'll be all from me now, I don't want to turn this into a game of ping-pong.
Great to hear that. Cheers.
Yes, I read the instructions and I am sure this is a GitHub Issue.
Apparently, there is a difference in calculating the confidence region using the conf_interval or the conf_interval2d method. It was expected that when plotting the regions calculated by both methods (as I did below), the 1D confidence region would overlap the maxima and minima of the 2D region. However, the results show a significant deviation from this expectation.

I would like to investigate this issue further, but before doing so, I need to know whether this behavior is expected or not. If it is not expected, I can focus on understanding the underlying reasons behind it.
Can anyone provide any insights or guidance regarding this issue? Any help would be greatly appreciated.