storopoli / Bayesian-Julia

Bayesian Statistics using Julia and Turing
https://storopoli.github.io/Bayesian-Julia

Confidence intervals #76

Closed: sboukortt closed this issue 1 year ago

sboukortt commented 1 year ago

Hi,

Chapter 2 states:

This means that 95 studies out of 100, which would use the same sample size and target population, applying the same statistical test, will expect to find a result of mean differences between groups between 10.5 and 23.5.

This seems to correspond to misinterpretation 22 from “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations”.

The correct interpretation is that approximately 95 studies out of 100 would compute a confidence interval that contains the true mean difference – but it says nothing about which ones those are (whereas the data might).

In other words, 95% is not the probability that the interval we actually obtained contains the true parameter; it is the probability that, if we compute another confidence interval in the same way from new data, that interval will contain the true parameter. The particular interval we got in this instance carries no such guarantee and, as far as the theory is concerned, might as well be thrown away.
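
To make the distinction concrete, here is a minimal simulation sketch in Julia (all settings, e.g. a true mean difference of μ = 17, σ = 5, and n = 30 per study, are made-up values for illustration, not taken from the book):

```julia
using Random, Distributions, Statistics

# Made-up settings for illustration:
Random.seed!(123)
μ, σ, n = 17.0, 5.0, 30            # true mean difference, sd, per-study sample size
n_studies = 10_000
t = quantile(TDist(n - 1), 0.975)  # two-sided 95% t critical value

covered = map(1:n_studies) do _
    x = rand(Normal(μ, σ), n)      # one simulated "study"
    half = t * std(x) / sqrt(n)
    mean(x) - half ≤ μ ≤ mean(x) + half
end
println(mean(covered))  # ≈ 0.95: the procedure covers μ in ~95% of repeated studies
```

The ≈ 95% is a property of the procedure across repetitions; each individual realized interval either contains μ or it does not, and confidence interval theory assigns it no probability.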

Here is some nice reading on the subject:

E. T. Jaynes. Confidence Intervals vs. Bayesian Intervals (1976). DOI: 10.1007/978-94-009-6581-2_9

Let us try to understand what is happening here. It is perfectly true that, if the distribution (15) is indeed identical with the limiting frequencies of various sample values, and if we could repeat all this an indefinitely large number of times, then use of the confidence interval (17) would lead us, in the long run, to a correct statement 90% of the time. But it would lead us to a wrong answer 100% of the time in the subclass of cases where $\theta^* > x_1 + 0.85$; and we know from the sample whether we are in that subclass. […] We suggest that the general situation, illustrated by the above example, is the following: whenever the confidence interval is not based on a sufficient statistic, it is possible to find a 'bad' subclass of samples, recognizable from the sample,* in which use of the confidence interval would lead us to an incorrect statement more frequently than is indicated by the confidence level; and also a recognizable 'good' subclass in which the confidence interval is wider than it needs to be for the stated confidence level. The point is not that confidence intervals fail to do what is claimed for them; the point is that, if the confidence interval is not based on a sufficient statistic, it is possible to do better in the individual case by taking into account evidence from the sample that the confidence interval method throws away.
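
For concreteness, here is a small Monte Carlo sketch of the behavior Jaynes describes, assuming his truncated-exponential model $p(x \mid \theta) = e^{-(x - \theta)}$ for $x > \theta$ with $n = 3$. I use a central 90% interval built from the unbiased estimator $\theta^* = \bar{x} - 1$ rather than Jaynes' shortest interval, so the constants differ slightly from his 0.85, and the true value $\theta = 10$ is arbitrary:

```julia
using Random, Distributions, Statistics

# Since x_i = θ + e_i with e_i ~ Exp(1), the estimation error θ* - θ = mean(e) - 1
# is distributed as Gamma(n, 1)/n - 1, which gives exact interval endpoints.
function simulate(; n = 3, θ = 10.0, n_sims = 100_000)
    G = Gamma(n, 1)
    q05 = quantile(G, 0.05) / n - 1
    q95 = quantile(G, 0.95) / n - 1
    hits = bad = bad_hits = 0
    for _ in 1:n_sims
        x = θ .+ rand(Exponential(1), n)
        θstar = mean(x) - 1
        lo, hi = θstar - q95, θstar - q05  # central 90% confidence interval
        c = lo ≤ θ ≤ hi
        hits += c
        if lo > minimum(x)  # recognizable from the sample alone, since θ ≤ min(x)
            bad += 1
            bad_hits += c
        end
    end
    (coverage = hits / n_sims, n_bad = bad, bad_coverage = bad_hits / max(bad, 1))
end

Random.seed!(7)
res = simulate()
println(res.coverage)      # ≈ 0.90: the advertised long-run coverage holds
println(res.n_bad)         # the recognizable "bad" subclass is non-empty...
println(res.bad_coverage)  # ...and coverage inside it is exactly 0.0
```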

Morey, R.D., Hoekstra, R., Rouder, J.N. et al. The fallacy of placing confidence in confidence intervals. Psychon Bull Rev 23, 103–123 (2016). DOI: 10.3758/s13423-015-0947-8

For now, we give a simple example, which we call the “trivial interval.” Consider the problem of estimating the mean of a continuous population with two independent observations, $y_1$ and $y_2$. If $y_1 > y_2$, we construct a confidence interval that contains all real numbers $(-\infty, \infty)$; otherwise, we construct an empty confidence interval. The first interval is guaranteed to include the true value; the second is guaranteed not to. It is obvious that before observing the data, there is a 50% probability that any sampled interval will contain the true mean. After observing the data, however, we know definitively whether the interval contains the true value. […] Once one has collected data and computed a confidence interval, how does one then interpret the interval? The answer is quite straightforward: one does not – at least not within confidence interval theory. As Neyman and others pointed out repeatedly, and as we have shown, confidence limits cannot be interpreted as anything besides the result of a procedure that will contain the true value in a fixed proportion of samples. Unless an interpretation of the interval can be specifically justified by some other theory of inference, confidence intervals must remain uninterpreted, lest one make arbitrary inferences or inferences that are contradicted by the data. This applies even to “good” confidence intervals, as these are often built by inverting significance tests and may have strange properties (e.g., Steiger, 2004).
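
That trivial procedure is easy to check by simulation; a minimal sketch in Julia (the Normal(3, 1) population is an arbitrary choice for the demo):

```julia
using Random, Distributions

# Morey et al.'s "trivial interval": a valid 50% confidence procedure whose
# realized intervals are individually worthless.
trivial_interval(y1, y2) = y1 > y2 ? (-Inf, Inf) : nothing  # nothing = empty set

Random.seed!(42)
μ = 3.0               # arbitrary true mean for the demo
pop = Normal(μ, 1)
n_sims = 100_000
# (-Inf, Inf) always contains μ; the empty interval never does:
coverage = count(_ -> trivial_interval(rand(pop), rand(pop)) !== nothing, 1:n_sims)
println(coverage / n_sims)  # ≈ 0.5, yet after seeing the data the answer is certain
```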

(As I. J. Good put it: “One of the intentions of using confidence intervals and regions is to protect the reputation of the statistician by being right in a certain proportion of cases in the long run. Unfortunately, it sometimes leads to such absurd statements, that if one of them were made there would not be a long run.” with Jaynes’ truncated exponential being a nice example.)

sboukortt commented 1 year ago

(This definitely seems to give yet more support to “with frequentist statistics you have to choose one of two qualities for explanations: intuitive or accurate[17].”)

storopoli commented 1 year ago

You are definitely right. Despite knowing of and having read both papers, I still made the mistake. Thanks for the correction. PR #77 should fix this.