This is a lengthy response because you have asked a difficult question. First I
will discuss why there are “two (or sometimes three) different p values.”
Then I will discuss your error and your related question.
Let’s assume you are doing an MRMC variance analysis of the difference of two modalities.
There are two levels of statistical analysis. In the first level analysis, we
estimate the AUCs, the difference in AUCs (delta), and the variance of the
difference (V). If we take the difference in AUCs and divide it by the square
root of the variance of the difference, we get the test statistic, call it T.
In the second level of statistical analysis, we assume a distribution for the
test statistic. If we make the “normal” approximation …
• The p-value of the hypothesis test is 2*(1-F( |T| )), where |T| is the
absolute value of T and F() is the cumulative distribution function of the
normal distribution.
• The lower bound on the confidence interval is delta - 1.96 * sqrt(V).
• The upper bound on the confidence interval is delta + 1.96 * sqrt(V).
• If zero is contained in the confidence interval, we cannot reject the null.
If zero is not contained in the confidence interval, we reject the null.
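The normal-approximation steps above can be sketched in a few lines of Python. This is only an illustration, not iMRMC code: the function name `normal_approx` and the numbers for delta and V are made up for the example.

```python
from statistics import NormalDist


def normal_approx(delta, variance, alpha=0.05):
    """Second-level analysis under the normal approximation.

    delta    : estimated difference in AUCs (first-level result)
    variance : estimated variance of that difference, V
    Returns (T, p_value, ci_lower, ci_upper).
    """
    se = variance ** 0.5
    t_stat = delta / se                      # test statistic T
    std_normal = NormalDist()                # standard normal, F()
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(t_stat)))
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return t_stat, p_value, delta - z * se, delta + z * se


# Hypothetical first-level results, chosen only for illustration.
t_stat, p_value, lower, upper = normal_approx(delta=0.05, variance=0.0004)

# Reject the null only if zero lies outside the confidence interval.
reject_null = not (lower <= 0.0 <= upper)
print(t_stat, p_value, (lower, upper), reject_null)
```

The confidence-interval check and the p-value check always agree here, because both come from the same assumed distribution and the same alpha.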
Alternatively, we can assume that the distribution of the test statistic is
Student’s T. The normal approximation assumes you know the variance perfectly,
whereas the T distribution acknowledges that the variance is being estimated.
Consequently, the T distribution is broader than the normal distribution
because it accounts for the uncertainty in the knowledge of the variance. The
normal approximation is optimistically biased; it yields confidence intervals
that are too narrow and p-values that are too small.
The amount of uncertainty in the knowledge of the variance is determined by the
degrees of freedom, df. In simple problems, df=N-1, where N is the number of
samples. For the MRMC analysis of the difference in AUCs, estimating df is
considerably more complicated.
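To get a feel for how much df matters, here is a small sketch that computes the two-sided T critical value (the number that replaces 1.96) and the T-based p-value using only the Python standard library. It is an assumption-laden illustration, not the method iMRMC uses: the helper names are hypothetical, the CDF is obtained by simple numerical integration, and the bisection assumes the critical value is below 100.

```python
import math


def t_pdf(x, df):
    """Density of Student's T distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF by Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3
    return 0.5 + half if x > 0 else 0.5 - half


def t_critical(df, alpha=0.05):
    """Two-sided critical value by bisection (assumed to lie below 100)."""
    lo, hi = 0.0, 100.0
    target = 1.0 - alpha / 2.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_p_value(t_stat, df):
    """Two-sided p-value: 2 * (1 - F_df(|T|))."""
    return 2.0 * (1.0 - t_cdf(abs(t_stat), df))


for df in (5, 25, 1000):
    print(df, round(t_critical(df), 3))
```

With a small df the critical value is noticeably larger than 1.96, so the same T and V give a wider confidence interval and a larger p-value than the normal approximation does.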
In the iMRMC GUI we present three approaches for the second level of
statistical analysis.
1. The normal approximation, which is equivalent to the T distribution with an
infinite df. Note that the T distribution is essentially the normal
distribution when df > 25.
2. The T distribution where the method to estimate df was derived by Brandon D.
Gallas in Obuchowski2012_Acad-Radiol_v19p1508.
3. The T distribution where the method to estimate df was derived by Stephen
Hillis in Hillis2008_Acad-Radiol_v15p647.
The method to estimate df derived by Hillis in 2008 can only be used when the
data is fully crossed: every reader reads every case in both modalities. So
this approach doesn’t appear when your data is not fully crossed. While not
implemented in iMRMC, Hillis has recently published estimates of df for special
study designs in Hillis2014_Stat-Med_v33p330. This probably explains why you
“get two (or sometimes three) different p values.”
Regarding the message that your DF_BDG is being set to a minimum: I was able to
replicate this problem given your data files that you shared (thanks for that).
I uncovered two kinds of problems. The first is a problem with the data. The
second is a problem with the results.
1. In three of the files, I found one reader (#8) had no signal-present data
and another reader (#16) had no signal-absent data. I noticed this when I first
checked “Input Statistics Charts”. These readers had half as many
observations as the other readers. That alone may be ok, but when I checked
“Show Study Design”, one reader was missing the first half of the data and
another reader was missing the second half of the data.
2. In the last of the files, the performance averaged over your readers in
modality “3” was near perfect, 0.98: two of the readers had perfect AUC,
three had AUC 0.997, and one had AUC 0.993. iMRMC cannot handle this. You may
be able to refer to Obuchowski2002_Acad-Radiol_v09p526, but I’m not ready to
analyze this data. Regrets. You do not want to run studies where you get
perfect performance; AUC has a limited useful dynamic range.
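A quick check for the first kind of problem can be done before loading the data into iMRMC. The sketch below is only an illustration under assumed conventions: the function name, the tuple layout (reader, case, modality, score), and the truth coding (1 for signal present, 0 for signal absent) are hypothetical and do not describe the iMRMC input file format.

```python
from collections import defaultdict


def readers_missing_a_class(scores, truth):
    """Find readers who have no signal-present or no signal-absent cases.

    scores : iterable of (reader_id, case_id, modality_id, score)
    truth  : dict mapping case_id -> 1 (signal present) or 0 (signal absent)
    Returns a dict mapping reader_id -> set of truth classes never read.
    """
    seen = defaultdict(set)
    for reader_id, case_id, _modality_id, _score in scores:
        seen[reader_id].add(truth[case_id])
    return {reader: {0, 1} - classes
            for reader, classes in seen.items()
            if classes != {0, 1}}


# Made-up rows that mimic the situation described above.
truth = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}
scores = [
    (1, "c1", "A", 0.2), (1, "c3", "A", 0.9),    # reader 1 reads both classes
    (8, "c1", "A", 0.1), (8, "c2", "A", 0.3),    # reader 8: signal absent only
    (16, "c3", "A", 0.8), (16, "c4", "A", 0.7),  # reader 16: signal present only
]
problems = readers_missing_a_class(scores, truth)
print(problems)
```

A reader flagged here has no AUC of their own, since AUC needs both signal-present and signal-absent cases.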
I have one more comment on the minimum df. Gaylor1969_Technometrics_v4p691
indicates that the minimum df of the sum of two mean squares is the minimum of
the separate degrees of freedom of the two mean squares. For our problem, we
only need to focus on the number of readers. The leading terms of the variance
of the difference in AUCs are the mean squares from (modality 1 x readers) and
(modality 2 x readers). The minimum df of the sum of these two terms is the
minimum of the number of readers in modality 1 (minus 1) and the number of
readers in modality 2 (minus 1). This is what drives the minimum df to change.
You do not need to do anything special because this minimum changes. If df is
below the minimum, the analysis is very suspect. Hopefully you only got your
warnings for cases that had missing data or for cases where you can understand
why the analysis is limited: perfect performance.
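The minimum-df rule described above amounts to the following. This is a minimal sketch of the rule as stated in this comment, not a copy of the iMRMC source; the function names are hypothetical, and the clamping step is my reading of "DF_BDG is being set to a minimum".

```python
def minimum_df(n_readers_mod1, n_readers_mod2):
    """Smaller of the two reader-based degrees of freedom (readers minus 1)."""
    return min(n_readers_mod1 - 1, n_readers_mod2 - 1)


def apply_df_floor(df_estimate, n_readers_mod1, n_readers_mod2):
    """Clamp an estimated df to the minimum and report whether it was clamped.

    A clamped df is a warning sign: the analysis should be treated as suspect.
    """
    floor = minimum_df(n_readers_mod1, n_readers_mod2)
    if df_estimate < floor:
        return floor, True
    return df_estimate, False


# Hypothetical study: 6 readers in modality 1 and 5 readers in modality 2.
print(minimum_df(6, 5))
print(apply_df_floor(2.3, 6, 5))
print(apply_df_floor(7.1, 6, 5))
```

The minimum changes from one data set to the next simply because the number of readers per modality changes.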
Let me know if the answer is complete and clear enough. Then I will close the
issue.
Original comment by Brandon.Gallas
on 7 Mar 2015 at 5:50
Original comment by Brandon.Gallas
on 23 Mar 2015 at 8:18
Removing priority so that answered questions go to the bottom of the list.
Original comment by Brandon.Gallas
on 25 Mar 2015 at 5:47
Original issue reported on code.google.com by
silv...@gmail.com
on 4 Mar 2015 at 12:19