penelope-howe opened 1 year ago
Thanks for the bug report. Can you provide a reference? Thanks!
In this article:
Gillick & Cox (1989). Some statistical issues in the comparison of speech recognition algorithms. Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Glasgow, 1989, pp. 532–535.
(Available here: https://www.researchgate.net/publication/3548274_Some_statistical_issues_in_the_comparison_of_speech_recognition_algorithms)
Equation (13) is the normal approximation that compute_acc_binomial essentially implements, where their N[1,0] corresponds to your R and their k to your n. The SCTK code looks slightly different because their equation always uses the count in cell {1,0}, whereas SCTK uses the minimum of cells {0,1} and {1,0}. They subtract 1/2 from |N[1,0] - k/2| as the continuity correction. To implement the equivalent calculation in SCTK, you should instead add 1/2, because (R - n/2) is always ≤ 0 (R being the smaller of the two counts) and the absolute value is taken afterwards.
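Writing the two forms side by side makes the equivalence explicit. This is a sketch that assumes the standard normal approximation to Binomial(k, 1/2), whose standard deviation is √k/2; that denominator is my assumption and is not quoted from the paper. W_GC and W_SCTK are labels introduced here for illustration only.

$$
\begin{aligned}
W_{\mathrm{GC}} &= \frac{\left|N_{10} - \frac{k}{2}\right| - \frac{1}{2}}{\sqrt{k}/2},
&
W_{\mathrm{SCTK}} &= \frac{\left|R - \frac{n}{2} + \frac{1}{2}\right|}{\sqrt{n}/2},
\\[4pt]
R &= \min(N_{01}, N_{10}),
&
n &= k = N_{01} + N_{10}.
\end{aligned}
$$

Because N_01 + N_10 = k, we have |N_10 − k/2| = |N_01 − k/2| = k/2 − R, so

$$
\left|N_{10} - \frac{k}{2}\right| - \frac{1}{2}
= \frac{k}{2} - R - \frac{1}{2}
= -\left(R - \frac{n}{2} + \frac{1}{2}\right)
\qquad\Longrightarrow\qquad
\left|W_{\mathrm{GC}}\right| = W_{\mathrm{SCTK}}.
$$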
Corrected an error in the implementation of the normal approximation to the accumulated binomial distribution. In the previous implementation, the McNemar test gave incorrect results under the normal approximation: it was impossible to score above the significance threshold (i.e., to obtain a non-significant result) unless the two systems had equal numbers of unique utterance errors (u.u.e.s). For example, systems with 10 and 11 u.u.e.s, respectively, were erroneously found to be significantly different. By contrast, under the correctly implemented exact binomial calculation of McNemar's test, systems with 9 and 11 u.u.e.s, respectively, were correctly found not to be significantly different.
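A minimal Python sketch of the corrected calculation, for checking the examples above against the exact test. The function names mcnemar_exact_p and mcnemar_normal_p are hypothetical and are not SCTK's API. Reporting the two-sided p-value as twice the lower tail is also an assumption about how the test is scored.

```python
import math


def mcnemar_exact_p(n01: int, n10: int) -> float:
    """Two-sided exact McNemar p-value (hypothetical helper, not SCTK's API).

    Under the null hypothesis, the smaller count R is a draw from the lower tail
    of X ~ Binomial(n, 1/2), with n = n01 + n10. The p-value is 2 * P(X <= R),
    capped at 1.
    """
    n = n01 + n10
    if n == 0:
        return 1.0  # no discordant pairs: nothing to test
    r = min(n01, n10)
    lower_tail = sum(math.comb(n, i) for i in range(r + 1)) / 2 ** n
    return min(1.0, 2.0 * lower_tail)


def mcnemar_normal_p(n01: int, n10: int) -> float:
    """Two-sided McNemar p-value from the continuity-corrected normal approximation."""
    n = n01 + n10
    if n == 0:
        return 1.0
    r = min(n01, n10)
    # r - n/2 <= 0, so the continuity correction ADDS 1/2 before the absolute
    # value; this matches Gillick & Cox's |N10 - k/2| - 1/2 in magnitude.
    z = abs(r - n / 2 + 0.5) / (math.sqrt(n) / 2)
    # Two-sided standard-normal tail: 2 * (1 - Phi(z)) == erfc(z / sqrt(2))
    return min(1.0, math.erfc(z / math.sqrt(2)))


if __name__ == "__main__":
    for a, b in [(10, 11), (9, 11), (2, 20)]:
        print(a, b, mcnemar_exact_p(a, b), mcnemar_normal_p(a, b))
```

For 10 versus 11 u.u.e.s, both functions return 1.0. For 9 versus 11, both give p ≈ 0.82, well above a 0.05 threshold. The exact version is the one to trust for small counts. The normal approximation is only a cheaper stand-in that tracks it closely once the continuity correction has the right sign.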