Overview
Confidence intervals computed with Student's t-distribution were far too wide because of an error in how they were scaled. This PR fixes the calculation.
Details
A confidence interval for the mean of a sample should use a t distribution scaled by the standard deviation of the sample mean, i.e. the standard error (the sample standard deviation divided by the square root of the sample size). However, we were scaling by the standard deviation of the sample itself, which made the intervals too wide by a factor of the square root of the sample size.
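To illustrate the difference, here is a minimal standalone sketch, not the code changed in this PR. The function names, the example scores, and the hardcoded critical value are all assumptions made for illustration; in practice the critical value could come from something like `scipy.stats.t.ppf(0.975, df=n - 1)`.

```python
import math
import statistics


def mean_confidence_interval(values, t_crit):
    """Two-sided CI for the mean, scaled by the standard error.

    t_crit is the t critical value for len(values) - 1 degrees of freedom
    (hypothetical parameter, supplied by the caller).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sample_sd = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)
    std_err = sample_sd / math.sqrt(n)     # standard deviation of the sample mean
    half_width = t_crit * std_err
    return mean - half_width, mean + half_width


def too_wide_interval(values, t_crit):
    """The kind of mistake described above: scaling by the sample SD itself."""
    mean = statistics.fmean(values)
    half_width = t_crit * statistics.stdev(values)
    return mean - half_width, mean + half_width


# Made-up scores, purely for illustration.
scores = [0.71, 0.74, 0.69, 0.73, 0.75, 0.70, 0.72, 0.74, 0.68, 0.74]
T_CRIT_95_DF9 = 2.262  # approximate two-sided 95% critical value for 9 degrees of freedom

correct_low, correct_high = mean_confidence_interval(scores, T_CRIT_95_DF9)
wide_low, wide_high = too_wide_interval(scores, T_CRIT_95_DF9)

print("correct :", (correct_low, correct_high))
print("too wide:", (wide_low, wide_high))
```

With 10 values, the second interval is wider than the first by a factor of sqrt(10), and the gap grows with the sample size.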
Fixes https://github.com/neulab/explainaboard_web/issues/541, which also provides more context.
Also see discussion here for mathematical justification.
Blocked by #598