neulab / explainaboard_web

Confidence intervals are too wide for accuracy-based metrics #541

Closed neubig closed 1 year ago

neubig commented 1 year ago

For example, when analyzing sst2, we get these very wide confidence intervals.

[Image: Screen Shot 2022-11-30 at 11 16 43 AM]

When computing the confidence interval with a t-test calculator instead, we get a much narrower interval.

[Image: Screen Shot 2022-11-30 at 11 11 43 AM]
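
For reference, here is a rough sketch of the kind of interval a calculator would report for an accuracy score. It is not ExplainaBoard's implementation: the function name `accuracy_confidence_interval` and the example counts are hypothetical. It also uses the normal approximation from the standard library rather than the t distribution, which should give nearly the same result once there are several hundred examples.

```python
import math
from statistics import NormalDist


def accuracy_confidence_interval(num_correct, num_total, confidence=0.95):
    """Approximate confidence interval for the mean of 0/1 correctness scores.

    Hypothetical helper for illustration; uses the normal approximation
    in place of the t distribution.
    """
    if num_total < 2:
        raise ValueError("need at least two examples")
    accuracy = num_correct / num_total
    # Sample variance of 0/1 scores, with Bessel's correction.
    variance = accuracy * (1 - accuracy) * num_total / (num_total - 1)
    # Standard error of the mean: shrinks as the number of examples grows.
    standard_error = math.sqrt(variance / num_total)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return accuracy - margin, accuracy + margin


# Made-up counts, not the sst2 numbers from the screenshots.
low, high = accuracy_confidence_interval(num_correct=900, num_total=1000)
print(f"95% CI: [{low:.4f}, {high:.4f}]")
```

With these made-up counts (900 correct out of 1000), the margin comes out a little under 0.02 on either side of the accuracy, so an interval much wider than that at a similar sample size would be worth investigating.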

Related discussion: https://github.com/neulab/ExplainaBoard/issues/510