neulab / ExplainaBoard

Interpretable Evaluation for AI Systems
MIT License

Remove prop_samples from calc_confidence_interval #502

Closed tetsuok closed 1 year ago

tetsuok commented 1 year ago

If I understand it correctly, the sampling part of the bootstrap procedure can be summarized at a high level as follows:

  1. Draw a random sample from the given data, with replacement.
  2. Repeat Step 1 several times.

where the number of data points drawn in Step 1 should be the same as the number of data points in the original data. (A similar description appears in the documentation for scipy.stats.bootstrap.)
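As a rough illustration of the two steps above, here is a minimal standard-library sketch. It is not the code in calc_confidence_interval; the function names `bootstrap_statistics` and `percentile_interval`, their parameters, and the choice of a percentile interval are assumptions made only for this example.

```python
import random
import statistics


def bootstrap_statistics(data, statistic=statistics.fmean, n_rounds=1000, seed=0):
    """Hypothetical helper: compute `statistic` on `n_rounds` bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    # Step 1: draw n items with replacement (same size as the original data).
    # Step 2: repeat n_rounds times, keeping the statistic from each round.
    return [statistic(rng.choices(data, k=n)) for _ in range(n_rounds)]


def percentile_interval(stats, confidence=0.95):
    """Hypothetical helper: simple index-based percentile interval."""
    ordered = sorted(stats)
    alpha = (1.0 - confidence) / 2.0
    low_idx = int(alpha * (len(ordered) - 1))
    high_idx = int((1.0 - alpha) * (len(ordered) - 1))
    return ordered[low_idx], ordered[high_idx]


if __name__ == "__main__":
    scores = [float(i % 10) for i in range(100)]  # made-up example data
    low, high = percentile_interval(bootstrap_statistics(scores))
    print(f"95% interval for the mean: [{low:.3f}, {high:.3f}]")
```

The key point is `k=n` in `rng.choices`: every round uses the full sample size, with no proportion parameter that shrinks it.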

However, calc_confidence_interval seems to implement this differently, which raises a concern about the correctness of the algorithm. I could be wrong, but it seems better to prefer correctness over resampling efficiency, given that values computed with this library could be used in someone's research. With this in mind, this PR removes prop_samples from calc_confidence_interval.

odashi commented 1 year ago

FYI: There are explanations that support this change, e.g., https://www.stata.com/support/faqs/statistics/bootstrapped-samples-guidelines/

Roughly speaking, resampling $n$ data points with a different sample size $n' \neq n$ changes the distribution of the resulting statistics relative to the original sample. To avoid this effect, we need to use the same sample size $n$ in every resampling round; otherwise the resulting statistics do not have the intended meaning.
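To make the effect concrete, here is a small sketch that compares the spread of bootstrap means when resampling with $n' = n$ versus $n' = n/2$. The helper `bootstrap_spread` and the toy data are hypothetical and not part of ExplainaBoard. Because the standard error of a mean scales roughly as $1/\sqrt{n'}$, the half-size resamples should give a spread about $\sqrt{2}$ times larger, and therefore a wider interval than the data support.

```python
import random
import statistics


def bootstrap_spread(data, sample_size, n_rounds=2000, seed=0):
    """Hypothetical helper: std. dev. of bootstrap means for a given resample size."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.choices(data, k=sample_size))  # with replacement
        for _ in range(n_rounds)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    data = [float(i) for i in range(200)]  # made-up example data
    full = bootstrap_spread(data, sample_size=len(data))       # n' = n
    half = bootstrap_spread(data, sample_size=len(data) // 2)  # n' = n / 2
    print(f"spread with n' = n:   {full:.3f}")
    print(f"spread with n' = n/2: {half:.3f}")
    print(f"ratio: {half / full:.3f}")
```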