Closed tetsuok closed 1 year ago
FYI: There are references that support this change, e.g., https://www.stata.com/support/faqs/statistics/bootstrapped-samples-guidelines/
Roughly speaking, drawing resamples of size $n' \neq n$ from $n$ data points changes the sampling distribution of the resulting statistic relative to that of the original sample. To avoid this effect, every resampling round must use the same sample size $n$; otherwise the resulting statistics lose their intended meaning.
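For the sample mean, the effect can be made concrete. As an illustration only, write $n' = pn$ for a hypothetical fraction $0 < p \le 1$ (a symbol introduced here, not taken from the library), and let $\hat\sigma^2$ be the plug-in variance of the original data. The variance of the resampled mean is then

$$
\operatorname{Var}^*\left(\bar{X}^*_{n'}\right) = \frac{\hat\sigma^2}{n'} = \frac{1}{p} \cdot \frac{\hat\sigma^2}{n}
$$

so the bootstrap standard error is inflated by a factor of $1/\sqrt{p}$ compared with the $n' = n$ case (about 1.41 for $p = 0.5$), and intervals built from it come out roughly that much wider than they should.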
The sampling part of the bootstrap procedure can be summarized at a high level as follows, if I understand it correctly:
where the number of random samples drawn in Step 1 should be the same as the number of samples in the data. (A similar description is found in `scipy.stats.bootstrap`.)
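A minimal sketch of resampling with the resample size fixed to the data size may help make the constraint concrete. It is a plain percentile bootstrap written against the standard library only; `bootstrap_ci` and its parameters are hypothetical names chosen for illustration and are not the library's `calc_confidence_interval`.

```python
import random
import statistics


def bootstrap_ci(data, statistic=statistics.mean, n_resamples=1000,
                 confidence_level=0.95, seed=0):
    """Percentile bootstrap interval (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    n = len(data)
    stats = []
    for _ in range(n_resamples):
        # Resample with replacement, always drawing exactly n observations,
        # i.e. the same size as the original data.
        resample = rng.choices(data, k=n)
        stats.append(statistic(resample))

    # Take the empirical quantiles of the resampled statistics.
    stats.sort()
    alpha = 1.0 - confidence_level
    lower = stats[int((alpha / 2) * (n_resamples - 1))]
    upper = stats[int((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


if __name__ == "__main__":
    data = list(range(100))
    print(bootstrap_ci(data))
```

The only size-related choice here is `k=n` in `rng.choices`; replacing it with a smaller value would widen the interval without the data justifying it.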
However, `calc_confidence_interval` appears to implement this differently, which raises a concern about the correctness of the algorithm. I could be wrong, but it seems better to prefer correctness over resampling efficiency, given that values computed with this library could be used in someone's research. With this in mind, this PR removes `prop_samples` from `calc_confidence_interval`.