johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
https://miller.readthedocs.io

Adding a verb to do bootstrap resampling for confidence interval. #1670

Open SamuelLarkin opened 1 month ago

SamuelLarkin commented 1 month ago

Hi, I would like to suggest a new feature. I would like to be able to do bootstrap resampling to get a confidence interval. You can find an example of how it works here: scipy.stats.bootstrap().

johnkerl commented 1 month ago

@SamuelLarkin have you seen https://miller.readthedocs.io/en/latest/reference-verbs/#bootstrap ?

SamuelLarkin commented 1 month ago

@johnkerl yes, I did, but I failed to see how I could compute a confidence interval using bootstrap. I would need to sample 1000 x len(data) records, group them into 1000 batches of len(data), compute the mean of each batch, then take the symmetric 5% percentiles. How do I add a batch id in order to group the data into 1000 batches of len(data)?
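
For reference, the procedure described above can be sketched outside Miller in plain Python. This is only an illustration of the intended computation, not a Miller feature: the function name `bootstrap_mean_ci` and its parameters are hypothetical, and it assumes that "symmetric percentile 5%" means cutting 5% of the batch means off each tail (a 90% interval).

```python
import random
import statistics


def bootstrap_mean_ci(data, n_batches=1000, tail=0.05, seed=None):
    """Percentile bootstrap interval for the mean (hypothetical helper).

    Assumption: `tail` is the fraction trimmed from EACH end of the
    sorted batch means, so tail=0.05 gives a 90% interval.
    """
    rng = random.Random(seed)
    n = len(data)
    batch_means = []
    for batch_id in range(n_batches):
        # One batch = len(data) records drawn with replacement;
        # batch_id plays the role of the grouping key asked about.
        batch = rng.choices(data, k=n)
        batch_means.append(statistics.fmean(batch))
    batch_means.sort()
    # Pick symmetric positions in the sorted list of batch means.
    lo_idx = round(tail * n_batches)
    hi_idx = n_batches - 1 - lo_idx
    return batch_means[lo_idx], batch_means[hi_idx]


if __name__ == "__main__":
    values = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]
    low, high = bootstrap_mean_ci(values, seed=42)
    print(f"90% bootstrap CI for the mean: [{low:.3f}, {high:.3f}]")
```

Here the batch id is simply the loop index, so no explicit grouping column is needed; in a record-stream tool the same idea would require tagging each resampled record with its batch number before grouping.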