A binned statistic, like scipy.stats.binned_statistic_dd and similar to ndarray_stats::histogram::Histogram, would allow calculating more statistical features of each bin, such as weighted histograms, means, variances, min, and max. I would like to add something like that and would be grateful for opinions on what it should look like. @LukeMathWalker
All vs. only one statistic
Is it a good idea to calculate all statistics when a value is pushed to be binned, or should only one statistic be calculated, which has to be selected beforehand?
bs = BinnedStatistic(grid) vs. bs = BinnedStatistic(grid, variance).
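To make the trade-off concrete, here is a rough sketch of both variants in plain Rust. All names (`AllStats`, `Statistic`, `Max`, `BinnedStatistic`) are hypothetical and not part of ndarray-stats, and bins are addressed by a flat index instead of a real grid to keep it short. The "all statistics" variant updates the count, mean, variance (Welford's online update), min, and max on every push; the "one statistic" variant selects the statistic beforehand through a type parameter.

```rust
// Hypothetical sketch; none of these names exist in ndarray-stats.

/// Variant A: every bin tracks all statistics at once.
#[derive(Clone, Debug)]
struct AllStats {
    count: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean (Welford)
    min: f64,
    max: f64,
}

impl AllStats {
    fn new() -> Self {
        AllStats {
            count: 0,
            mean: 0.0,
            m2: 0.0,
            min: f64::INFINITY,
            max: f64::NEG_INFINITY,
        }
    }

    /// Update every statistic with one observation.
    fn push(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
        self.min = self.min.min(x);
        self.max = self.max.max(x);
    }

    /// Population variance; `None` for an empty bin.
    fn variance(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.m2 / self.count as f64)
        }
    }
}

/// Variant B: a single statistic, selected beforehand through a trait.
trait Statistic: Default {
    fn push(&mut self, x: f64);
    fn value(&self) -> Option<f64>;
}

/// Running maximum; holds `None` until the first observation arrives.
#[derive(Default)]
struct Max(Option<f64>);

impl Statistic for Max {
    fn push(&mut self, x: f64) {
        self.0 = Some(self.0.map_or(x, |m| m.max(x)));
    }

    fn value(&self) -> Option<f64> {
        self.0
    }
}

/// One accumulator per bin; a real version would look bins up in a grid.
struct BinnedStatistic<S> {
    bins: Vec<S>,
}

impl<S: Statistic> BinnedStatistic<S> {
    fn new(n_bins: usize) -> Self {
        BinnedStatistic {
            bins: (0..n_bins).map(|_| S::default()).collect(),
        }
    }

    fn add(&mut self, bin: usize, x: f64) {
        self.bins[bin].push(x);
    }

    fn values(&self) -> Vec<Option<f64>> {
        self.bins.iter().map(|s| s.value()).collect()
    }
}
```

Variant A costs five fields per bin and a few extra floating-point operations per push even if only the mean is wanted. Variant B keeps each bin minimal, but a second statistic requires a second pass over the data.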
Type of output array
Histograms solely count the number of observations in each bin, so the default value is zero. For other statistics, zero is a valid result even with values in that bin.
The output could be just the numerical value, with a comparison against the histogram (through an additional function) revealing which bins are empty. Or would something similar to Option<T> be a good output?
[..., 0.0, 0.0, 1.2, ...] vs [..., Value(0.0), Empty, Value(1.2), ...].
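A small sketch of the two output styles, using the per-bin mean as the statistic. The functions `bin_means` and `to_plain` are hypothetical names made up for illustration, and bins are again addressed by a flat index. Rust's standard `Option<f64>` plays the role of `Value(..)` / `Empty` here.

```rust
// Hypothetical sketch; `bin_means` and `to_plain` are illustrative names only.

/// Mean per bin from (bin index, value) pairs; `None` marks an empty bin.
fn bin_means(n_bins: usize, samples: &[(usize, f64)]) -> Vec<Option<f64>> {
    let mut sums = vec![0.0_f64; n_bins];
    let mut counts = vec![0_u64; n_bins];
    for &(bin, x) in samples {
        sums[bin] += x;
        counts[bin] += 1;
    }
    sums.iter()
        .zip(&counts)
        .map(|(&s, &c)| if c == 0 { None } else { Some(s / c as f64) })
        .collect()
}

/// Plain numeric output: empty bins collapse to 0.0, so they can no longer
/// be told apart from bins whose mean really is 0.0 without the counts.
fn to_plain(means: &[Option<f64>]) -> Vec<f64> {
    means.iter().map(|m| m.unwrap_or(0.0)).collect()
}
```

With the samples `[(0, -1.0), (0, 1.0), (2, 1.2)]` and three bins, `bin_means` keeps the empty middle bin visible as `None`, while `to_plain` turns both the first and the second bin into `0.0`. Going from the `Option` form to the plain form is a one-liner, whereas the reverse needs the histogram counts.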