rust-ndarray / ndarray-stats

Statistical routines for ndarray
https://docs.rs/ndarray-stats
Apache License 2.0

Binned statistic similar to scipy.stats.binned_statistic_dd #62

Open RolfStierle opened 4 years ago

RolfStierle commented 4 years ago

A binned statistic like scipy.stats.binned_statistic_dd, with an interface similar to ndarray_stats::histogram::Histogram, would allow calculating more statistical features of each bin, such as weighted histograms, means, variances, min, max, etc. I would like to add something like that and would be grateful for opinions on what it should look like. @LukeMathWalker

All vs. only one statistic

Is it a good idea to calculate all statistics whenever a value is pushed into a bin, or should only one statistic be calculated, which has to be selected beforehand? bs = BinnedStatistic(grid) vs. bs = BinnedStatistic(grid, variance).
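To make the two options concrete, here is a rough sketch using only the standard library and 1-D right-open bins in place of a real grid. Every name in it (BinStatistic, Mean, Summary, BinnedStatistic, push, bins) is hypothetical and not part of ndarray-stats. The idea is that the accumulator type is a generic parameter, so "one statistic" and "all statistics" become two implementations of the same trait rather than two different APIs.

```rust
/// Hypothetical per-bin accumulator; not an existing ndarray-stats trait.
pub trait BinStatistic: Default + Clone {
    /// Fold one observed value into this bin's running state.
    fn push(&mut self, value: f64);
}

/// Option A: a single statistic selected beforehand (here, the mean).
#[derive(Default, Clone, Debug)]
pub struct Mean {
    count: u64,
    sum: f64,
}

impl BinStatistic for Mean {
    fn push(&mut self, value: f64) {
        self.count += 1;
        self.sum += value;
    }
}

impl Mean {
    /// `None` while the bin has received no values.
    pub fn value(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

/// Option B: several statistics tracked on every push.
#[derive(Clone, Debug)]
pub struct Summary {
    pub count: u64,
    pub sum: f64,
    pub min: f64,
    pub max: f64,
}

impl Default for Summary {
    fn default() -> Self {
        // Infinite sentinels so the first pushed value always replaces them.
        Summary { count: 0, sum: 0.0, min: f64::INFINITY, max: f64::NEG_INFINITY }
    }
}

impl BinStatistic for Summary {
    fn push(&mut self, value: f64) {
        self.count += 1;
        self.sum += value;
        self.min = self.min.min(value);
        self.max = self.max.max(value);
    }
}

/// Hypothetical 1-D stand-in for a grid-based binned statistic.
pub struct BinnedStatistic<S: BinStatistic> {
    edges: Vec<f64>,
    bins: Vec<S>,
}

impl<S: BinStatistic> BinnedStatistic<S> {
    /// `edges` are assumed sorted; bin `i` covers `[edges[i], edges[i + 1])`.
    pub fn new(edges: Vec<f64>) -> Self {
        let n_bins = edges.len().saturating_sub(1);
        BinnedStatistic { edges, bins: vec![S::default(); n_bins] }
    }

    /// Adds `value` to the bin containing `coordinate`.
    /// Returns `false` if the coordinate lies outside all bins.
    pub fn push(&mut self, coordinate: f64, value: f64) -> bool {
        let hit = self
            .edges
            .windows(2)
            .position(|w| w[0] <= coordinate && coordinate < w[1]);
        match hit {
            Some(i) => {
                self.bins[i].push(value);
                true
            }
            None => false,
        }
    }

    pub fn bins(&self) -> &[S] {
        &self.bins
    }
}
```

With this shape, BinnedStatistic::<Mean>::new(edges) pays only for the mean, while BinnedStatistic::<Summary>::new(edges) updates every tracked field on each push, which is the trade-off the question is about.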

Type of output array

Histograms solely count the number of observations in each bin, so the default value is zero. For other statistics, zero is a valid result even for a bin that contains values. The output could be just the numerical value, with a comparison against the histogram (through an additional function) revealing which bins are empty. Or would something similar to Option<T> be a better output? [..., 0.0, 0.0, 1.2, ...] vs. [..., Value(0.0), Empty, Value(1.2), ...].
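As a small illustration of the difference, the sketch below computes a per-bin mean over 1-D right-open bins with the standard library only. The function names binned_mean and flatten are made up for this example and are not part of ndarray-stats. It uses Option<f64> directly, with None playing the role of Empty and Some(x) the role of Value(x).

```rust
/// Per-bin mean over sorted 1-D edges, where bin `i` covers
/// `[edges[i], edges[i + 1])`. Each sample is `(coordinate, value)`.
/// `None` marks a bin that received no samples.
pub fn binned_mean(edges: &[f64], samples: &[(f64, f64)]) -> Vec<Option<f64>> {
    let n_bins = edges.len().saturating_sub(1);
    let mut counts = vec![0u64; n_bins];
    let mut sums = vec![0.0f64; n_bins];

    for &(x, v) in samples {
        // Samples outside every bin are silently skipped in this sketch.
        if let Some(i) = edges.windows(2).position(|w| w[0] <= x && x < w[1]) {
            counts[i] += 1;
            sums[i] += v;
        }
    }

    counts
        .iter()
        .zip(&sums)
        .map(|(&c, &s)| if c == 0 { None } else { Some(s / c as f64) })
        .collect()
}

/// The plain-number alternative: empty bins are replaced by `fill`,
/// so emptiness can only be recovered from a separate histogram.
pub fn flatten(means: &[Option<f64>], fill: f64) -> Vec<f64> {
    means.iter().map(|m| m.unwrap_or(fill)).collect()
}
```

For edges [0, 1, 2, 3] and samples (0.5, -1.0), (0.6, 1.0), (2.5, 1.2), binned_mean gives [Some(0.0), None, Some(1.2)], whereas flatten with a fill of 0.0 gives [0.0, 0.0, 1.2], where the first two bins can no longer be told apart without the counts.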