Closed jturner314 closed 6 years ago
I'll review it tomorrow morning.
On histogram's panics: should we change the signature to return a Result
?
We panic in those occasions where it is not possible to provide an answer that makes sense:
Returning a Result would probably give a chance to the user to specify a default that makes sense to them, especially if we provide a useful error.
For empty arrays, I think the right thing to do is to have Edges
be an empty Vec
(no bins).
For constant arrays, I think the right thing to do is have Edges
be vec![the_constant_value]
(no bins) or vec![the_constant_value, the_constant_value + 1]
(one bin). Alternatively, if we change Edges
to allow duplicate edges, we could use vec![the_constant_value; 2]
(one bin).
For Freedman Diaconis with IQR=0, I'm not sure; that requires some thought. Panicking definitely isn't the right thing, because IMO panics shouldn't be a function of the data in the array. I think I'd return an Option
/Result
or have a fallback strategy. At first glance, the Option
/Result
appeals to me more.
By the way, in a future version, I'm thinking about replacing GridBuilder
with a fit_array
method on Grid
and simplifying the BinsBuildingStrategy
trait:
impl<A: Ord> Grid<A> {
fn fit_array<S, B>(array: &ArrayBase<S, Ix2>, strategy: &B) -> Result<Self, Error>
where
S: Data<Elem = A>,
B: BinsBuildingStrategy<A>,
{
// ...
}
}
pub trait BinsBuildingStrategy<A: Ord> {
fn fit_array<S>(&self, array: &ArrayBase<S, Ix1>) -> Result<Bins<A>, Error>
where
S: Data<Elem = A>;
}
The BinsBuildingStrategy
types wouldn't contain the min/max/n_bins/etc. derived from a specific array; they'd contain whatever metadata would be necessary to calculate the Bins
in fit_array
. This would simplify handling of special cases and would simplify the public API a little. What do you think about this?
For empty arrays, I think the right thing to do is to have
Edges
be an emptyVec
(no bins).
It makes sense, I agree.
For constant arrays, I think the right thing to do is have
Edges
bevec![the_constant_value]
(no bins) orvec![the_constant_value, the_constant_value + 1]
(one bin). Alternatively, if we changeEdges
to allow duplicate edges, we could usevec![the_constant_value; 2]
(one bin).
We could use a "default" width of 1, that could be the way. I'd avoid duplicate edges since they would still lead to empty bins and a degenerate grid (no observation would be captured), thus surprising the user.
By the way, in a future version, I'm thinking about replacing
GridBuilder
with afit_array
method onGrid
and simplifying theBinsBuildingStrategy
trait:impl<A: Ord> Grid<A> { fn fit_array<S, B>(array: &ArrayBase<S, Ix2>, strategy: &B) -> Result<Self, Error> where S: Data<Elem = A>, B: BinsBuildingStrategy<A>, { // ... } } pub trait BinsBuildingStrategy<A: Ord> { fn fit_array<S>(&self, array: &ArrayBase<S, Ix1>) -> Result<Bins<A>, Error> where S: Data<Elem = A>; }
The
BinsBuildingStrategy
types wouldn't contain the min/max/n_bins/etc. derived from a specific array; they'd contain whatever metadata would be necessary to calculate theBins
infit_array
. This would simplify handling of special cases and would simplify the public API a little. What do you think about this?
A fit_array
method sounds good to me, as a shortcut on Grid
to simplify the public API, but you still need extremes and a bin width to execute all the strategies we have implemented, hence I think the new strategy would end up quite similar to the ones we have right now. They would probably just squeeze everything in a single method.
This PR does a few things:
min
/max
methods with the oldmin_partialord
/max_partialord
. (Renamemin_partialord
/max_partialord
tomin
/max
.) See this comment for more explanation. The primary issue with the oldmin
/max
methods is that they panicked on empty arrays.quantile_mut
returnNone
for empty arrays. It's easy to forget about the possibility of empty arrays, so it's important to remind the user of the possibility and force them to handle it. See this comment for more justification.quantile_mut
.Edit: I'm uncomfortable with the histogram strategies panicking in so many cases, but we can fix that in a later version.