Closed FreezyLemon closed 2 months ago
I do see good reason for all of these and I'd be open to making changes for all but the infallible new
not returning a result.
All other public structs defined in the distribution
module have a new
method implemented that returns Result so it does provide consistency. Perhaps if Empirical
were in a different module or it's new
had a different name?
All other public structs defined in the distribution module have a new method implemented that returns Result so it does provide consistency. Perhaps if Empirical were in a different module or it's new had a different name?
I see your point about consistency. I would personally value the expressiveness ("this call cannot fail") over the consistency ("all constructors return a Result and might need error handling"), but it doesn't matter much tbh.
Hmm I'm not sure about renaming the new
function, it's a widespread naming convention in the Rust ecosystem and what most users would expect. Maybe a similar name like new_<something>
so people can quickly find it in their IDEs.
Perhaps an impl Default
over having Empirical::new
?
Regardless, the overall discussion you bring up on the error type is valid. I'd merge an effort that does any of
StatsError::BadParams
less where possible.During these changes, we should also work on removing support for degenerate distributions #102 and returning errors.
Tried a different approach, even wrote out an example to test using it, but I didn't have a great motivating case, so I just made something up, but I wanted to see what it was like to actually handle those such errors.
Maybe we should figure out what the errors should be able to convey first. Since @FreezyLemon posted on #242 I've realized I've not done enough of this bigger picture stuff for decisions. I think it's fair to narrow the discussion to errors from distribution construction since that's at a module scope and most of where errors are handled.
The value in #258 is to make the errors flat, and delegates identifying what we call "bad parameters" to the caller. This is quite reasonable, as the conditions for errors are statically specified (up to decisions about what constitutes degeneracy), but there is no program state in which parameter conditions change. However it means that users of the library may repeat their work among projects where there's some input-validation cycle if they want useful error messages.
The option I have in #247 is pretty dense, and it means all distributions' new
have a different type for the Err variant. Would need quite a few imports just to do something that creates multiple distributions from different families. But it has the value of granularity.
A few lists to help organize my thoughts:
unwrap
or pair ?
with anyhow
.statrs
wish for choice between dynamic and static dispatch.nalgebra
matrices
panic
on things like this for now. Evaluating pdf for Dyn
sizes may fail the unwrapping hereI'm very open to modifying these lists.
I realize yet another option is to have the distribution error have too many variants for any one distribution, but use shared language location
, scale
, shape
, etc. in the variants. Suppose this is called ParameterError
. Then in each distribution's module, impl From<super::ParameterError>
for the distribution's specific error or use some generics to impl Error with source converting to the right error.
This allows a shallow error, small step from unwrap for those writing analyses. And allows for deeper error which can rely more on handling programmatically. But I don't think this is a common idiom, the more I think about this, the more certain there's something easier I'm missing here and I don't think it's needed to replicate std::io
's pattern of error handling.
the more I think about this, the more certain there's something easier I'm missing here and I don't think it's needed to replicate std::io's pattern of error handling.
Most usages of StatsError
are currently for the return value of SomeDistribution::new(...)
. Since this function is implemented per-distribution and not to satisfy some trait, the simplest solution here would be creating a concrete error type for each and every distribution in case the construction fails. For example:
Let's turn
// beta.rs
#[derive(Copy, Clone, PartialEq, Debug)]
pub struct Beta {
shape_a: f64,
shape_b: f64,
}
impl Beta {
pub fn new(shape_a: f64, shape_b: f64) -> Result<Beta> {
if shape_a.is_nan()
|| shape_b.is_nan()
|| shape_a.is_infinite() && shape_b.is_infinite()
|| shape_a <= 0.0
|| shape_b <= 0.0
{
return Err(StatsError::BadParams);
};
Ok(Beta { shape_a, shape_b })
}
}
into
#[derive(Copy, Clone, PartialEq, Debug)]
pub struct Beta {
shape_a: f64,
shape_b: f64,
}
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
pub enum BetaError {
ShapeANan,
ShapeAInfinite,
ShapeALessThanZero,
ShapeBNan,
ShapeBInfinite,
ShapeBLessThanZero,
}
impl std::fmt::Display for BetaError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
BetaError::ShapeANan => write!(f, "Shape A is NaN"),
// ...
}
}
}
impl std::error::Error for BetaError {}
impl Beta {
pub fn new(shape_a: f64, shape_b: f64) -> Result<Beta, BetaError> {
if shape_a.is_nan() {
Err(BetaError::ShapeANan)
} else if shape_a.is_infinite() {
Err(BetaError::ShapeAInfinite)
} else if shape_a <= 0.0 {
Err(BetaError::ShapeALessThanZero)
} else if shape_b.is_nan() {
Err(BetaError::ShapeBNan)
} else if shape_b.is_infinite() {
Err(BetaError::ShapeBInfinite)
} else if shape_b <= 0.0 {
Err(BetaError::ShapeBLessThanZero)
} else {
Ok(Beta { shape_a, shape_b })
}
}
}
Maybe this example is too verbose (i.e. maybe BetaError::ShapeAInvalid
would be enough), but it gets the point across.
This will return very specific errors, and users that don't care about it can still .unwrap()
or use pattern matching to handle whichever errors. This will not impact the distributions' traits (because new
is not a part of any trait) and should allow for dynamic dispatch.
If new methods of creating a distribution are added later, the related error type could be extended or a separate error type, specific to the method of construction, could be added.
It will need a fair bit of boiler plate, especially now that it needs to be implemented all at once, but is otherwise very simple to implement. A crate like thiserror
could cut down on boilerplate, but is honestly not needed. The creation errors are simple and thiserror
is/has a proc-macro which means it has a negative impact on compilation times.
I might be missing something here, but this is the obvious KISS solution IMHO
Ah, it's true that the generic part can only come up once the type is correctly initialized.
I'm good with only specifying which parameter is invalid, so we can just have one variant per parameter, the one message denoting that one of its (potentially multiple) constraints was violated.
impl std::fmt::Display for BetaError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
BetaError::ShapeAInvalid => write!(f, "Shape A must be positive finite"),
BetaError::ShapeBInvalid => write!(f, "Shape B must be positive finite"),
}
}
}
It will be a little boilerplatey, but it need not be flexible, sounds like a text editor problem.
Alright, I'll prepare another draft pr to compare against the rest if you don't mind
Thanks for this, I'll keep an eye out.
enum StatsError
is supposed to be anThis indicates that it should not try to be a generic error type for any kind of statistics calculations, but instead only concern itself with the errors produced in
statrs
.With that basic assumption, there are currently some inconsistencies and outdated practices though (API guidelines for reference):
StatsError
implements bothSync
andSend
(this is more of a formality, but is trivial to implement and good future-proofing)Error::description
is deprecated and should not be implementedArgNotNegative
,ArgIntervalExclMin
, etc.), these are leftovers from older versions and should be removed because they're no longer needed bystatrs
.Result<T>
that cannot return an error:Empirical::new
. There's no real reason for this infallible API to return aResult<T, E>
. There might be others.I realize that most of these are breaking changes, but seeing that the crate is pre-1.0, I don't think there's a big problem doing this.
Other things that could be improved:
StatsError
is big: 40 bytes on a Linux x64 target. This is because there are variants which contain 2x&str
(2 x 16 = 32 bytes plus discriminant and padding). Is it really necessary to have strings in the error type? The implementation could be replaced mostly 1:1 with some sort ofArgName
enum, but there might be an even better solution that does not need this argument name wrapping at all.All
new
functions seem to just returnStatsError::BadParams
whenever the params are outside of the defined/allowed range. Is there a good reason for these to be so vague when compared to the more specific errors returned by other functions? After all, the more specific errors already exist, why not return more exact error information? There might even be value in providing multiple error types, to have errors that are more specific to the exact API being used.