lox-space / lox

Oxidized Astrodynamics
Mozilla Public License 2.0
12 stars 1 forks source link

RFC: Unify error handling strategies #90

Open AngusGMorrison opened 5 months ago

AngusGMorrison commented 5 months ago

Lox currently takes several approaches to error handling, which is confusing for authors and users. This proposal aims to establish a consistent approach to error handling throughout the library.

Current situtation

Lox currently employs three different error handling strategies:

Deciding which to implement for a new error scenario leaves too much room for personal preference, and results in an inconsistent experience for library consumers.

List of Lox errors

Since it's useful to understand the state of play, and because I'll be referring back to it later, here are the definitions of all errors currently used in the Lox workspace:

#[derive(Clone, Debug, Error, Eq, PartialEq)]  
pub enum LoxCubicSplineError {  
    #[error("`x` and `y` must have the same length but were {0} and {1}")]  
    DimensionMismatch(usize, usize),  
    #[error("length of `x` and `y` must at least 4 but was {0}")]  
    InsufficientPoints(usize),  
    #[error("linear system could not be solved")]  
    Unsolvable,  
}
#[derive(Error, Debug, PartialEq)]  
pub enum LoxBodiesError {  
    #[error("unknown body `{0}`")]  
    UnknownBody(String),  
}
  #[derive(Clone, Debug, Error, PartialEq)]  
pub enum LoxEopError {  
    #[error("{0}")]  
    Csv(String),  
    #[error("{0}")]  
    Io(String),  
}
#[derive(Clone, Copy, Debug, Error, PartialEq)]  
#[error("sizes of `x`, `y`, `t` and `epochs` must match, but were x: {nx}, y: {ny}, t: {nt}, epochs: {nepochs}")]  
pub struct ArgumentSizeMismatchError {  
    nx: usize,  
    ny: usize,  
    nt: usize,  
    nepochs: usize,  
}
#[derive(Debug, PartialEq)]  
pub enum DafSpkError {  
    // The data type integer value does not match the ones in the spec  
    InvalidSpkSegmentDataType,  
    // The number of DAF components does not match the SPK specification  
    UnexpectedNumberOfComponents,  
    UnableToParse,  
    UnsupportedSpkArrayType { data_type: i32 },  
    // Unable to find the segment for a given center body and target body  
    UnableToFindMatchingSegment,  
    // Unable to find record for a given date  
    UnableToFindMatchingRecord,  
}
#[derive(Error, Debug)]  
pub enum LoxPyError {  
    #[error("unknown time scale `{0}`")]  
    InvalidTimeScale(String),  
    #[error("unknown body `{0}`")]  
    InvalidBody(String),  
    #[error("unknown frame `{0}`")]  
    InvalidFrame(String),  
    #[error(transparent)]  
    LoxBodiesError(#[from] LoxBodiesError),  
    #[error(transparent)]  
    LoxTimeError(#[from] LoxTimeError),  
    #[error(transparent)]  
    PyError(#[from] PyErr),  
}
#[derive(Clone, Debug, Default, Error)]  
#[error("`{raw}` cannot be represented as a `TimeDelta`: {detail}")]  
pub struct TimeDeltaError {  
    pub raw: f64,  
    pub detail: String,  
}
#[derive(Clone, Error, Debug)]  
pub enum LoxTimeError {  
    #[error("invalid date `{0}-{1}-{2}`")]  
    InvalidDate(i64, i64, i64),  
    #[error("invalid time `{0}:{1}:{2}`")]  
    InvalidTime(u8, u8, u8),  
    #[error("seconds must be in the range [0.0, 60.0], but was `{0}`")]  
    InvalidSeconds(f64),  
    #[error("subsecond must be in the range [0.0, 1.0), but was `{0}`")]  
    InvalidSubsecond(f64),  
    #[error("day of year cannot be 366 for a non-leap year")]  
    NonLeapYear,  
}
#[derive(Debug, Error)]  
pub enum CoefficientsError {  
    #[error("kernel coefficients with key {key} had size {actual}, expected at least {actual}")]  
    TooFew {  
        key: String,  
        min: usize,  
        actual: usize,  
    },  
    #[error("kernel coefficients with key {key} had size {actual}, expected at most {actual}")]  
    TooMany {  
        key: String,  
        max: usize,  
        actual: usize,  
    },  
    #[error(  
        "barycenter nutation precession coefficients with key {key} had an odd number of \        terms, but an even number is required"    )]    OddNumber { key: String },  
}
#[derive(Clone, Copy, Debug, Error, PartialEq)]  
#[error("UTC is not defined for dates before 1960-01-01")]  
/// UTC is not defined for dates before 1960-01-01. Attempting to create a `UtcDateTime` with such/// a date results in this error.  
pub struct UtcUndefinedError;

The trouble with crate-level error enums

Our crate-level error enums have two, closely related issues.

Irrelevant variants

First, most of their variants are irrelevant to the operations that return them. For example, Subsecond::new returns Result<Subsecond, LoxTimeError>. LoxTimeError is a five-variant enum, but the only variant relevant to Subsecond::new is LoxTimeError::InvalidSubsecond.

To both library users and authors, LoxTimeError implies that an operation may return any of the possible variants, and forces the use of a match statement to distinguish between variants where there may only be one returned variant in practice. Both readability and usability are reduced.

Breaking changes

Second, adding or removing variants from public enums is a breaking change. For example, by removing the NonLeapYear variant from LoxTimeError, we would risk breaking error handling code for unrelated errors, such as InvalidSubsecond. This simple change would require a major version bump.

What good enum errors look like

LoxCubicSplineError is an example of an error enum done well. It has three variants, and all three of these variants may occur in a call to CubicSpline::new. In other words, it is a tight upper and lower bound on the possible error scenarios of a particular operation. Its variants are unlikely to change, and it doesn't attempt to encapsulate the errors of unrelated operations.

Proposed solution

How will Lox users handle errors?

Considering the list of Lox errors, they overwhelmingly relate to invalid inputs. This tells us that Lox errors are highly unlikely to be handled programmatically. If the input is wrong, it requires human attention to fix. Therefore, the universal error strategy will be to log the error and exit.

This tells us several things:

Why not Box<dyn Error + Send + Sync>?

At runtime, Lox users are unlikely to need any further detail than a dyn Error is able to provide. However, well typed errors offer significant benefits to code authors and readers. Through the type system, they reveal the complete set of error scenarios for a given operation at a glance. For example, reading that Utc::try_from::<Time<Tai>> returns only UtcUndefinedError is more powerfully descriptive than either Box<dyn Error + Sync + Send> or LoxTimeError.

Struct and tuple error types additionally assist library authors in providing meaningful, standardized error messages. Constructing ad hoc errors from string literals is less likely to include the full wealth of data a consumer could use to debug their inputs, and is more prone to copy-pasting.

More granular errors are better for authors and readers, with no impact on consumers

For Lox, the ideal error is the narrowest type that fully represents all error scenarios that may occur during a function call.

Examples

Good

#[derive(Clone, Debug, Error, Eq, PartialEq)]  
pub enum LoxCubicSplineError {  
    #[error("`x` and `y` must have the same length but were {0} and {1}")]  
    DimensionMismatch(usize, usize),  
    #[error("length of `x` and `y` must at least 4 but was {0}")]  
    InsufficientPoints(usize),  
    #[error("linear system could not be solved")]  
    Unsolvable,  
}

LoxCubicSplineError is the narrowest type that represents everything that could go wrong when instantiating a CubicSpline. It contains no unnecessary variants and gives the users the debugging information they need to resolve the issue.

#[derive(Clone, Copy, Debug, Error, PartialEq)]  
#[error("sizes of `x`, `y`, `t` and `epochs` must match, but were x: {nx}, y: {ny}, t: {nt}, epochs: {nepochs}")]  
pub struct ArgumentSizeMismatchError {  
    nx: usize,  
    ny: usize,  
    nt: usize,  
    nepochs: usize,  
}

ArgumentSizeMismatchError is a struct error representing the only thing that can go wrong for a given operation, and provides the user with the information they need to find the source of the problem.

Bad

#[derive(Clone, Error, Debug)]  
pub enum LoxTimeError {  
    #[error("invalid date `{0}-{1}-{2}`")]  
    InvalidDate(i64, i64, i64),  
    #[error("invalid time `{0}:{1}:{2}`")]  
    InvalidTime(u8, u8, u8),  
    #[error("seconds must be in the range [0.0, 60.0], but was `{0}`")]  
    InvalidSeconds(f64),  
    #[error("subsecond must be in the range [0.0, 1.0), but was `{0}`")]  
    InvalidSubsecond(f64),  
    #[error("day of year cannot be 366 for a non-leap year")]  
    NonLeapYear,  
}

LoxTimeError encapsulates error variants for many unrelated operations.

#[derive(Clone, Copy, Debug, Error, PartialEq)]  
#[error("UTC is not defined for dates before 1960-01-01")]  
/// UTC is not defined for dates before 1960-01-01. Attempting to create a `UtcDateTime` with such/// a date results in this error.  
pub struct UtcUndefinedError;

UtcUndefinedError does well to represent a single error scenario, but could be improved by including the invalid date in its error message to aid debugging.

Drop the Lox prefix

Finally, the Lox prefix for error types was useful when needed as a disambiguation from, e.g. std::io::Error, but is unnecessary when errors are specific to the operations that return them.

helgee commented 5 months ago

I agree with the general strategy and the recommendations. I would only add that application developers might want to use a crate like anyhow or color-eyre to display errors to users.

Could you add this as a short dev doc to the repo (e.g. docs/error_handling.md)?

matzipan commented 5 months ago

Note, DafSpkError doesn't actually implement Error.

Ooops. Thanks for spotting this. I wonder if there's a linter check that we could enable.

matzipan commented 5 months ago

LGTM.