indigits / scirust

Scientific Computing Library in Rust
Apache License 2.0
265 stars 29 forks source link

RFC: a convention for error handling #25

Open shailesh1729 opened 9 years ago

shailesh1729 commented 9 years ago

This RFC proposes a convention for structuring methods in SciRust which can cater to the conflicting needs of efficiency, easy of use and effective error handling.

For the impatient:

// Efficient access without bound checks
unsafe fn get_unchecked(&self, r : usize, c : usize) -> T;
// Safe access with bound checks, raises error if invalid address
fn get_checked(&self, r : usize, c : usize) -> Result<T, Error>;
// User friendly version. Panics in case of error
fn get(&self, r : usize, c : usize) -> T;

// Efficient modification without bound checks
unsafe fn set_unchecked(&mut self, r : usize, c : usize, value : T);

// Safe modification with bound check
fn set(&mut self, r : usize, c : usize, value : T);

Detailed discussion

The audience of SciRust can be possibly divided into two usage scenarios.

While the first usage scenario is important for getting new users hooked to the library, the second usage scenario is also important for justifying why Rust should be used for scientific software development compared to other scientific computing platforms.

In context of the two usage scenarios, the design of SciRust has three conflicting goals:

While ease of use is important for script style usage, efficiency and well managed error handling are important for serious software development on top of core components provided by SciRust.

We will consider the example of a get(r,c) method on a matrix object to discuss these conflicting goals. Please note that get is just a representative method for this discussion. The design ideas can be applied in many different parts of SciRust once accepted.

If get is being called in a loop, usually the code around it can ensure that the conditions for accessing data within the boundary of the matrix are met correctly. Thus, a bound checking within the implementation of get is just an extra overhead.

While this design is good for writing efficient software, it can lead to a number of memory related bugs and goes against the fundamental philosophy of Rust (Safety first). There are actually two different options for error handling:

Option<T> or Result<T, Error> provides the users a fine grained control over what to do when an error occurs. This is certainly the Rusty way of doing things. At the same time, both of these return types make the user code more complicated. One has to add extra calls to .unwrap() even if one is sure that the function is not going to fail.

Users of scientific computing tend to prefer an environment where they can get more work done with less effort. This is a reason of the success of specialized environments like MATLAB. Open source environments like Python (NumPy, SciPy) try to achieve something similar.

While SciRust doesn't intend to compete at the level of simplicity provided by MATLAB/Python environments, it does intend to take an extra effort wherever possible to address the ease of use goal.
In this context, the return type of a getter should be just the value type T. This can be achieved safely by using a panic if the access boundary conditions are not met.

The discussion above suggests up to 3 possible ways of implementing methods like get.

We propose that a method for which these variations need to be supported, should follow the convention defined below:

First two versions are suitable for professional development where most of the time we need a safe API while at some times we need an unsafe API for efficient implementation. The third version is suitable for script style usage scenario.

The convention has been illustrated in the three versions of get at the beginning of this document.

API bloat

While this convention is expected to lead into an API bloat, but if the convention is followed properly across the library, then it should be easy to follow (both from the perspective of users of the library and from the perspective of developers of the library).

Sean1708 commented 9 years ago

Just food for thought, what if the safety was baked into the type? E.g. have a Matrix<T> type and an UncheckedMatrix<T> type. That way the API would be identical across the board, you would just choose which one you need, and since the data layout would be identical converting between the two shouldn't be too expensive.

Sean1708 commented 9 years ago

Thinking a bit further into it, you could still have the three methods by doing something along the lines of

impl Checked for Matrix {
    fn get(...) {
        // ...
    }
}

impl Unchecked for UncheckedMatrix {
    fn get(...) {
        // ...
    }
}

fn get<T: Checked>(m: T, c: usize, r: usize) -> ... {
    match m.get(c, r) {
        Ok(v) => v,
        Err(_) => panic!(),
    }
}

Obviously that's only a rough thought but I think it might make a much cleaner API.

shailesh1729 commented 9 years ago

Does this mean that a higher level function would either work with a safe matrix or an unsafe matrix only? I was thinking in the lines of linear algebra code where most of the time, one would use the safe matrix API (like elementary row/column operations), while occasionally getting dirty with the unsafe API for efficiency purposes.

daniel-vainsencher commented 9 years ago

Hi, I also am not sure how I would use the matrix type based API. Seems like the matrices would not be easy to replace, having different apis. On Aug 11, 2015 7:02 AM, "Shailesh Kumar" notifications@github.com wrote:

Does this mean that a higher level function would either work with a safe matrix or an unsafe matrix only? I was thinking in the lines of linear algebra code where most of the time, one would use the safe matrix API (like elementary row/column operations), while occasionally getting dirty with the unsafe API for efficiency purposes.

— Reply to this email directly or view it on GitHub https://github.com/indigits/scirust/issues/25#issuecomment-129835684.

shailesh1729 commented 9 years ago

Rust slices have both get_unchecked() and get() methods. https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked . The proposed design is not very different. But rather than having 2 versions of get, here 3 versions are being suggested. The panic version is meant primarily for making writing scripty code easier.

daniel-vainsencher commented 9 years ago

TL;DR:

Reasoning:

get_checked: indices are input from untrusted source, like file or a user. There is no reason to expect them to be correct, and we need the program to do something sophisticated (like gve the user another chance to input) if they are wrong. This is mainly useful in programs (where we know what to do if indices are wrong). In libraries, if your using program gives wrong indices, all you need is to help find the bug by giving a clear message and a stack trace.

get: every other case. So I expect my indices to be correct, but I want them checked just in case. Wrong indices indicate a bug somewhere, hence panic is appropriate.

get_unchecked: this is only for optimization purposes, so should be used only when some profiling shows that bounds checks are a problem. The "unsafe" property should be considered a burden on the provider of indices to prove (informally, of course) that they are within bounds, and thus avoided whenever possible.

About other functions: get and set are potentially cheap enough that bounds checking overhead might be significant, and some access patterns are obvious enough safe that these per-element bounds checks should be avoided. For example, a column iterator will access contiguous memory, hence be very cheap, and it is easy to ensure the start and end points are correct. On the other hand, the column iterator function itself probably doesn't need a "get_unchecked" version, simply because the relative cost of checks to work is nothing (unless used a lot to access 1-row matrices, in which case the user should find a more appropriate implementation).

I've had a different idea btw. We could associate to a matrix two new types: a checked_row_index and a checked_column_index. Then create versions of get that accept those types without checking. Then create values of those types only through known safe sources: col_index_iter, check_col_index, first_col_index etc. Then we can use optimally fast code, but move most of the burden of proof to the type system. This depends on some details about the type system that I don't know, and I'm not sure whether the pattern generalizes, but it tries to follow "Make illegal states unrepresentable" [2]. For example, we probably want safe_indices to borrow the shape of the matrix, to prevent it from changing shape (don't know if we can borrow a field).

Daniel [1] https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming) [2] https://blogs.janestreet.com/effective-ml-revisited/

On Aug 10, 2015 4:46 PM, "Shailesh Kumar" notifications@github.com wrote:

This RFC proposes a convention for structuring methods in SciRust which can cater to the conflicting needs of efficiency, easy of use and effective error handling.

For the impatient:

// Efficient access without bound checks unsafe fn get_unchecked(&self, r : usize, c : usize) -> T; // Safe access with bound checks, raises error if invalid address fn get_checked(&self, r : usize, c : usize) -> Result<T, Error>; // User friendly version. Panics in case of error fn get(&self, r : usize, c : usize) -> T;

// Efficient modification without bound checks unsafe fn set_unchecked(&mut self, r : usize, c : usize, value : T);

// Safe modification with bound check fn set(&mut self, r : usize, c : usize, value : T);

Detailed discussion

The audience of SciRust can be possibly divided into two usage scenarios.

  • A script style usage, where the objective is to quickly do some numerical experiment, get the results and analyze them.
  • A library development usage, where more professional libraries would be built on top of fundamental building blocks provided by SciRust (these may be other modules shipped in SciRust itself).

While the first usage scenario is important for getting new users hooked to the library, the second usage scenario is also important for justifying why Rust should be used for scientific software development compared to other scientific computing platforms.

In context of the two usage scenarios, the design of SciRust has three conflicting goals:

  • Ease of use
  • Efficiency
  • Well managed error handling

While ease of use is important for script style usage, efficiency and well managed error handling are important for serious software development on top of core components provided by SciRust.

We will consider the example of a get(r,c) method on a matrix object to discuss these conflicting goals. Please note that get is just a representative method for this discussion. The design ideas can be applied in many different parts of SciRust once accepted.

If get is being called in a loop, usually the code around it can ensure that the conditions for accessing data within the boundary of the matrix are met correctly. Thus, a bound checking within the implementation of get is just an extra overhead.

While this design is good for writing efficient software, it can lead to a number of memory related bugs and goes against the fundamental philosophy of Rust (Safety first). There are actually two different options for error handling:

  • Returning either Option or Result<T, Error>.
  • Using the panic mechanism.

Option or Result<T, Error> provides the users a fine grained control over what to do when an error occurs. This is certainly the Rusty way of doing things. At the same time, both of these return types make the user code more complicated. One has to add extra calls to .unwrap() even if one is sure that the function is not going to fail.

Users of scientific computing tend to prefer an environment where they can get more work done with less effort. This is a reason of the success of specialized environments like MATLAB. Open source environments like Python (NumPy, SciPy) try to achieve something similar.

While SciRust doesn't intend to compete at the level of simplicity provided by MATLAB/Python environments, it does intend to take an extra effort wherever possible to address the ease of use goal.

In this context, the return type of a getter should be just the value type T. This can be achieved safely by using a panic if the access boundary conditions are not met.

The discussion above suggests up to 3 possible ways of implementing methods like get.

-

An unchecked (and unsafe) version for high efficiency code where the calling code is responsible for ensuring that the necessary requirements for correct execution of the method are being met.

A safe version which returns either Option or Result<T, Error> which can be used for professional software development where the calling code has full control over error handling.

Another safe version which panics in case of error but provides an API which is simpler to use for writing short scientific computing scripts.

Proposed convention

We propose that a method for which these variations need to be supported, should follow the convention defined below:

-

A method_unchecked version should provide basic implementation of the method. This should assume that necessary conditions for successful execution of the methods are already being ensured by the calling code. The unchecked version of method MUST be marked unsafe. This ensures that the calling code knows that it is responsible for ensuring the right conditions for calling the unchecked method.

A method_checked version should be implemented on top of a method_unchecked method. The checked version should check for all the requirements for calling the method safely. The return type should be either Option or Result<T, Error>. In case the required conditions for calling the method are not met, a None or Error should be returned. Once the required conditions are met, method_unchecked should be called to get the result which would be wrapped inside Option or Result.

A method version should be built on top of method_checked version. It should simply attempt to unwrap the value returned by method_checked and return as T. If method_checked returns an error or None, this version should panic.

First two versions are suitable for professional development where most of the time we need a safe API while at some times we need an unsafe API for efficient implementation. The third version is suitable for script style usage scenario.

The convention has been illustrated in the three versions of get at the beginning of this document. API bloat

While this convention is expected to lead into an API bloat, but if the convention is followed properly across the library, then it should be easy to follow (both from the perspective of users of the library and from the perspective of developers of the library).

— Reply to this email directly or view it on GitHub https://github.com/indigits/scirust/issues/25.

Sean1708 commented 9 years ago

Seems like the matrices would not be easy to replace, having different apis.

I was actually thinking the opposite, that the two types would have exactly the same API and only the implementations would differ. Consider the following workflow:

// let's say, for example, that matrix_from_file() is guaranteed to return a 20x20 matrix or larger
let m: Matrix<f64> = matrix_from_file("filename");

// this element is not guaranteed to be there
if let Ok(element) = m.get(50, 50) {
    println!("m[50, 50] = {}", element);
}

// we can guarantee that this is in bounds so we start using UncheckedMatrix
let m = m as UncheckedMatrix;
let mut i = 0.0;
for j in 0..20 {
    i += m.get(j, j);
}

I think it would make a cleaner API but actually the more of this I write out the more error prone it seems for the user. I just always feel like the type system isn't being used properly when I have to write things like get_checked rather than just get.

daniel-vainsencher commented 9 years ago

let Ok(element) = m.get(50, 50) i += m.get(j, j);

These two functions called get do not have the same return type, hence the APIs are not compatible. This is what I meant by not easy to replace.

On Tue, Aug 11, 2015 at 2:04 PM, Sean Marshallsay notifications@github.com wrote:

Seems like the matrices would not be easy to replace, having different apis.

I was actually thinking the opposite, that the two types would have exactly the same API and only the implementations would differ. Consider the following workflow:

// let's say, for example, that matrix_from_file() is guaranteed to return a 20x20 matrix or largerlet m: Matrix = matrix_from_file("filename"); // this element is not guaranteed to be thereif let Ok(element) = m.get(50, 50) { println!("m[50, 50] = {}", element); } // we can guarantee that this is in bounds so we start using UncheckedMatrixlet m = m as UncheckedMatrix;let mut i = 0.0;for j in 0..20 { i += m.get(j, j); }

I think it would make a cleaner API but actually the more of this I write out the more error prone it seems for the user. I just always feel like the type system isn't being used properly when I have to write things like get_checked rather than just get.

— Reply to this email directly or view it on GitHub https://github.com/indigits/scirust/issues/25#issuecomment-129996412.

Daniel Vainsencher

Sean1708 commented 9 years ago

Of course. Sorry, I completely missed that (it's been a long week already).

daniel-vainsencher commented 9 years ago

No problem. I wonder if its possible to flesh out the shape-borrowing safe-by-type indices I proposed above. Something like

get<C: SafeColIndex, R: SafeRowIndex>(col: C, row: R) -> ... (doesn't need checking) and get<C, R>(col: C, row: R) -> ... (checks and panics if needed)

On Tue, Aug 11, 2015 at 3:52 PM, Sean Marshallsay notifications@github.com wrote:

Of course. Sorry, I completely missed that (it's been a long week already).

— Reply to this email directly or view it on GitHub https://github.com/indigits/scirust/issues/25#issuecomment-130043565.

Daniel Vainsencher

daniel-vainsencher commented 9 years ago

Sure, but what I would love is if we can arrange that users cannot create them! users can only receive them from the Matrix, which actually verifies that they actually are correct.

Something like

let c = m.min_col() // has type SafeColIndex, so can be used in unchecked indexing. In Haskell it is possible to not export the constructors for types, and then library users cannot (easily) create invalid instances, and you can enforce various invariants. Here we would want each such index to borrow a reference for the "shape" field of the matrix, so that shape cannot change while safe indices exist.

On Tue, Aug 11, 2015 at 4:52 PM, Sean Marshallsay notifications@github.com wrote:

That's a good idea, maybe it could be done as a newtype, something like

struct Safe(usize); fn get(self, c: Safe, r: Safe) -> ...

Then you could call it like

let el = mat.get(Safe(3), Safe(4));

— Reply to this email directly or view it on GitHub https://github.com/indigits/scirust/issues/25#issuecomment-130068829.

Daniel Vainsencher

shailesh1729 commented 9 years ago

Just to clarify on one point. The RFC doesn't intend to suggest that three versions of a method should be considered for each method.

Typical development would go like this.

So in summary:

The objective of this RFC is to develop a guideline for the development process.

daniel-vainsencher commented 9 years ago

So looks like we agree that the unsafe version should be a rarity.

I think that if most methods should be optiony/resulty, we should strive for Results with clearly explained error types, to enable users to do something smarter than unwrap.

For example, I would make the get have "column/row out of bounds" error. What would a resulty matrix inversion return as error types? or dot product? What is some concrete code where you would handle these Results, and not by panic! ?

Daniel

On Tue, Aug 11, 2015 at 8:01 PM, Shailesh Kumar notifications@github.com wrote:

Just to clarify on one point. The RFC doesn't intend to suggest that three versions of a method should be considered for each method.

Typical development would go like this.

-

Start with a single method like method_name() returning Option or Result<T, Error>. This is a standard Rust style approach. panic() should be avoided in basic development.

If using the method is becoming cumbersome for writing straight-forward scripts, then we need two versions. One version as the above and the other version which panics. The panicky version has a return type of T. The panicky version in turn uses the implementation of the version returning Option or Result<T, Error>. The preference would be to use the method_name() for the panicky version. The question here is how the method should be named for resulty/optiony version. The RFC above suggests the name to be method_name_checked.

In very rare circumstances, for efficiency purposes, we need an unsafe version of the method also which skips some of the parameter validation. This version would be named as method_name_unchecked. In this case, the optiony / resulty version would build on top of it by adding extra parameter validation.

So in summary:

  • all methods should by default be optiony or resulty.
  • if optiony / resulty version is difficult to use for script style usage, then introduce a panicky version which in turn uses the optiony / resulty version.
  • In extremely rare circumstances an unchecked version is also introduced for efficiency purposes. This is done by refactoring the implementation from the optiony / resulty version.

The objective of this RFC is to develop a guideline for the development process.

— Reply to this email directly or view it on GitHub https://github.com/indigits/scirust/issues/25#issuecomment-130115954.

Daniel Vainsencher

shailesh1729 commented 9 years ago

Please look at src/error.rs type SRError for the current list of error codes. This certainly would be extended further. There is also a to_string method available for the SRError type which presents a human readable string for it. It is expected that some of these error codes will have additional parameters also attached with them in due course. For their usage, just search on SRError in the code base. You will see plenty of examples.

daniel-vainsencher commented 9 years ago

Ah, looks pretty good! Already had the pleasure to encounter DimensionMismatch, so I changed it to also store and print the dimensions of the participating matrices, since this makes it much easier to find the originating bug. I'll send a PR with that.

On Wed, Aug 12, 2015 at 10:55 PM, Shailesh Kumar notifications@github.com wrote:

Please look at src/error.rs type SRError for the current list of error codes. This certainly would be extended further. There is also a to_string method available for the SRError type which presents a human readable string for it. It is expected that some of these error codes will have additional parameters also attached with them in due course. For their usage, just search on SRError in the code base. You will see plenty of examples.

— Reply to this email directly or view it on GitHub https://github.com/indigits/scirust/issues/25#issuecomment-130512755.

Daniel Vainsencher