Document error detection/handling philosophy

yupferris commented 4 years ago

kaze can only detect certain kinds of errors at certain times. Some examples:

We can only combine Signals with other Signals. This is expressed in rust's type system, and is thus detectable/reported during compilation of the rust kaze code.
We can only combine Signals with certain bit_widths with other Signals of the same bit_width for certain operations, eg. x & y. Since a Signal's bit_width cannot be described with rust's type system (yet), but checking for this error is trivial, we detect and report this error during execution of the rust kaze code as the graph is being constructed (as part of the & operator impl in this case).
We can't generate correct code if a Module instance has unconnected inputs, but we must allow unconnected inputs during graph creation and expect them to be resolved before code is generated. In this case, we defer error checking/reporting until codegen time.

Further, a somewhat unsatisfying decision that I've made is that certain classes of errors are reported in different ways. Obviously errors that are checked by rust's type checker are reported as rust type errors. However, other types of errors are reported as panics. The reasoning behind this is to naturally be able to provide where in the rust code the error occurred (via the stack trace), similar to using a non-embedded language. This choice also leads to more readable user/API code, since the types aren't conflated with error handling details that would have to be used on every Signal parameter and return type. It's somewhat indirect though admittedly from a user's perspective, since a stack trace must be used to obtain this error information, as rust doesn't provide any way to get information about a function's caller out-of-the-box (this could potentially be remedied by having macro wrappers for all API entry points that expand to larger calls with additional file/line info but this also gets messy for existing Op impls which don't allow the API to be extended). This gets a bit messy for codegen as well, where it's natural for the code generator APIs to return a Result (which they actually do currently), but we currently still report those errors as panics and only use Result to report IO errors. This is arguably consistent with other runtime errors (which also panic) and this is why I made this choice, but it's also perfectly reasonable to say that this is inconsistent with its own API which returns an io::Result (though I chose io::Result specifically to communicate that only io-related errors are produced this way).

One thing we could possibly explore is to use Result<Signal<..>> for all API entry points instead of Signal directly and have all ops propagate errors in addition to the logic they already do. This would mean that this kind of propagation also needs to be tested for all ops and I think the code would generally be a lot less readable; perhaps there's a good pattern for type aliasing that would help in this regard. This is especially important for higher-order user code that can be used to generate constructs more abstractly which usually already carries additional mental weight, and shouldn't be further complicated by additional type information.

Generally this is kindof unsatisfying and probably surprising so at the very least, it should be documented as part of how to use kaze generally and other strategies may need to be investigated for future library versions.

yupferris commented 4 years ago

Note that a Rust feature I was unaware of until now is track_caller, which might be useful for providing better error messages by providing information about the calling function.

A quick test:

use std::panic::Location;

#[track_caller]
fn dude(x: i32) -> i32 {
    let caller = Location::caller();
    println!("file: {}, line: {}, column: {}", caller.file(), caller.line(), caller.column());
    x * 2
}

fn main() {
    let x = dude(56);
    println!("Hello, world! {}", x);
}

So it seems this is useful even outside of a panic context. It's worth exploring this as a way to provide better errors on failure without requiring RUST_BACKTRACE to be set, especially as it's stable as of Rust 1.46.0.

Note also that this could potentially allow us to defer all error checking during graph creation, and then have all error(s) reported at compile/lowering time, perhaps checked by the validation pass that's already in place (but would be extended). I think this makes certain classes of errors a bit harder to check (we need to traverse more of the graph than just the reachable parts like we do currently, which might mean keeping more internal references or something), but it would also mean eliminating all panics from graph creation, which could potentially be really nice. We could also only check for errors in the reachable part of the graph, but I would argue this is generally undesirable, as code that creates unreachable parts of the graph would rot instead of being checked (even though it's ignored during further compilation passes).

jdonszelmann commented 3 years ago

I thought about this for a while (at first I was confused why there were no results) but after a while I started to really like it. The thing is, this is not a normal library. It's more like a framework. This library does not need to report errors to other libraries or code, it just needs to notify a user. Panics are fine for that I think. Caller tracking looks promising though to indeed provide better error reporting.

yupferris commented 3 years ago

Indeed, the usage of the library is much more like using a compiler (that you bootstrap yourself via build.rs usually) than a typical library.

yupferris / kaze

Document error detection/handling philosophy #9