Closed Sicheng-Pan closed 11 months ago
Looks good !
How about classed errors? I guess robj_to!
and all conversion functions should also support some classed errors . What would be a good error type internally for r-polars for e.g. try_f64_intousize() -> Result<, ?>?
We can try to make some conversion wrappers for the extendr error types to support send+sync.
rust-polars used to define their own Result type Result<T, E = PolarsError>
however they renamed it to PolarsResult
to avoid namespace collisions with all the other crates, that also liked to define what Result
should mean, e.g. extendr Result<T, E= ExtendrError>
. When working in namespaces with multiple custom Result
-types for e.g. rust-polars, extendr and r-polars it is annoying to rename the Result
to avoid conflict and remember what Result
currently meeans. I prefer if we could name our new candidate Result-type based on anyhow to something like RResult
or use plain
std::result::Result;
Result<T, Rerr> // not too verbose anyways
this will add auto completion to Rerr methods. Although Rerr is not public, it is nice to develop with.
#' @title auto complete $-access into a polars object
#' @description called by the interactive R session internally
#' @param x Rerr
#' @param pattern code-stump as string to auto-complete
#' @export
#' @keywords internal
.DollarNames.Rerr = function(x, pattern = "") {
get_method_usages(Rerr, pattern = pattern)
}
It seems Extendr errors are not send+sync because they can contain Robj
which is only partially send+sync, see ParRobj
(but that is playing with fire :) )
I think the Robj
can be converted to a string representation, because they are not used for anything but for display anyways.
Looks good !
How about classed errors? I guess
robj_to!
and all conversion functions should also support some classed errors . What would be a good error type internally for r-polars for e.g. try_f64_intousize() -> Result<, ?>?We can try to make some conversion wrappers for the extendr error types to support send+sync.
rust-polars used to define their own Result type
Result<T, E = PolarsError>
however they renamed it toPolarsResult
to avoid namespace collisions with all the other crates, that also liked to define whatResult
should mean, e.g. extendrResult<T, E= ExtendrError>
. When working in namespaces with multiple customResult
-types for e.g. rust-polars, extendr and r-polars it is annoying to rename theResult
to avoid conflict and remember whatResult
currently meeans. I prefer if we could name our new candidate Result-type based on anyhow to something likeRResult
or use plainstd::result::Result; Result<T, Rerr> // not too verbose anyways
For internal error type I would recommend anyhow::Error
. If we would like to know the concrete type of error within anyhow::Error
, we could try to downcast it to its original form in the implementation of Rerr
.
I would also recommend using classed context (e.g. context enum) to achieve classed errors. The base error will be string error messages created from anyhow!()
or errors from other libraries.
I've renamed our Result type to RResult
for clarity.
It appears anyhow is convenient but not to handy to match on Errors.
If we say the final goal of error handling is to produce classed R error conditions something like this perhaps
> structure(list(message="cannot divide by zero", call =NULL), class = c("ComputeError","rust_polars","condition"))
<ComputeError: cannot divide by zero>
Here a unit test can check easily what type of error was raised
how would we convert the anyhow errors into such a classed error?
... or could Rerr have a method which can return a char vec like class=
above? That would be likely be fine also.
If using anyhow error shouldn't r-polars also use an enum as base error to store the class/type information?
It appears anyhow is convenient but not to handy to match on Errors.
If we say the final goal of error handling is to produce classed R error conditions something like this perhaps
> structure(list(message="cannot divide by zero", call =NULL), class = c("ComputeError","rust_polars","condition")) <ComputeError: cannot divide by zero>
Here a unit test can check easily what type of error was raised
how would we convert the anyhow errors into such a classed error?
... or could Rerr have a method which can return a char vec like
class=
above? That would be likely be fine also.If using anyhow error shouldn't r-polars also use an enum as base error to store the class/type information?
Well I think that could be achieved with the classed context. For example if you have a context enum with various error tags. For example, if we have the following enum:
enum ErrTag {
ComputeError,
Polars,
}
impl Display for ErrTag {
...
}
then we can write the following code:
err.context(ErrTag::ComputeError).context(ErrTag::Polars)
which will gives us the error with the desired context. Currently I exports the anyhow::Error::chain
function for Rerr, and on the R side we can use
result$err$chain()
to retrieve the layers of contexts for the error. Although I have not implemented error tag yet, you can try it out with String contexts. For example:
import_arrow_ipc("wrong.file", NULL, TRUE, TRUE, NULL, 0L, TRUE)$err$chain()
This will produce a vector containing the error context and the error itself.
https://www.lpalmieri.com/posts/error-handling-rust/
I was reading this blog on error basics and thiserror and anyhow. It struck me that anyhow also requires besides send+sync also static. That means we can only use static strings. Maybe that would mean a big rework of the help conversion messages, and no String
-types used in a transition phase. Maybe it will also be difficult to represent the user input which was wrong.
example
> pl$lit("hey you")$str$split(" ")$to_r()
[[1]]
[1] "hey" "you"
> pl$lit("hey you")$str$split(42)$to_r()
Error: in str$split: the arg [by] is not a single string, but 42.0
when calling :
pl$lit("hey you")$str$split(42)
I have tinkered with representation function for extendr errors to make them send+sync.
I was thinking to provide also some simple test cases in that branch.
https://www.lpalmieri.com/posts/error-handling-rust/
I was reading this blog on error basics and thiserror and anyhow. It struck me that anyhow also requires besides send+sync also static. That means we can only use static strings. Maybe that would mean a big rework of the help conversion messages, and no
String
-types used in a transition phase. Maybe it will also be difficult to represent the user input which was wrong.example
> pl$lit("hey you")$str$split(" ")$to_r() [[1]] [1] "hey" "you" > pl$lit("hey you")$str$split(42)$to_r() Error: in str$split: the arg [by] is not a single string, but 42.0 when calling : pl$lit("hey you")$str$split(42)
I have tinkered with representation function for extendr errors to make them send+sync.
I was thinking to provide also some simple test cases in that branch.
Well I think the anyhow!
macro could take String
as inputs and creates the corresponding anyhow::Error
.
Based on your representation, I have now implemented
impl From<extendr_api::Error> for Rerr {
...
}
so that we can reuse the extendr error enum for our purpose.
@sorhawell I've made some major refactoring in the last commit. We are no longer using anyhow
since I cannot figure out how to extract the contexts from anyhow::Error
after putting them in. Instead, I created a custom Rerr
for our purpose, which is basically a vector of contexts. I have also implemented the collect_handled
function for LazyFrame
based on this design:
https://github.com/pola-rs/r-polars/blob/3349349ad2ef6d3fcfc546920bf047688757aecc/src/rust/src/lazy/dataframe.rs#L78-L81
Hopefully this looks cleaner than the previous version.
And by the way I have implemented a test_rerr()
function so that you can directly use it to test the new Rerr
on the R side.
And by the way I have implemented a
test_rerr()
function so that you can directly use it to test the newRerr
on the R side.
sweet looking forward to try it out "tomorrow" :)
I really like this pattern you have come up with. I think this could work well in all aspects we have talked of. Elegant.
I have tinkered with it and used in a conversion function robj_to_str
. I like the variant TypeMissMatch but often the argument blaming and conversion error will not happen on the same scope level. I tried two new variants TypeMiss (of two strings) and BadArgument(of one string).
use crate::rerr;
use crate::rerr::WithRctx;
#[extendr]
fn test_rerr_str(fishbone: Robj) -> rerr::RResult<String> {
let s = crate::utils::test_robj_to_str(fishbone).blame_arg("fishbone");
s.map(|s| s.to_string())
}
#[derive(Clone, Debug)]
pub enum Rctx {
Extendr(String),
Hint(String),
Plain(String),
Polars(String),
When(String),
TypeMismatch(String, String, String),
TypeMiss(String, String),
BadArgument(String),
}
#[derive(Clone, Debug)]
pub struct Rerr(Vec<Rctx>);
pub type RResult<T> = core::result::Result<T, Rerr>;
pub trait WithRctx<T> {
fn ctx(self, rctx: Rctx) -> RResult<T>;
fn hint<S: Into<String>>(self, msg: S) -> RResult<T>;
fn when<S: Into<String>>(self, msg: S) -> RResult<T>;
fn blame_arg<S: Into<String>>(self, msg: S) -> RResult<T>;
}
impl<T: Clone, E: Into<Rerr>> WithRctx<T> for core::result::Result<T, E> {
fn ctx(self, rctx: Rctx) -> RResult<T> {
self.map_err(|e| {
let mut rerr = e.into();
rerr.0.push(rctx);
rerr
})
}
fn hint<S: Into<String>>(self, msg: S) -> RResult<T> {
self.ctx(Rctx::Hint(msg.into()))
}
fn when<S: Into<String>>(self, msg: S) -> RResult<T> {
self.ctx(Rctx::When(msg.into()))
}
fn blame_arg<S: Into<String>>(self, arg_name: S) -> RResult<T> {
self.ctx(Rctx::BadArgument(arg_name.into()))
}
}
use from R
> polars:::test_rerr_str(42)
$ok
NULL
$err
Error: The argument [fishbone] caused an error because : expected a [string] value, but got [42.0] instead
attr(,"class")
[1] "extendr_result"
I really like this pattern you have come up with. I think this could work well in all aspects we have talked of. Elegant.
I have tinkered with it and used in a conversion function
robj_to_str
. I like the variant TypeMissMatch but often the argument blaming and conversion error will not happen on the same scope level. I tried two new variants TypeMiss (of two strings) and BadArgument(of one string).
I broke the TypeMismatch context into two as you suggested. Also I used thiserror to make the Rctx
looks nicer.
@sorhawell I've just updated the conflicting unit tests. Most of them are still using the old expect_grepl_error
, while I created a new expect_rerr
helper function for those that have a Rerr
in their output.
error_trait.R to allow legacy string errors to coexist with Rerr both Rerr (Rerr.R) and string errors (string_error.R) implements the "trait" which is a collection of S3 methods error_conversion.R provides the new unwrap() and result() to use for any (two) error types
fix check warnings/errors update tests
pub struct Rerr(VecDeque<Rctx>)
use VecDeque to allow pushing contexts first and last in chain. Add when_last used for calls ("When calling:").
expose all Rerr methods
add WhereIn method and context variant
Rerr should also be renamed as the class is too generic to not S3 method namespace collide with some other package, which happend in the past for DataType. A new name could be RPolarsErr or PolarsErr.
@Sicheng-Pan what do you think of the added commit?
@sorhawell The new commits looks good to me. If we would like to rename Rerr
to avoid ambiguity, I would prefer RPolarsErr
.
ohh #247 will only be an issue when someone forks polars and run the actions as here.
building an error on R side could look like this
> err = .pr$RPolarsErr$new()$bad_robj(42)$misvalued("is less than zero")$when("storing small numbers")
> print(err)
When storing small numbers: Expected a value that is less than zero: Got value [Rvalue: 42.0, Rsexp: Doubles, Rclass: ["numeric"]]
> err$contexts()
$When
[1] "storing small numbers"
$ValueOutOfScope
[1] "is less than zero"
$BadValue
[1] "Rvalue: 42.0, Rsexp: Doubles, Rclass: [\"numeric\"]"
Why
Currently, most errors in r-polars are plain string error messages, and for the errors from other crates (e.g. pola-rs) we have to manually convert it into string. Sometimes we append extra string context to illustrate why there is an error. Such string errors could become hard to work with when there are multiple layers of results to unwrap, since we need to directly manipulate the error strings. We also lose the layered structure of the error in the process, and on the R side we only see a single error message.
How
The anyhow crate provides common error handling implementations that can simplify our workflow:
anyhow::Error
for all internal functions on the Rust side (e.g.robj_to_*
). All errors implementing thestd::error::Error
trait (as well asSync
andSend
traits) can be automatically converted toanyhow::Error
with the?
operator. We can also use<error>.context(<msg>)
to wrap the error with additional context. Currently theextendr_api::Error
unimplementsSync
andSend
, so we cannot directly reuse those errors for now.Rerr
) for the interops between R and Rust. Ananyhow::Error
can also be automatically wrapped toRerr
with the?
operator. We use the wrapper error at boundary functions (e.g.import_arrow_ipc
) that connects R and Rust. We can also implement as many helper functions for the wrapper error as we like to and export it to the R side (e.g. dump the layers of contexts to R).Progress
I have implemented a minimal version of this error handling system. The
scan_arrow_ipc
function on the R side can dump error messages from ananyhow::Error
. Currently theimport_arrow_ipc
on the Rust side looks messy since we are using Strings as errors, which do not implement thestd::error::Error
trait by default. Hopefully this could look a lot cleaner after we refactored the internal functions.