pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data
Other
229 stars 24 forks source link

Move from `error-chain` to `thiserror` for error management #8

Closed althonos closed 4 years ago

althonos commented 5 years ago

Hi!

error-chain is not maintained anymore, so it would be a good idea to move to failure, which has all the features needed already, as well as backtrace support (i.e. the underlying pest error could be accessed from a ParserError using the cause() method).

pchampin commented 5 years ago

Thank you for this suggestion. I need more input before I make a decision.

You argue that error-chain is not maintained anymore, however the conclusion of the thread you cite is not as clear-cut. Furthermore, it seems that error-chain is still being maintained after all.

That being said, after a quick glance at failure, I must say it does look more elegant and lightweight than error-chain (I'm not a big fan of the Error/ErrorKind dichotomy... it feels like coding in Java :smiling_imp:).

Still, I'm a bit uncomfortable with investing in a crate whose ambition is to deprecate a trait from the standard lib. Especially when some of the complains about the standard trait have been fixed in the meantime. In that respect, the Evolution section in their README is not really clear, but it makes me nervous.

So, could you please elaborate more why you think failure is a safer bet for the future than error-chain?

althonos commented 5 years ago

As you said, failure has the advantage of removing the Error/ErrorKind duplication. Furthermore, it can create a proper backtrace, without ambiguity, from any error, and not just ChainedError implementors: the backtrace is reached through the cause method, which returns the value of one of the variant's fields. It is unclear how to do that with error-chain, since I'm not sure you can both expose an error as an attribute and access it through the backtrace.

The other advantage is that the conversion traits make it easier to use ?, whereas error-chain sometimes require you to wrap/convert the error yourself (see the PR for the XML parser, where implementing another error category makes it more verbose to return a top-level error from the lower-level one).

pchampin commented 5 years ago

the backtrace is reached through the cause method, which returns the value of one of the variant's fields

But the cause method in failure seems to be redundant with the new source method in std::error::Error. This is the kind of divergence that I am concerned about.

the conversion traits make it easier to use ?, whereas error-chain sometimes require you to wrap/convert the error yourself (see the PR for the XML parser...)

It seems that you solved this problem in your latest commit f6eb5fe4c. Or is it something else?

As you can see, I'm still reluctant to make the leap. But I'm also still open to discussion.

althonos commented 5 years ago

The difference is that to do that with error-chain, you need to call Error::with_chain every time you want to chain something, and ultimately you'll write your own impl From<InnerError> for OuterError whereas failure lets you derive it with an attribute.

Furthermore, as seen in my PR, there is ambiguity between chaining and wrapping the error: in failure, you can both wrap the error as a field of your variant (hence accessing it without downcasting) and have it returned by the source method.

Ultimately, both libraries are doing the same thing, but failure is removing a lot of boilerplate code, as well as the Error/ErrorKind dichotomy.

pchampin commented 5 years ago

The difference is that to do that with error-chain, you need to call Error::with_chain every time you want to chain something,

actually, you would use Result:chain_err, which is not that verbose.

and ultimately you'll write your own impl From<InnerError> for OuterError whereas failure lets you derive it with an attribute.

I'm not sure about that. Explicit is better than implicit, and chaining an error is not the same as converting it...

as seen in my PR, there is ambiguity between chaining and wrapping the error

Granted. But in the case of ParseError, I wouldn't have wanted to fix in advance the type of error that can be its cause (because different parsers will have different underlying errors, your XML parser is a perfect example). So if I had had to define a member for the chained error, its type would have been Box<std::error::Error>, and it would still have required downcasting.

Ultimately, both libraries are doing the same thing

Yes, but failure is doing it its own specific way, while error-chain is playing ball with the standard Error trait...

I agree that error-chain is generally more verbose, but it still looks like a safer choice. Sorry.

althonos commented 5 years ago

I just discovered err-derive that provides a derive macro equivalent to failure, but that implements std::fmt::Display and std::error::Error. Leaving it there just in case :wink:

pchampin commented 5 years ago

Now you're talking :-D It looks very interesting indeed. Let's rename this issue and leave it open til I get some time to seriously consider the migration.

MattesWhite commented 5 years ago

When talking about alternative error handling crates, I would also consider snafu. It uses the std::error::Error trait and provides nicer derive-display syntax then failure, i.e. you can call methods on a variants elements. In addition, it has some kind of failure's context mechanics which are very nice to use.

pchampin commented 5 years ago

Thanks @MattesWhite, I'll also have a look.

MattesWhite commented 4 years ago

When talking about error-handling, what about the coercible_error.

I can definetly see its use-case and the performance it could bring and I can understand that you want to use it as it was probably your first crate in Rust. But, it feels very unergonomic and produces a lot of noise. In addition, I think its performance boost is not as much as one might guess.

A correctly designed error-enum should have not more than 24 bytes (two pointers and an extra 64 bit field on a 64-bit-machine). This shouldn't affect the overall memory requirement of an RDF-application.

In order to handle errors in an public API, it is totally okay to go with Box<dyn Error + Send + Sync> in my opinion. The improvements of the std::error::Error-trait allow downcasting it. In addition, a Box has only a size of 8 bytes.

An alternative would be using an own error-enum like:

use thiserror:Error as ThisError;

#[derive(ThisError, Debug)]
enum Error {
    ... ,
    #[error(External Error: {0})]
    FromExtern(String),
}

impl Error {
    from_external<E: std::error::Error>(e: E) -> Self {
        Error::FromExternal(e.to_string())
    }
}

This way every implementor of a custom graph, parser, etc. could impl From<ExternalError> for Error easily.

pchampin commented 4 years ago

Actually, coercible-error was not my first crate. It was something developed for Sophia that I thought could be useful to others.

it feels very unergonomic

Does it? It pains we that you would think that, I spent quite some time to try and make it ergonomic...

and produces a lot of noise

I cant deny that :-/ But most of the time, users would not be required to directly use it (the most verbose part are in trait methods with a default implementation).

I agree that Box<dyn Error + Send + Sync> would be a good default in many places, and that the overhead would be minimal. But I find a little sad to renounce the "Zero Cost Abstraction" motto, when a solution can be made with (in my opinion) no too much added complexity.

MattesWhite commented 4 years ago

Does it? It pains we that you would think that, I spent quite some time to try and make it ergonomic...

Sorry for that I don't wanted to be offensive. In addition, I removed my PR #27 as I have to admitt that this way was to aggresive and without a proper discussion... I deeply appologize for my inappropriate behavior.

MattesWhite commented 4 years ago

As I still agree that my PR #27 was not good, I still use it as a proof of concept.

Cons of the current error handling

The main point I wanted to lay out was the seperation of errors. Instead of having one big sophia::Error, I support the idea of smaller errors. This allows for an easier integration of 'external' errors.

As laid out in #26 the plan is to build an ecosystem of crates that implement against sophia-core's API. This means that there will probably be implementations we don't think of now that can and will fail in different ways. Consequently, sophia is required to allow and handle the integration of 'external' error-types.

The philosophy of coercible-errors is to force implementors to use a provided error-type (correct my if I got that wrong). The problem is that this provided error probably will not be sufficient for all use cases, as pointed out above. As a result, I think that coercible-error is not a suitable error-handling-strategy for the future 'center of the ecosystem' sophia-core.

Pros presented in #27

In difference, I showed an implementation of sophia with the rather new crates anyhow and thiserror ( see #27 ).

thiserror is only used to build the module-own error-types. It was designed specific for this and is easy to use.

The anyhow::Error is a "better Box<dyn Error + Send + Sync +'static>`". It should only be used when different error-types could colide, e.g. when building a Graph from an TriplesSource which both can fail. In every other case a specific error should be returned.

Furthermore, users will know which errors colide ("I use the TripleSource from NtParser to build a HashGraph, accordingly the error is either ParserError or HashGraphError"). Therefore, it should be possible for users to figure out which concrete error-types they can expect when they receive an anyhow::Error.

Another point is the weight of zero-cost error-handling. While it is true that coercible-errors allow to use zero-sized errors, the effect is mitigated due to the size of anyhow::Error as it is only a narrow-pointer, i.e. the size of a word.

Conclusion

While it is true that in some cases coercible-errors saves memory this benfit is mitigated by the good implementation of anyhow. Instead, an implementation with module-own error-types and anyhow allows a lot more easy integration of 'external' error-types into the sophia-ecosystem. Besides, with the soon stabilization of the never-type the compiler will become better at handling infallible operations by itself.


What is your opinion about that? I would be glad if we could continue this discussion. If we can agree on an implementation I will gladly submit a PR.

pchampin commented 4 years ago

Sorry for that I don't wanted to be offensive.

No offense taken, don't worry :-)

[pros & cons] What is your opinion about that?

Now, regarding coercible-error: I am convinced by the "one error per module" approach, especially in the future plan of splitting sophia into several crates. And indeed, this is incompatible with coercible-error's philosopy. +1 on getting rid of it.

I had a look at thiserror and I like it very much. +1 on using it.

I have more doubts about anyhow, though. It is explicitly targeted at applications, as opposed to libraries, and sophia is the latter. I have a counter proposal below.

Alternative to anyhow or Box<dyn Error>

The only place (I think) where we would need such a "generic" errors it is with (triple or quad) streams. The source and the sink can each have their own error type, and so the operation should be able to return one or the other.

I suggest we introduce (in the triple::stream module) the following type (with the appropriate thiserror annotations)

enum StreamError<E1, E2> {
  UpStream{ source: E1 },
  DownStream{ source: E2 },
}

Seems to me that this type has the best of both worlds from coercible-error and anyhow:

Granted, the return type of stream methods (such as in_sink and in_graph) will be slightly more verbose than with anyhow, but at least it is (I think) idiomatic and easily understandable.

pchampin commented 4 years ago

Thought: I think the smoothest way to contribute to this issue would be to keep the global error module untouched, but gradually add module-specific errors in each module that need it (one PR per module or group of related modules). Then, when the global error module is not used anywhere, get rid of it.

The first module to patch would probably be the triple::stream and quad::stream modules, should the proposal above be agreed upon.

MattesWhite commented 4 years ago

Great! The StreamError is a brilliant idea. I also agree with the 'one module at a time' strategy.

I would only name StreamError's variants a bit different.

enum StreamError<SErr, TErr> {
    Source { source: SErr },
    Target { source: TErr },
}

This way it is more obvious if a error was raised by the TripleSource or the consumer of it.

Another candidate that came into my mind is the either-crate. It is basically a sum type like the suggested StreamError but is not implemented specific as an error type. However, it provides some nice utility functions. I think it will serve as a good template when implementing StreamError.

When I have some time I would write a PR if it's okay for you.

pchampin commented 4 years ago

I moved the coercible-error/StreamError debate into another issue, as this is somewhat orthogonal to this one.

When I have some time I would write a PR if it's okay for you.

Be my guest :-) Do you mind if I assign those issues to you?