rust-lang / project-error-handling

Error handling project group
Apache License 2.0
268 stars 18 forks source link

Guidelines for implementing Display and Error::source for library errors #27

Open Nemo157 opened 3 years ago

Nemo157 commented 3 years ago

I have been involved in trying to fix a couple of crates errors to work better with error reporters like anyhow/eyre recently (https://github.com/clap-rs/clap/issues/2151, https://github.com/sunng87/handlebars-rust/issues/398). One big issue I find with reporting these sorts of issues is not having a succinct, canonical "this is how errors should be implemented" source to link to. The only api-guideline related is C-GOOD-ERR which doesn't talk about chaining at all, and simply says:

The error message given by the Display representation of an error type should be lowercase without trailing punctuation, and typically concise.

It would be very useful to have something to link to which talks about chaining to sources, not adding unnecessary "Error: " prefixes, and not printing your source's message as part of your own message. (I'm assuming a chapter/appendix in the error-book, or a new section in the api-guidelines).

Enet4 commented 3 years ago

Including relevant references, as they already appear to cover most of the concerns presented in this issue: #23 rust-lang/api-guidelines#210.

  1. It would be very useful to have something to link to which talks about chaining to sources,

The main problem is that there isn't a well established error reporter or a consensus on how they should be reported across the ecosystem. Error reporting isn't a fully solved problem, and libraries or applications without a revamped error reporter with error chain awareness would suddenly provide subpar messages if their dependencies began expecting them.

  1. not adding unnecessary "Error: " prefixes,

That sounds reasonable, and could be independently proposed as an addition to the C-GOOD-ERR guideline.

  1. and not printing your source's message as part of your own message.

As per rust-lang/api-guidelines#210, that recommendation was already proposed but was put on hold mostly for the reasons stated in point 1 and in the issues linked.

Nemo157 commented 3 years ago
  1. It would be very useful to have something to link to which talks about chaining to sources,

The main problem is that there isn't a well established error reporter or a consensus on how they should be reported across the ecosystem

I don't think this necessarily needs to reference the reporting side, just provide guidance on when and how you should implement Error::source.

mkantor commented 3 years ago

A fundamental question in my head is: should library authors expect that end users will see source error messages?

If so, it encourages layering just a bit of extra context in each Error (e.g. Reverting to default settings -> File "foo/settings.json" could not be opened -> No such file or directory (os error 2)).

If not, it nudges library authors towards including information about lower-level Errors in their messages (e.g. Reverting to default settings because "foo/settings.json" was not found). But they may still want to attach sources to aid consumers of the library (even if end users are never meant to see them).

The former leads to subpar UX when applications don't expose source messages (Reverting to default settings is frustrating if you can't tell why) and the latter leads to overly-verbose/duplicative messages that are hard to skim when applications do print source messages (Reverting to default settings because "foo/settings.json" was not found: File "foo/settings.json" could not be opened because the file does not exist: No such file or directory (os error 2) has a low signal-to-noise ratio).

Since today libraries make different assumptions it means application developers have to either settle for crummy error messages in places or write distinct message formatting code for each individual error (which lessens the appeal of libraries like anyhow & eyre).

mkantor commented 3 years ago

I guess my previous comment is what #23 is about. Apologies for not clicking through before writing it up.

epage commented 3 years ago

A fundamental question in my head is: should library authors expect that end users will see source error messages? ... I guess my previous comment is what #23 is about. Apologies for not clicking through before writing it up.

I actually think this is relevant in a wider context than #23. In my experience, there are different dimensions to source

In my experiments with status crate, I joined these two concepts with the idea of chaining in an internal source. The reporter then needs to explicitly enable reporting o the internal `source. Really, they are separate dimensions. No idea how we should generalize this though.

23 provides guidance on source as part of the discussion for Display but doesn't broach the general source API and UX trade offs and how to handle them.

yaahc commented 3 years ago

I don't think this necessarily needs to reference the reporting side, just provide guidance on when and how you should implement Error::source.

The problem is that the guidance we would give depends on how we decide to officially conceptualize the role of the Error trait. The big question being, "Is the Error trait primarily an interface for reporting errors or primarily an interface for reacting to specific type erased errors?".

If we decide to prefer the former viewpoint then we would want to prioritize not duplicating information in the reporting interfaces of the Error trait. That would then turn into the guidance: Return source errors via Error::source unless the source error's message is included in your own error message in Display.

If we decide to prefer the latter then we would want to prioritize exposing every error type in the chain of errors to ensure users can always correctly downcast to specific error variants they want to handle. In this situation the guidance would be: Always return source errors via Error::source.

My feeling is that the former viewpoint, that the Error trait is primarily for reporting, is the ideal one. However, I feel like we're not necessarily in the best position to standardize on this approach. By saying that errors are only supposed to define how they're reported we're mandating that some sort of Reporter be present to iterate over sources / backtraces / etc. My feeling is that this is currently a missing piece and that the Error trait is really only half of a complete design for error reporting. For non-recoverable errors we already have PanicInfo as the reporting interface for non-recoverable errors and the panic hook/handler as the Reporter that consumes the interface, but for non recoverable errors the best we have for reporting is 3rd party libraries like anyhow and eyre.

I've been thinking on this specific issue a lot recently and my feeling is that if we can find some way to unify error reporting so that it is consistently defined for an entire application then we can pretty happily move forward with the former viewpoint and it should work well for everyone. I've some more thoughts on globally consistent error reporting here.

However, if we can't fix the holes around reporting for the Error trait then I don't feel that there is a clear winner for which guidance we should give...

yaahc commented 3 years ago

Follow up on this issue and https://github.com/rust-lang/project-error-handling/issues/23

I talked with the libs team today about the primary question outlined above (Is the Error trait primarily an interface for reporting errors or primarily an interface for reacting to specific type erased errors?). We came to the consensus that the primary role of the error trait is providing a structured access to error context for the purpose of reporting, so our guidance is officially decided.

As part of this we've also green-lighted efforts to patch holes in error reporting for error trait objects, particularly with how they interact with panics. I expect our immediate next step will be to add an API like std::panic::panic_any for passing &dyn Error + 'static instead of a &dyn Any to the panic handler. We will then utilize specialization in the standard library to make unwrap and expect and other functions that promote errors to panics to report the error with panic_error for types that implement Error. This + updates to the default panic handler will make it so we always print all the errors in a chain of errors when we promote them to panics. We then hope to follow this up with a more comprehensive RFC proposing a unified error reporting scheme for defining how panics and runtime errors are reported across an application.

Fishrock123 commented 3 years ago

std::panic::panic_any

I think you meant to write std::panic::panic_error.

PureWhiteWu commented 2 months ago

Hi, I've read the above discussion, and I still think that the error source should be included in the error's message.

This is a typical code flow for rpc services:

// in sdk B

#[derive(thiserror::Error)]
pub enum BError {
    #[error("reqwest error: {0}")]
    ReqwestError(#[from] reqwest::Error),

    #[error("hyper error: {0}")]
    HyperError(#[from] hyper::Error),
}

pub async fn do_something() -> Result<SomeType, BError> {}

// in sdk A

#[derive(thiserror::Error)]
pub enum AError {
    #[error("B error: {0}")]
    BError(#[from] b::BError),

    #[error("C error: {0}")]
    CError(#[from] c::CError),
}

pub async fn do_something_further() -> Result<(), AError> {
    let b_ret = b::do_something().await.map_err(|e| {
        tracing::error!("[A] do_something failed: {e}"); // print a log using tracing here
        e.into()
    })?;
    ...
}

// in the server handler

pub async fn handler(req: Request) -> Result<Response, some_framework::ServerError> {
    let a_ret = a::do_something_further().await.map_err(|e| {
        tracing::error!("[handler] do_something_further failed: {e}"); // print a log using tracing here
        some_framework::ServerError::new_biz_error(e) // assume we can convert a::AError to the framework defined ServerError
    })?;
}

In this case we need to print an error log whenever we get an error, and in the server handler we still need to make it strongly typed to send it back to the client side, so we cannot simply use something like anyhow to report the error which will erase the type.

Also, I don't think code like this is reasonable:

pub async fn handler(req: Request) -> Result<Response, some_framework::ServerError> {
    let a_ret = a::do_something_further().await.map_err(|e| {
        tracing::error!("[handler] do_something_further failed: {}", anyhow::anyhow!(e)); // creates an anyhow error or some other reporter whenever we need to print it in tracing
        ...
    })?;
}

Furthermore, I don't think force users to use a reporter which will recursively prints the source is reasonable, this is too complexed and need users to know the detailed design of Error. The most typical usage is something like this:

println!("xxx failed: {e}");

Reverting to default settings because "foo/settings.json" was not found: File "foo/settings.json" could not be opened because the file does not exist: No such file or directory (os error 2)

Example like this maybe is a little noisy, but it's much easier to debug since it provides a lot of useful information.

Thomasdezeeuw commented 2 months ago

Here is another argument for adding the source. We missed a lot of information about errors because they were formatted/included using {} (fmt::Display) into our error message that made it to the logs. This meant that for a while we had terrible errors, while the errors themselves contained a lot more useful information. This took us a while to realise and let to opening https://github.com/Keats/tera/issues/915.

CryZe commented 2 months ago

I sometimes wonder if the entire Error trait should just be deprecated and redesigned from the ground up. Every single stable method on it is deprecated other than source (and still compiled in through all the Box<dyn Error>s), it's barely available in core and the Display and Debug traits are both far from ideal (this issue about including source or not and no way of localizing the error messages).

Like I'm 100% of the opinion that sources should not be included by default in Display, because that's the trait you use for chaining individual messages together in the first place. But that means you absolutely have to do that or you miss out on important parts of the error as shown above. So you both need the functionality of "give me this individual message" and "give me the entire error", while also possibly needing localization and maybe more details as part of debug printing. The current design just doesn't support what you would need at all.

Thomasdezeeuw commented 2 months ago

I sometimes wonder if the entire Error trait should just be deprecated and redesigned from the ground up.

I share this thought. It moving to core also means the backtrace method won't be added, if I understand https://github.com/rust-lang/rust/issues/103765#issuecomment-2135744596 correctly.

Hypothetically if the advice would be to always print all the error detail, i.e. the source, then source (the only method left) also becomes less useful. Maybe it's time to deprecate the Error trait?

(I'm saying this as someone who always implemented the Error trait, but lately have been seeing less and less reason to do so).

BurntSushi commented 2 months ago

There is no way we are deprecating the Error trait.

As for backtrace support, see: https://github.com/rust-lang/rust/issues/103765#issuecomment-1801053112

The problem here is real though, and I struggle with it myself. But that doesn't mean we ought to throw everything out and start over.

BurntSushi commented 2 months ago

For example, if anyhow::Error (or something similar to it) were to find its way into the standard library, then I think it would be much easier to justify the "if it appears in the Display impl, then it shouldn't be part of source" idiom. Because you'd have a standard library type specifically designed for printing causal chains. We don't have that today, which means it's easy to fall back to Box<dyn Error> and thus print an error where some details can possibly be omitted.

Nemo157 commented 2 months ago

There is an unstable type for printing error-chains: std::error::Report

BurntSushi commented 2 months ago

@Nemo157 Yeah I forgot about that. I think pushing that towards stabilization will do a lot to help this particular problem.

PureWhiteWu commented 2 months ago

I don't think std::error::Report will solve the problem, since users are now supposed to write println!("execute failed: {}", std::error::Report::new(e)) instead of println!("execute failed: {e}").

programmerjake commented 2 months ago

iirc {e:!} was proposed to print an error report for e as a new format kind

BurntSushi commented 2 months ago

And to be clear, I was careful in my wording above: I didn't claim anyhow::Error or std::error::Report would solve this problem. Just that it would help. The problem today is that if std::error::Error impls follow the "source XOR display" idiom, then the only way to correctly print error messages is to print the chain. And the most convenient way to do that is with a third party crate. Moreover, even if you do print the chain, if there's a std::error::Error impl in the chain somewhere that doesn't follow the idiom, then you could wind up with duplicative error messages. So the "solution" to this problem is to move the entire ecosystem over to the "source XOR display" idiom so that all std::error::Error impls inter-operate correctly. That is a hard solution to fully implement, but I think the place to start is a rallying point in std that rewards folks for adhering to the source-xor-display idiom.

PureWhiteWu commented 2 months ago

iirc {e:!} was proposed to print an error report for e as a new format kind

Thanks for this reminder! Is this a new syntax that is not stable?

source XOR display

If all errors are supposed to use {e:!} to print to get the innermost cause (most useful part) of error instead of {e} directly, what's the meaning to use {e} directly (the Display message) then at that time? Seems that use {e} directly will not help anymore.

PureWhiteWu commented 2 months ago

I mean, maybe we can just change {e} to be {e:!} at that time since it's much more useful? But seems that it may be a break change.

8573 commented 2 months ago

Moreover, even if you do print the chain, if there's a std::error::Error impl in the chain somewhere that doesn't follow the idiom, then you could wind up with duplicative error messages.

The chain printer could filter out duplicated messages; snafu's Report chain printer does, for example.

seanmonstar commented 2 months ago

While I changed hyper and reqwest's error types to follow this pattern, I do still feel like it was a downgrade for users who don't know about reporters (which I posit is most users). This is why I proposed inverting it all, and making the default more informative and making you opt-in to finer control: https://github.com/rust-lang/project-error-handling/issues/53 (explored in https://github.com/seanmonstar/errors).

A couple years later, and I still feel: informative by default is more user-driven, reporters are for advanced users. I want defaults that help the most people, advanced users can figure out how to do other things.

For example, here's references to users being confused by the default showing them less:

PureWhiteWu commented 2 months ago

The chain printer could filter out duplicated messages; snafu's Report chain printer does, for example.

I took a look at that crate, seems that the filter may lead to some loss of performance?

I agree with @seanmonstar , reporters should be for advanced users, and the defaults should help the most people

source may be helpful for advanced developers, but it's not always that case.

In most cases, users just want to get the most detailed message which is analyzed by themselves to figure out the reason, and in these cases the innermost cause is likely the most helpful one.

jonhoo commented 2 months ago

I wonder if there's a way (maybe with specialization and a newtype in format_args) to automatically make {e} where e implements Error print via a reporter, and then have a way to bypass that specifically when implementing a reporter?

LukeMathWalker commented 2 months ago

Making {e} print the full error chain (no matter the underlying mechanism) is likely to result in programs that, by default, expose their inner implementation details to users.
I feel like that defeats the purpose of Display.

If we follow this reasoning, it'd make more sense for {e:?} to include the error chain by default, rather than {e}.

Notwithstanding the above, I still think source XOR display is the right approach here.