risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
7.02k stars 577 forks source link

refactor: reinvestigate error handling and reporting in the system #11443

Open BugenZhao opened 1 year ago

BugenZhao commented 1 year ago

[!note] Tracking the tasks in the comment: https://github.com/risingwavelabs/risingwave/issues/11443#issuecomment-1790393393


As more and more features are introduced in our system, it turns out that error handling and reporting become increasingly messy at the same time. We're struggling with several issues, including...

Besides, developers also find it annoying to deal with errors...


Based on this, I propose several guidelines on how we should deal with errors in different modules in RisingWave:

xxchan commented 1 year ago

strong +1

some points

Error's source chain gets broken when reporting error, so the actual cause for the error is lost.

We might need to revisit this to see whether other places have the same problem. I didn't care much about source before.

Besides, developers also find it annoying to deal with errors...

We might need a developer guide (and/or crate rustdocs) for developers to follow. Even I've read a lot about error handling before, I feel still confused about how to do it. A short and actionable HOWTOs (and some simple explanations) should reduce a lot of confusions. It can be opinionated, but we need some guide.

Find it hard to categorize errors into different types or enum variants.

This is especially confusing. I'm now wondering whether categorization is needed at all.

Directly wrap anyhow::Error into a new error type

I recently noticed wasmtime::Error is like that. I had thought it should be an enum or sth, since it's a library. It turns out they use error.downcast_ref::<Trap> to get the "kind" (Trap). Might worth checking how they do it.


Some arbitrary refs (for myself to read later)

xxchan commented 1 year ago

BTW, I like this sentence: "Be either actionable or informational" (from https://github.com/awslabs/smithy-rs/issues/1950)

At a high level, errors were refactored to:

  • Be either actionable or informational. Actionable errors can be matched upon, leading to different program flow. Informational errors indicate why a failure occurred, but are not intended to be matched upon.
  • No longer print error sources in Display impls. A DisplayErrorContext utility has been re-exported in the types module to easily log or print the entire error source chain.
  • Reliably return their cause/source in their Error::cause/Error::source impl

P.S., I don't know whether the result of their refactor is good, since it has some learning burden for me. Haven't check its every details yet. AWS SDK is complex beasts though. 🤣

BugenZhao commented 1 year ago
  • No longer print error sources in Display impls.

I also found this a good practice we can follow. This is also how anyhow handles Display.

https://github.com/dtolnay/anyhow/blob/7fc0c073c4c31ef8664f699e9e4883b1c92b2fb6/src/fmt.rs#L7

BugenZhao commented 1 year ago
  • No longer print error sources in Display impls.

I also found this a good practice we can follow. This is also how anyhow handles Display.

dtolnay/anyhow@7fc0c07/src/fmt.rs#L7

However, not all libraries follows the convention. So eventually we get https://docs.rs/snafu/latest/snafu/struct.CleanedErrorText.html 🤣.

BugenZhao commented 1 year ago

Tracking the task:

xxchan commented 1 year ago

There are some good discussions. Note here for future reference.

13215 about boxed error: benefits and limitations

13248 about thiserror: what does its attributes do; when to add #[source] and #[backtrace] (almost always)

github-actions[bot] commented 4 months ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.