Migration plan for `Display` vs `source` guidance

yaahc commented 3 years ago

In https://github.com/rust-lang/project-error-handling/issues/27 we established that public API Error types need to be careful and consistent in how they expose information via the Error trait so as to avoid duplicating context when reporting errors. Later when preparing the blog post to first communicate this new guidance @BurntSushi noted that crates that need to change their error types based on the guidance will need a migration plan to mitigate issues caused by breaking runtime behavior of their error type's API. In this issue we're going to create a suggested migration plan for authors of libraries affected by this guidance, to ensure a smooth transition towards consistent error reporting across the Rust ecosystem.

Suggested Migration Plan

Note: We're interested in feedback in the guide and want to make it clear that this isn't an edict that The Library Team is passing down without any input from the rest of ecosystem. If these suggestions don't work for you please let us know.

Error types that go against this guidance need to be careful to avoid breaking runtime behavior of dependent crates. The biggest risk to be aware of is dependent crates who depend on calling downcast on the error returned from Error::source in order to react to specific source types. Additionally, dependent crates may be relying on the Error type printing itself and it's sources, so an upgrade might cause applications to give less detailed error messages to users. We recommend library authors who need to update their error types:

Prefer removing source from the Display impl, rather than removing it from Error::source and potentially breaking code that reacts to specific errors at runtime.
Treat changing the Display impl as a breaking change and increment semver accordingly.
Document that the Display behavior on your errors has changed, as visibly as you can, since your users may bump the version and assume it doesn't affect them if it compiles.

yaahc commented 3 years ago

So some initial thoughts:

changing an error type to remove a source or to change the Display output is a breaking change
This kind of breaking change can't be caught by the compiler, so incrementing the major version of a crate may be insufficient. If users don't read the changelog to see that they need to be careful about their error usage they may just upgrade to the next version and assume it's working because it compiled, only to have a downcast be silently broken or something.
The potential breakages are
- Removing a source and leaving it's error message in your Display output may silently break code that depends on matching against the type id of a source error
- Removing a source error message from your Display may break users who use string comparison to react to specific error variants (this is already widely discouraged, but we may wish to explicitly add this to the API guidelines as well).
- Removing a source error message from your Display may break reporting that relied on Display for printing error reports, losing context silently
Of these I expect the breakage to downcast is likely the biggest issue that library authors should avoid
We can add a lint to help users who previously relied on Display for reporting to warn against places where they print errors via Display or Debug and to instead suggest they use std::error::Report.

BurntSushi commented 3 years ago

changing an error type to remove a source or to change the Display output is a breaking change

This bit is interesting, because I would not have necessarily assumed that either one of these was a breaking change. The source method gives almost no guarantees at all. The docs for the Error trait itself are pretty permissive on what implementations can do. Moreover, I know I've wondered in the past whether returning an error type from an otherwise private dependency as a dyn Error raises it to the level of a public dependency, since that type is now technically exposed through downcasting. There's also other factors here, for example, if the underlying implementation changes how it works to, say, maybe avoid errors or similar, it might end up returning None for a source where it previously did, and one might reasonably consider this a non-breaking change.

As for Display, I'm not sure if it's breaking. To be honest, if I thought I could write a clearer error message, I would just do it and not really think twice about whether it's breaking or not. I'm pretty sure I've done that many times.

The interesting bit here I think is that the move of information out of Display and into source might itself constitute a breaking change because of its aggregate effect. That is, an error message that once (presumably) included complete context on what went wrong now may not. For example, if I have a CLI tool that just does eprintln!("{}", err) without any care about the causal chain, and a library changes their Error impl require following the causal chain to get the complete error info, then I would be a bit annoyed if it weren't done in a semver breaking change.

So I think what's a "breaking change" here with respect to Display is its information content, rather than the precise bytes it emits. That's a bit more nebulous to think about though...

I don't know if these are fully formed thoughts or not, and I haven't had a chance to digest everything here, but I'm out of time so I'll just leave it here for now!

yaahc commented 3 years ago

This bit is interesting, because I would not have necessarily assumed that either one of these was a breaking change.

Yea, this is also the bit I was least confident on asserting, but I feel like it's probably the most important factors for us to reach consensus on, because it really sets the tone for how seriously library authors should take this migration.

Also, full agree on your conclusion about the boundary for what is and isn't a breaking change for Display. I think we might want to formalize this as a recommendation either in std or the API guidelines. Something like "the standard library doesn't guarantee stable Display output for Error types but does guarantee it won't reduce the amount of information that is communicated". This would establish a precedent for the rest of the ecosystem to follow and would help clarify our stability policy for Display in general.

yaahc commented 3 years ago

I'm wondering if we should recommend they wait to follow this plan until we have std::error::Report or whatever equivalent we end up with in the end available on stable. The recommendation already applies now, so I'm leaning towards no, but it feels kinda lame saying they have to design their error types this way and oh by the way now you need this extra thing that we don't even provide for you yet.

BurntSushi commented 3 years ago

@yaahc Hmmm that's a good point. I wonder if that's what has been holding me back. (That sounds like a pretty flip thing to say, but it's sometimes hard for even myself to tell the difference between "I have too much to do" and "the path forward doesn't feel entirely clear to me, so I'm going to cling to the status quo.") That is, it does seem kind of non-ideal to say, "do this thing, and oh by the way, if anyone using your crate wants to get the full error message, they have to either manually traverse the causal chain or use some crate to do it for them." Maybe using error handling helper crates (like anyhow) is so common anyway that it's not a real problem. I'm not sure.

yaahc commented 3 years ago

Okay so we talked about this today in the libs team meeting and came to a conclusion for how we should migrate. The gist is that we should recommend that library authors specifically prefer removing the source from the Display impl over removing it from source if they're worried that users may be depending on downcast to react to source errors.

Kixunil commented 2 years ago

There's one more consideration: crates supporting no_std. Naive implementations would fail to display the source without std completely. Conditionally displaying looks like the best option to me, however I'd like to hear your thoughts on it. I even wrote a macro:

macro_rules! write_err {
    ($writer:expr, $string:literal $(, $args:expr),*; $source:expr) => {
        {
            #[cfg(feature = "std")]
            {
                let _ = &$source;   // Prevents warnings.
                write!($writer, $string $(, $args)*)
            }
            #[cfg(not(feature = "std"))]
            {
                write!($writer, concat!($string, ": {}") $(, $args)*, $source)
            }
        }
    }
}

Usage in Display impl:

// notice the semicolon
write_err!(formatter, "parsing failed at position {}", self.pos; self.source);

yaahc commented 2 years ago

Conditionally displaying looks like the best option to me, however I'd like to hear your thoughts on it. I even wrote a macro:

My feeling is that error in core is close enough to landing that we shouldn't be recommending new work-arounds that are dependent upon it's absence when we will need to backtrack almost immediately.

Kixunil commented 2 years ago

@yaahc oh, I missed that, thanks for heads up! The workaround is still applicable to crates with MSRV but indeed, don't recommend for code unrestricted by it.

rust-lang / project-error-handling

Migration plan for `Display` vs `source` guidance #44

Suggested Migration Plan