Add Error.Trace*(), and make Wrap/WithMessage never nest.

jaekwon commented 6 years ago

"The traditional error handling idiom in Go is roughly akin to .... which applied recursively up the call stack results in error reports without context or debugging information. The errors package allows programmers to add context to the failure path in their code in a way that does not destroy the original value of the error."

Even if you generate a stack trace at the source, callers of a function (which returns an error) will in general want to wrap it again to add more context, and the preferred method is to use errors.Wrap(), not errors.WithMessage().

, err := externalFunction(r)
if err != nil {
        return errors.Wrap(err, "msg") // or should we .WithMessage??
}

Perhaps there could be a more intelligent automatic way to choose between errors.Wrap() or errors.WithMessage(), but what would that look like? The only obvious 100%-correct-general-solution given the state of github.com/pkg/errors today (without assuming anything about the implementation of externalFunction() error) is to always use errors.WithMessage() but include minimal trace information like the filename and line number.

So OK, if github.com/pkg/errors starts advertising errors.WithMessage() as the primary method of adding contextual information, and its implementation is changed to include a modicum of trace information (filename, lineno), (or if errors.Wrap() were changed to include only minimal trace information), then we're 89% of the way there. But there's still a problem.

In general, it breaks the intent of Golang to wrap an error with github.com/pkg/errors.Error, because it breaks the one way that we're supposed to deal with error control flow... that is, we're supposed to switch on err behavior (or err concrete type, though it's not preferred).

type FooError struct{}
func (_ FooError) Error() string { return "" }

type BarError error

func externalFunc() error { return nil }

func main() {
    err := externalFunc()
    switch err.(type) {
    case FooError:
        // handle FooError
    case BarError:
        // handle BarError
    }
    fmt.Println("Hello, playground")
}

And github.com/pkg/errors violates that. There exists errors.Cause(), but now you have to dig down to the first non-errors.Error-error before you can type-check like above. And you can't just take the root-most cause of an error unless you preclude other custom error types from having a cause.

The solution is to not wrap an error, but to add trace information to the same error object. Ultimately it is error itself that needs to be a tracer.

Partial solution

If github.com/pkg/errors were to always keep a single (non-nested) Error by returning the same github.com/pkg/errors.Error upon errors.Wrap() and errors.WithMessage(), then we can switch confidently on behavior:

var err error := externalFunc()
switch err := errors.Unwrap(err).(type) {
case FooError:
    // handle FooError
case BarError:
    // handle BarError
}

The name for errors.Unwrap() is tricky... It can't be errors.Cause() because if it were a non-github.com/pkg/errors.Error causer error, we'd be switching on the wrong error. So Unwrap seems like a better name.

The problem is that it isn't 100% consistent with errors.Wrap(), because you only need to unwrap once what was wrapped a million times. Maybe errors.Wrap() should be deprecated, and errors.TraceWithError(error) error could ensure that the error err is already a errors.Error (in which case it would pass it through after calling err.Trace()) or else it would return errors.Wrap(err). Maybe errors.Unwrap() should be called errors.MaybeUnwrap().

Original discussion: https://github.com/golang/go/issues/23271#issuecomment-354531105

puellanivis commented 6 years ago

“the preferred method is to use errors.Wrap(), not errors.WithMessage().” False. Is it by far the most common? Yes. But it is not the “preferred method”. errors.Wrap should only be used when one wishes to attach not only a message, but a stack trace. If one wants to just add a message, then errors.WithMessage is the proper function to call.

“Perhaps there could be a more intelligent automatic way to choose between errors.Wrap() or errors.WithMessage(), but what would that look like?” It’s called a programmer, and it is a human being capable of intelligent decision making.

“that is, we're supposed to switch on err behavior (or err concrete type, though it's not preferred)“ False, we want to test for implementation of interfaces:

_, err := FunctionReturningMaybeError()
if err != nil {
   err := errors.Cause(err)
   if _, ok := err.(interface{ Retryable() bool }); ok {
     // doRetryBehavior here
   }
}

https://dave.cheney.net/2016/04/27/dont-just-check-errors-handle-them-gracefully

jaekwon commented 6 years ago

“the preferred method is to use errors.Wrap(), not errors.WithMessage().” False. Is it by far the most common? Yes. But it is not the “preferred method”. errors.Wrap should only be used when one wishes to attach not only a message, but a stack trace. If one wants to just add a message, then errors.WithMessage is the proper function to call.

Yes, that's obvious. Here's the point I'm trying to make: Earlier, you wrote in another thread that:

A stacktrace at the generation of the error itself should be sufficient to isolate where the error was generated, and that is precisely what github.com/pkg/errors.New and github.com/pkg/errors.Errorf do.

If you're saying that the original cause of the error happened to use github.com/pkg/errors.[New/Errorf] and captured a stack-trace, then that is sufficient, then I completely agree with you. Indeed, all we need is a stack-trace at the very origin of the error (aka the "underlying" cause or "root" cause). It would be wasteful to capture anything more!

But midway in the stack, a caller doesn't know whether the "underlying" cause happened to capture the stack with github.com/pkg/errors.[New/Errorf], or whether any .Wrap() had been called either. There's no way for a function in the middle of the stack (i.e. a caller) to know whether any stack-trace had already happened. So the only safe thing to do is to call .Wrap(), but that captures pretty much the whole stack-trace (actually just 32 according to the implementation of capture(), which is a compromise between capturing the whole stack-trace and not capturing enough -- the worst of both worlds).

So if you have a big program with many nested function calls to create a large stack of length N, and they all used github.com/pkg/errors, you'd be capturing O(N) stack-traces, when you only need 1. The solution is to not capture the whole stack-trace upon error creation, but to capture it along the way out. Intuitively, this is because the origin ("underlying cause") of an error does not know whether someone mid-way in the stack (i.e. a caller) wants a stack-trace for itself. Furthermore, it isn't clear to the caller whether the origin had already captured a stack trace. Ergo the origin is not the right place to be capturing stack-trace information (in general).

“Perhaps there could be a more intelligent automatic way to choose between errors.Wrap() or errors.WithMessage(), but what would that look like?” It’s called a programmer, and it is a human being capable of intelligent decision making.

Ok, so say you are given func ExternalFunc() error, and you want to ensure that there will be a stack-trace context for this error in a performance critical path. Do you call .Wrap() or .WithMessage()? Try to answer this question, please.

“that is, we're supposed to switch on err behavior (or err concrete type, though it's not preferred)“ False, we want to test for implementation of interfaces:

That's what switching on "behavior" means. (Did you know that you could switch on interfaces?)

jaekwon commented 6 years ago

@puellanivis I just found 4 existing issues actually related to the problem that this proposal fixes.

puellanivis commented 6 years ago

So the only safe thing to do is to call .Wrap()

False. If you are calling a package and you need to know the stacktrace at your point for debugging, you call Wrap, if not, you do not, then you don’t need to use Wrap. What good will it do you to trace an errors.New stack to someone else’s package? Unless you intend to debug someone else’s package.

Try to answer this question, please.

Check the dependencies of the package you’re calling. If it calls into pkg/errors finished. WithMessage. Otherwise, use WithMessage.

(Did you know that you could switch on interfaces?)

No, I patched a subtle shadowing bug in the RPC code at Google, and didn’t learn that I could use an Interface in a type assertion.

puellanivis commented 6 years ago

I just found 4 existing issues actually related to the problem that this proposal fixes.

And two of them are closed, and others are unimplementable.

jaekwon commented 6 years ago

False. If you are calling a package and you need to know the stacktrace at your point for debugging, you call Wrap, if not, you do not, then you don’t need to use Wrap.

You've conveniently ignored my point about O(N) stack traces.

What good will it do you to trace an errors.New stack to someone else’s package? Unless you intend to debug someone else’s package.

If there were traces left in by someone else's package, they were left there because it would actually be useful for debugging. If we followed the tracer convention of calling .Trace() only when it would be useful, what's the point of capturing the whole stack-trace?

var err = externalFunction()
switch err.(type) { ... }
return

What's the point of allowing externalFunction to capture the stack-trace including the ancestral callers, when the error is going to be handled completely without that info?

Replace "someone else's package" with "your own external package". "What good will it do you to trace an errors.New stack to your own external package? Unless you intend to debug your own external package." Well yeah...

If you don't want that information, maybe you can nuke it with TraceNuker. Maybe terror should be terrornuker.

No, I patched a subtle shadowing bug in the RPC code at Google, and didn’t learn that I could use an Interface in a type assertion.

Can't tell if you're being sarcastic...

And two of them are closed, and others are unimplementable.

Dontcare, and false. See "Partial solution" section.

puellanivis commented 6 years ago

can't tell if you're being sarcastic...

Yes, I am being sarcastic, because apparently you think I don’t know that a type switch can use interfaces.

Dontcare, and false.

No, that is not false. The two that are open are unimplementable, and I have commented about how they are so.

I understand, you’re attached to this idea, but it is not a good proposal. Especially as a universal change to the language itself.

I‘ve worked on this issue for over 2 years now, trying to come up with the ideal solution, the solution as currently available is the best available, and changing the language to “fix it” is not always the right solution.

jaekwon commented 6 years ago

Yes, I am being sarcastic, because apparently you think I don’t know that a type switch can use interfaces.

I showed you a type-switch on an interface, even pointed out that type-switching on the concrete type is possible but not preferred, and you said that I was wrong, and showed a non-type-switch version of the same concept. It's a natural conclusion to draw.

The two that are open are unimplementable, and I have commented about how they are so.

I've shown you the implementable solution, it's under section "Partial solution". You'll have to find a flaw in that solution.

davecheney commented 6 years ago

@jaekwon @puellanivis time for a time out please. Please revisit this discussion in the new year once you’ve calmed down.

Thank you

jaekwon commented 6 years ago

Thanks for moderating, Dave. Sorry for getting the convo heated, Cassondra. Many thanks to issue #144.

I agree that full stacktraces are also nice, in addition to tracing. I've taken all the concepts here and implemented them here: https://github.com/tendermint/tmlibs/blob/develop/common/errors.go

For your reference, here's a concrete case of extending tmlibs/common.Error: https://github.com/cosmos/cosmos-sdk/issues/766

Any feedback appreciated!

zerkms commented 6 years ago

I just stepped onto it as well: the "idiomatic" use of this package makes the original stack trace unavailable.

The origin of the problem still is easy to guess from the error message (thanks to the wrapping), but it overwriting the stack renders the whole idea of having the stack at all useless.

davecheney commented 6 years ago

How does using this package make the original stack trace unavailable. The goal of this package is to preserve the original error.

zerkms commented 6 years ago

@davecheney if you errors.Wrap multiple times - then the original stack would be unavailable (it might be printed with %+v verb but is the raw data is unavailable)

package main

import (
    "github.com/pkg/errors"
)

func foo() error {
    return fmt.Errorf("foo")
}

func bar() error {
    return errors.Wrap(foo(), "bar")
}

func baz() error {
    return errors.Wrap(bar(), "baz")
}

func main() {
    err := baz()

    // extract the original error stack here (expected to get the deepest stack available)
}

If WithMessage reused the original err stack - it would not be a problem

pkg / errors

Add Error.Trace*(), and make Wrap/WithMessage never nest. #144

Partial solution