Error handling discussion

pdyraga commented 6 years ago

Let’s discuss how do we handle errors in Keep.

So far, all the errors are eventually written to stdout. This is fine for now but can bite us at some point. We need to bear in mind we are implementing a distributed system and log analysis in a traditional meaning will not be possible. The quality of error reports will depend on Keep node maintainers and how easy it is to locate and submit detailed error logs.

Here are some initial thoughts just to start the discussion. As always, please post your own, agree and disagree.

I think we should log errors both to the file and to stdout. The traditional approach is a daily logrotate but I am not sure if that’s the right move in our case. Wouldn’t it be better to have error logs organised in terms of functionality? For instance, random beacon error logs in one file, T-ECDSA error logs in another file. For all cases except network/connectivity issues this looks as a better approach for me.

About the error handling itself, I’d suggest to always add a context to our errors. This way we’ll be able to locate the problem easily as what’s happening around the failing code is a crucial thing.

For example, this is correct:

func generateDsaKeyShare() (*dsaKeyShare, error) {

    secretKeyShare, err := rand.Int(rand.Reader, curveParams.N)
    if err != nil {
        return nil, fmt.Errorf("could not generate DSA key share [%v]", err)
    }
…
}

And this is terrible, since we are left with no context about the situation (assuming the calling function does not add anything):

func generateDsaKeyShare() (*dsaKeyShare, error) {

    secretKeyShare, err := rand.Int(rand.Reader, curveParams.N)
    if err != nil {
        return nil, err
    }
…
}

nkuba commented 6 years ago

We need consistent logging with a standardized pattern containing such information as time, package etc. Having different log levels would be also nice in case of issues analysis. The application should be able to start in verbose mode. We could have separate files for each function or even a log level. Log collecting and analysis tools are a great solution, especially in decentralized infrastructure. Do you know if they are also used in blockchain?

I'm sometimes missing a stack trace of the error when it's passed up over a couple of functions. It would be great to have a trace to the origin of the error.

pdyraga commented 6 years ago

I like the package logging suggestion. I imagine public key could not be generated may be outputted by both random beacon and T-ECDSA code.

I am not sure about logging levels - currently, we discuss error handling and error is always an error. If we decide we need more verbose logging, we should probably discuss it separately.

For log collection - I did not hear anything about it for blockchain. I think sending logs periodically to one hardcoded destination is not an option. We will probably rely on node owners sending them to us if anything wrong happens.

@ngrinkevich @pschlump @rargulati What do you think?

pschlump commented 6 years ago

In the Go code, adding the file/line number that the log message was generated from is an amazingly useful bit of information

// LF Returns the File name and Line no as a string.
func LF(d ...int) string {
    depth := 1
    if len(d) > 0 {
        depth = d[0]
    }
    _, file, line, ok := runtime.Caller(depth)
    if ok {
        return fmt.Sprintf("File: %s LineNo:%d", file, line)
    } else {
        return fmt.Sprintf("File: Unk LineNo:Unk")
    }
}

Call it as fmt.Printf ( "%s", pkg.LF() ) prints out the current file/line, as pkg.LF(2), prints out the calling package. We could change the format that it returns the data - but it will tag a message to the exact source.

pschlump commented 6 years ago

Sorry - dropped my coffee cup on my mouse.

pdyraga commented 6 years ago

I personally like the idea of logging errors with the package name and line number. On the other hand, I'd like to avoid the situation where we need to remember about adding pkg.LF() or pkg.LF(2) to our code each time we do fmt.Erorrf. What if we had our own package having function just for creating errors that would add all the information under the hood? For example, we'd write just:

sth.error("oops [%v], err)

This way we'd have consistent error messages across the application. We don't even need to add any external library and I am not sure if there is any offering JVM-like formatters for log messages.

pdyraga commented 6 years ago

What about the destination for log messages? In the first message, I suggested to have error logs organised in terms of functionality. On the other hand, if we add a package name, having one log file may be a better solution. Since we'd log only errors to this file, maybe we can have a weekly logrotate? This is to avoid situation when someone wants to submit 10MB log file to report a bug.

pschlump commented 6 years ago

For log messages - if it is going to a file - log rotation is basically required.

For systems that are larger - with multiple servers - logs going to some sort of a consolidator (ryssloc, logstash, etc...) is really important.

I have used logrus before - it is a go package for logging.

On Mon, Aug 20, 2018 at 2:54 AM pdyraga notifications@github.com wrote:

What about the destination for log messages? In the first message, I suggested to have error logs organised in terms of functionality. On the other hand, if we add a package name, having one log file may be a better solution. Since we'd log only errors to this file, maybe we can have a weekly logrotate? This is to avoid situation when someone wants to submit 10MB log file to report a bug.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/keep-network/keep-core/issues/244#issuecomment-414245505, or mute the thread https://github.com/notifications/unsubscribe-auth/AAhMQQYSPOgS3nbJYpPzu0VoZL_kwLK_ks5uSnlBgaJpZM4VztvO .

-- Philip Schlump

Shadowfiend commented 6 years ago

Couple of notes from me here:

Log rotation is best left to an external tool. I don't think we should be concerned with this.
As we look at logging and error handling, there are some external libraries that allow attaching stack trace info to error objects, for example.
Logging is a useful topic, but I was actually more interested in how things end up in logs. Right now we have a million places that write to stderr/stdout. Can we clean this up? Can we limit the number of places that do actual writing to logs or files or std[out|err]? We've brought up error channels in the past, where subcomponents have channels that emit errors for logging, for example.

Shadowfiend commented 6 years ago

An additional way of looking at the question of how things end up in logs: consider that logging is a “side effect” in the functional programming sense—a stateful action, as it were. How can we reduce the number of functions with side effects that don't manifest as return types*?

keep-network / keep-core

Error handling discussion #244