@vmarkovtsev there appears to be an issue with `labours`, even though nothing outward-facing has changed with this PR - do you have any ideas as to why this might be happening?
Yes, the problem is with Google's `ortools`, which is a dependency of `seriate`, which is in turn a dependency of `labours`. They released v7 on March 26th, which is not backward compatible, and I did not specify an upper version constraint in `seriate`. Fixing this now.
Thanks @vmarkovtsev, looks like the build is passing now!
Reviewing this tomorrow :+1:
@vmarkovtsev real bugs should still propagate errors upwards for the caller to deal with. For example, a user embedding Hercules as a library in a web service would definitely not want a fatal crash every time; they would rather log and handle the error safely, or at least get the chance to gracefully stop all connections before crashing. If it's a service with state and buffers that need to be written to disk, a panic has a real danger of causing that data to be lost.
If done right, returning errors accomplishes the same thing as panicking, since the error propagates up to the caller (for example, the Hercules command), and a panic can happen there - or the program can simply exit with 1. The logger can be configured to capture stacktraces on errors as well.
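A minimal sketch of what I mean - the names here are made up for illustration, not the actual Hercules API:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// consume stands in for a library function: instead of panicking on an
// "impossible" state, it reports the problem as an error and lets the
// caller decide what to do about it.
func consume(deps map[string]interface{}) error {
	blob, exists := deps["blob"]
	if !exists {
		return errors.New("core: required dependency \"blob\" is missing")
	}
	fmt.Println(blob)
	return nil
}

func main() {
	// A CLI caller can still die loudly, while a web service embedding the
	// same library could log the error, flush its buffers, and keep serving.
	if err := consume(map[string]interface{}{}); err != nil {
		log.Fatal(err) // exits with a non-zero status, same end result as a panic
	}
}
```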
This is the thing: the places which I marked (I did not mark everything) are assertions for conditions which should never happen - the same as a C++ assert or accessing a slice element out of bounds. If Go had macros, these checks would be compiled out entirely in a release build.
Hm... even then, I believe it's still better to propagate the error properly, since sometimes unrelated things can cause these "impossibilities" to happen, even if it doesn't seem like it (at least that seems to be what happened here: https://github.com/src-d/hercules/pull/234#issuecomment-471200837).
It also feels better for the sake of consistency - out of bounds is one thing, but an unexpected invariant violation feels like something that should be returned as an error, at least based on what I understand of Go best practices, for the cases I mentioned earlier: in the event something goes wrong, even if it's probably impossible, you want to give the caller the chance to clean up properly after itself, which could be critical (a panic interrupting a database write is less than pleasant).
If you feel strongly about keeping the panics I suppose I can change them back, but I would personally strongly recommend moving towards reducing the panics in the library component of Hercules (would be very helpful for me 😅)
Regarding catching panics in a lib, there is always `recover()` at your service.
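For illustration, a minimal sketch of that approach - a hypothetical wrapper, not something that exists in Hercules:

```go
package safe

import "fmt"

// Run shields a caller from panics raised inside a library call by turning
// them back into ordinary errors via recover().
//
// Usage (hypothetical): if err := safe.Run(func() { doLibraryWork() }); err != nil { ... }
func Run(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	fn()
	return nil
}
```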
I know your pain - I used to suffer the same way when I used enry to detect languages; it used to be very buggy and panicked all the time when used concurrently. On the other hand, I could report proper panic stack traces to the maintainers, and they were able to fix problems very fast. I do understand that we can change those logging calls to include stack traces. If you know how to do it (print a fancy stack trace when a 🐛 appears), to me it is the same as panicking, so I am fine with that actually.
Thanks! I actually did not consider `recover()` at all, so thank you for pointing that out. I'll look into both solutions (log stacktrace + return error vs panic), see which might be best for these cases, and make the necessary changes. I'll apply fixes for your other comments as well.
Thanks for the review!
@vmarkovtsev I've added a stacktrace capture to all calls to `Error` and `Errorf`. The output looks like this:
[ERROR] 2019/04/01 17:12:39 hello-world
[ERROR] 2019/04/01 17:12:39 stacktrace:
gopkg.in/src-d/hercules.v10/internal/core.TestLogger(0xc000178300)
/Users/robertlin/go/src/gopkg.in/src-d/hercules.v10/internal/core/logger_test.go:50 +0xbf4
testing.tRunner(0xc000178300, 0x170a140)
/usr/local/Cellar/go/1.12/libexec/src/testing/testing.go:865 +0xc0
created by testing.(*T).Run
/usr/local/Cellar/go/1.12/libexec/src/testing/testing.go:916 +0x357
does that look okay to you?
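(For reference, a minimal sketch of how such a capture can be done with the standard library's runtime/debug - the type and constructor names here are illustrative, not necessarily the exact implementation in the PR.)

```go
package core

import (
	"log"
	"os"
	"runtime/debug"
)

// DefaultLogger is an illustrative logger that appends the current
// goroutine's stack trace to every error-level message, producing output
// like the snippet above.
type DefaultLogger struct {
	errLogger *log.Logger
}

// NewDefaultLogger returns a logger writing to stderr with an [ERROR] prefix.
func NewDefaultLogger() *DefaultLogger {
	return &DefaultLogger{errLogger: log.New(os.Stderr, "[ERROR] ", log.LstdFlags)}
}

// Error logs the message followed by a formatted stack trace.
func (l *DefaultLogger) Error(v ...interface{}) {
	l.errLogger.Println(v...)
	l.errLogger.Printf("stacktrace:\n%s", debug.Stack())
}

// Errorf logs the formatted message followed by a stack trace.
func (l *DefaultLogger) Errorf(format string, v ...interface{}) {
	l.errLogger.Printf(format, v...)
	l.errLogger.Printf("stacktrace:\n%s", debug.Stack())
}
```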
Yep, but let's make a separate level for such logs, e.g. "critical" or "fatal".
Hm, is there a particular reason why? My thoughts:
- `Critical()` breaks drop-in compatibility with many popular loggers (notably zap and logrus)
- `Fatal()` means we either follow the standard behaviour of calling `os.Exit`, which means we're back to panic-like behaviour, or we don't call `os.Exit` and introduce a behavioural inconsistency with popular loggers (zap, logrus, and the standard library all exit on `Fatal`)

Another thought: zap, by default, captures stacktraces on Error-level logs and above, so capturing stacktraces in `Error()` isn't out of the ordinary:
https://github.com/uber-go/zap/blob/master/config.go#L67
// DisableStacktrace completely disables automatic stacktrace capturing. By
// default, stacktraces are captured for WarnLevel and above logs in
// development and ErrorLevel and above in production.
DisableStacktrace bool `json:"disableStacktrace" yaml:"disableStacktrace"`
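For example, zap's stock production logger already behaves this way (just to illustrate the default, not a proposal to depend on it):

```go
package main

import "go.uber.org/zap"

func main() {
	// zap.NewProduction uses the config above: ErrorLevel and higher entries
	// carry a "stacktrace" field automatically.
	logger, err := zap.NewProduction()
	if err != nil {
		panic(err)
	}
	defer logger.Sync()

	logger.Error("hello-world") // logged together with the call site's stack trace
}
```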
Besides, a stacktrace should be interesting for all instances of `l.Error`, right? Would be interested in hearing your thoughts on this.
Most of the errors do not require printing the trace, and otherwise the log will be flooded with dumps.
I've added `Critical` etc. in https://github.com/src-d/hercules/pull/262/commits/c92aaacc8b5498a6d9f9b5699a2aed9ea904f85b
closes #244
This PR allows something like the following (edit: no longer possible as of https://github.com/src-d/hercules/pull/262/commits/c92aaacc8b5498a6d9f9b5699a2aed9ea904f85b - a wrapper needs to be written around zap):
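(A rough sketch of what such a zap wrapper could look like - the `Logger` interface and method set below are illustrative, not the actual definitions in `internal/core`.)

```go
package main

import "go.uber.org/zap"

// Logger mirrors the kind of small interface the PR introduces so that
// pipeline items log through an injected implementation instead of the
// global stdlib logger; the real method set in internal/core may differ.
type Logger interface {
	Info(args ...interface{})
	Warn(args ...interface{})
	Error(args ...interface{})
	Errorf(format string, args ...interface{})
}

// zapAdapter is the kind of "wrapper around zap" mentioned above.
type zapAdapter struct{ s *zap.SugaredLogger }

func (a zapAdapter) Info(args ...interface{})  { a.s.Info(args...) }
func (a zapAdapter) Warn(args ...interface{})  { a.s.Warn(args...) }
func (a zapAdapter) Error(args ...interface{}) { a.s.Error(args...) }
func (a zapAdapter) Errorf(format string, args ...interface{}) {
	a.s.Errorf(format, args...)
}

func main() {
	z, err := zap.NewDevelopment()
	if err != nil {
		panic(err)
	}
	defer z.Sync()

	var l Logger = zapAdapter{s: z.Sugar()}
	l.Errorf("something %s happened", "unexpected")
}
```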
I've added a `Logger` and a `Config` variable for facts in package `internal/core`, and also added the logger to all pipeline items and replaced the log calls that I could find.

There are still a few calls to `panic` that I think we should try to remove in favour of returning an `error`, but I noticed some tests actually assert that panics occur, so I've tried not to change too much for now (notably `BurndownAnalysis`).