kabanero-io / kabanero-operator

Consider approaches for tracing and log levels #370

Open kaczyns opened 4 years ago

kaczyns commented 4 years ago

There have been some situations recently that would have benefited (in my opinion) from a better or different tracing scheme. They include:

The specific component here was HTTP interaction with github.com. It would have been good to see the bytes of the response coming back, which would have made it easier to realize that we were not reading the entire response.

We should consider a way to enable tracing, either globally or by component, and a syntax for describing what specific tracing should be enabled.

I'll leave my thoughts in a separate comment.

kaczyns commented 4 years ago

OK, the warning part first... the logger used by the operator (I'm not sure offhand about the admission webhook or collection controller) is a "zap" logger. I didn't know what this was until recently... the zap logger is a log implementation which appears to have been created by Uber. It allows log messages to be written in JSON format, at various levels. I am sure I am not doing it justice with this brief description, but that is as deep as I went.

Someone has developed a "go-logr" wrapper around the zap logger. go-logr is written for simplicity, and on the surface you only get to pick Info or Error messages when you write to it. However, it does contain a method which allows logging at an arbitrary level, and it turns out that the zap logger supports levels as well. So you can write a warning message like this:

    reqLogger.V(-1).Info("This is supposed to be a warning message...")

and the message comes out like this:

{"level":"warn","ts":1577981468.9446123,"logger":"controller_kabaneroplatform","msg":"This is supposed to be a warning message..."}

Now the -1 log level is actually a constant - the zap logger defines levels from -1 through some positive number... but the go-logr wrapper around the zap logger uses the inverse of that number as its verbosity level. Since the zap constant WarnLevel refers to the number 1, you have to specify -1 on the go-logr wrapper to get a warning message.

I can see Kabanero having a log wrapper that defines the three methods Info, Error, and Warn (maybe more), defers to the base implementation for Info and Error, and handles the V(-1) nonsense for Warn.
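A minimal sketch of what such a wrapper might look like, assuming it wraps the operator's existing go-logr logger (the type and method names here are hypothetical, not anything that exists in the repo today):

    package logging

    import (
        "github.com/go-logr/logr"
    )

    // KabLogger is a hypothetical wrapper that adds a Warn method on top of
    // the Info and Error methods that go-logr already provides.
    type KabLogger struct {
        base logr.Logger
    }

    // NewKabLogger wraps an existing go-logr logger.
    func NewKabLogger(base logr.Logger) KabLogger {
        return KabLogger{base: base}
    }

    // Info defers to the base logger.
    func (l KabLogger) Info(msg string, keysAndValues ...interface{}) {
        l.base.Info(msg, keysAndValues...)
    }

    // Error defers to the base logger.
    func (l KabLogger) Error(err error, msg string, keysAndValues ...interface{}) {
        l.base.Error(err, msg, keysAndValues...)
    }

    // Warn hides the V(-1) trick behind a named method; with the zap-backed
    // logger described above, this should come out at the "warn" level.
    func (l KabLogger) Warn(msg string, keysAndValues ...interface{}) {
        l.base.V(-1).Info(msg, keysAndValues...)
    }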


My thoughts on tracing always go back to Open Liberty, where I've spent the most time writing code. In Liberty you can enable tracing per component, and then by detail level within that component. That kind of granularity is probably overkill for the Kabanero operator at this point. I think at a bare minimum we have to give the user a way to turn tracing on at all, and after that it would be nice to have a list of components, or packages, that would let them turn on just a portion of the tracing.

I'm not really sure of the best way to turn tracing on. Some thoughts that I had:

I'm not sure if it makes sense to write the traces to a separate file. I'm not even sure Kubernetes can retrieve multiple log files. I know it's possible to do that with 3rd party tools, but we want it to be easy for a user to get at the logs if they want to, without using 3rd party tools.

The zap logger and its go-logr wrapper appear to support log filtering based on level. So if we set up a logger with just the Info level, it would automatically ignore any traces sent to it. Someone would need to investigate whether it's possible to change that level later (after someone enables tracing) - see the sketch below.
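One possibility worth investigating is zap's AtomicLevel, which is intended to let the filtering threshold be changed on a live logger. A rough sketch under that assumption (plain zap, not wired into our controller setup):

    package main

    import (
        "os"

        "go.uber.org/zap"
        "go.uber.org/zap/zapcore"
    )

    func main() {
        // Start out filtering at Info; anything more verbose is dropped.
        atom := zap.NewAtomicLevelAt(zapcore.InfoLevel)

        core := zapcore.NewCore(
            zapcore.NewJSONEncoder(zap.NewProductionEncoderConfig()),
            zapcore.Lock(os.Stdout),
            atom,
        )
        logger := zap.New(core)
        defer logger.Sync()

        logger.Debug("dropped while the level is Info")

        // Later, when someone enables tracing, lower the threshold on the
        // same logger without rebuilding it.
        atom.SetLevel(zapcore.DebugLevel)
        logger.Debug("now visible")
    }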

kaczyns commented 4 years ago

Just as another data point (I'm not suggesting this is what we should do, just providing an example), the OLM operators pass a -debug argument when the container starts. This is set in the pod specification in the deployment for the operator. In order to set it you have to edit the deployment, which of course is being managed by another operator, so you first have to disable that other operator. However, the idea of using a flag is interesting. I'm just not sure how we would set it.
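Consuming such a flag on the operator side would be trivial; a hypothetical sketch (the flag name is made up here, and how it would actually get into our deployment is the open question above):

    package main

    import (
        "flag"
        "fmt"
    )

    func main() {
        // Hypothetical -debug flag, along the lines of what the OLM operators use.
        debug := flag.Bool("debug", false, "enable debug/trace logging")
        flag.Parse()

        // The chosen setting would feed into the zap logger configuration.
        fmt.Println("debug logging enabled:", *debug)
    }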

Another idea - the Subscription has a config section that can be used to pass environment variables to deployments that are created by the subscription. So perhaps this is an opportunity for us to pass in some value. We would need to play around with it and see whether, if one updates the subscription, OLM will then alter the deployment to reflect the current subscription config, or whether this is just a "one time" configuration. See: https://github.com/operator-framework/operator-lifecycle-manager/blob/fc91161aee73364c9f5f8a206e01fbf4281a9ab4/doc/design/subscription-config.md
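If we went the environment-variable route, the operator could read it at startup; a hypothetical sketch (the variable name is invented for illustration):

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        // Hypothetical variable that a Subscription config section could inject
        // into the operator's deployment; the name is not anything real.
        level, ok := os.LookupEnv("KABANERO_LOG_LEVEL")
        if !ok {
            level = "info" // default when the Subscription sets nothing
        }

        // The requested level would feed into the zap logger configuration.
        fmt.Println("requested log level:", level)
    }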