Better story for Build logs

knative / build

A Kubernetes-native Build resource.

Apache License 2.0


Better story for Build logs #9

Open mattmoor opened 6 years ago

mattmoor commented 6 years ago

Currently accessing Build logs is a poor experience in (at least) two ways.

Accessing any logs currently requires you to break encapsulation and peel your way through the Job to the running Pod, and then access the logs of the relevant init container.

For failed steps, this is less than ideal, but works. For successful steps, you get a container not-found error (I guess it's aggressively cleaned up).
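The path described above (Job, then Pod, then init container) can be sketched as two `kubectl` invocations. This is only an illustration under stated assumptions, not the project's tooling: the helper names, the Job name `my-build`, and the container name `build-step-compile` are hypothetical, and it assumes the Pod carries the `job-name` label that the Kubernetes Job controller sets on the Pods it creates.

```python
import subprocess
from typing import List


def pods_for_job_cmd(job: str, namespace: str = "default") -> List[str]:
    """Build the kubectl argv that lists the Pod(s) created by a Job."""
    return [
        "kubectl", "get", "pods",
        "--namespace", namespace,
        # Assumption: Pods created by a Job are labelled job-name=<job>.
        "--selector", f"job-name={job}",
        "--output", "jsonpath={.items[*].metadata.name}",
    ]


def init_container_logs_cmd(pod: str, container: str,
                            namespace: str = "default") -> List[str]:
    """Build the kubectl argv that reads one (init) container's logs."""
    return [
        "kubectl", "logs", pod,
        "--container", container,
        "--namespace", namespace,
    ]


def fetch_step_logs(job: str, container: str,
                    namespace: str = "default") -> str:
    """Run both commands against the current cluster (needs kubectl access)."""
    listing = subprocess.run(
        pods_for_job_cmd(job, namespace),
        check=True, capture_output=True, text=True,
    )
    pods = listing.stdout.split()
    if not pods:
        raise LookupError(f"no Pod found for Job {job!r}")
    logs = subprocess.run(
        init_container_logs_cmd(pods[0], container, namespace),
        check=True, capture_output=True, text=True,
    )
    return logs.stdout
```

Calling `fetch_step_logs("my-build", "build-step-compile")` would hit the same limitation noted above: once the container for a successful step has been cleaned up, the second command fails instead of returning logs.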

mattmoor commented 6 years ago

I wonder whether having a fluentd daemonset is enough for logs?

mdemirhan commented 6 years ago

We should tie this into the overall logging & monitoring story (which is very much in flux at the moment; we have a meeting scheduled next week with Pivotal to discuss this further), but to unblock this, a fluentd daemonset or a fluentd sidecar container, plus some fixed rules to collect build logs and send them to an Elasticsearch cluster (or any other endpoint), is possible. I can take a stab at this one. Let me know what you think.
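As a rough illustration of what a "fixed rule" for picking out build logs could look like, the sketch below filters node-level container log file names. It assumes the `/var/log/containers/<pod>_<namespace>_<container>-<container-id>.log` naming that a fluentd daemonset typically tails on Docker-based nodes, and the `build-step-` container name prefix is a hypothetical placeholder; neither is confirmed in this thread.

```python
import re
from typing import Dict, Optional

# Assumed file name layout: <pod>_<namespace>_<container>-<64 hex chars>.log
_LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)"
    r"-(?P<container_id>[0-9a-f]{64})\.log$"
)


def parse_container_log_name(file_name: str) -> Optional[Dict[str, str]]:
    """Split a container log file name into its parts, or return None."""
    match = _LOG_NAME.match(file_name)
    return match.groupdict() if match else None


def is_build_step_log(file_name: str, prefix: str = "build-step-") -> bool:
    """True if the log file belongs to a container whose name starts with prefix.

    The default prefix is a hypothetical naming convention for build steps.
    """
    parts = parse_container_log_name(file_name)
    return parts is not None and parts["container"].startswith(prefix)
```

A real deployment would express the same selection in the fluentd configuration itself; the Python version only makes the matching rule explicit.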

mattmoor commented 6 years ago

@mdemirhan Is this something that should be covered by your work thus far?

mdemirhan commented 6 years ago

Yes, it should be. The current policy is set up to collect all build-controller container logs. However, I assume this is not sufficient in this case. I will take a look at the build-crd code to see which containers I should watch and integrate.

mattmoor commented 6 years ago

So is this done once we merge HEAD into Elafros?

mdemirhan commented 6 years ago

Yes, this is complete as far as getting build logs into Elasticsearch. I tested this with the current test cases for build and also with an init container that crashes. We are getting logs correctly. We should probably add a short paragraph explaining how to get build logs. Any recommendations for where that paragraph should live?
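For a sense of what that paragraph might show, here is a sketch of an Elasticsearch query body that retrieves one build step's logs in time order. The field names `kubernetes.pod_name`, `kubernetes.container_name`, and `@timestamp` are assumptions based on what fluentd's Kubernetes metadata filter and Elasticsearch output commonly produce; the actual index mapping used here is not described in this thread, and the pod and container names are hypothetical.

```python
import json
from typing import Any, Dict


def build_log_query(pod: str, container: str, size: int = 500) -> Dict[str, Any]:
    """Build a search body selecting one container's log lines, oldest first."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    # match_phrase is used so the query works whether the
                    # fields are mapped as text or as keyword.
                    {"match_phrase": {"kubernetes.pod_name": pod}},
                    {"match_phrase": {"kubernetes.container_name": container}},
                ]
            }
        },
        "sort": [{"@timestamp": {"order": "asc"}}],
    }


if __name__ == "__main__":
    # Hypothetical pod and container names, for illustration only.
    body = build_log_query("my-build-abc12", "build-step-compile")
    print(json.dumps(body, indent=2))
```

The resulting JSON could be sent to the cluster's `_search` endpoint or pasted into a Kibana console.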

However, our overall debugging story is still a little complex. We need a clear guide on how to debug issues (something failed, where do I start: a step-by-step guide). I will create an uber issue that tracks the entire debugging experience.

knative-housekeeping-robot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to Knative Productivity Slack channel or knative/test-infra.
/lifecycle stale