Open mattmoor opened 6 years ago
I wonder whether a fluentd daemonset is enough for logs?
We should tie this in with the overall logging and monitoring story (which is very much in flux at the moment; we have a meeting scheduled next week with Pivotal to discuss this further). To unblock this, though, a fluentd daemonset or a fluentd sidecar container, plus some fixed rules to collect build logs and send them to an Elasticsearch cluster (or any other endpoint), is possible. I can take a stab at this one. Let me know what you think.
@mdemirhan Is this something that should be covered by your work thus far?
Yes, it should be. The current policy is set up to collect all build-controller container logs. However, I assume that is not sufficient in this case. I will take a look at the build-crd code to see which containers I should watch and integrate.
So is this done once we merge HEAD into Elafros?
Yes, this is complete as far as getting build logs into Elasticsearch. I tested this with the current test cases for build, and also with an init container that crashes. We are getting logs correctly. We should probably add a small paragraph explaining how to get build logs. Any recommendations for the location of that small paragraph?
However, our overall debugging story is still a little complex. We need a clear guide on how to debug issues (something failed, where do I start, step-by-step guide). I will create an uber issue that tracks the entire debugging experience.
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to Knative Productivity Slack channel or knative/test-infra.
/lifecycle stale
Currently accessing Build logs is a poor experience in (at least) two ways.
Accessing any logs currently requires me to break encapsulation and peel my way through the Job to the Pod running, and then access the logs of the relevant init container. For failed steps, this is less than ideal, but works. For successful steps, you get a container-not-found error (I guess it's aggressively cleaned up).
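For anyone following along, the manual path described above (Job, then Pod, then init container) can be sketched roughly as follows. This is only an illustration, not part of the build tooling: the helper names, the example Job, Pod, and container names, and the "default" namespace are all hypothetical. It relies on the Job controller labelling its Pods with `job-name=<job>` and on `kubectl logs` accepting an init container name via `--container`. The helpers only build the commands; they do not run them.

```python
import shlex
from typing import List


def pod_lookup_command(job_name: str, namespace: str = "default") -> List[str]:
    """Build the kubectl command that lists the Pods created by a Job.

    Assumes the Pods carry the job-name=<job> label that the Job
    controller normally applies.
    """
    return [
        "kubectl", "get", "pods",
        "--namespace", namespace,
        "--selector", f"job-name={job_name}",
        "--output", "name",
    ]


def step_logs_command(pod_name: str, init_container: str,
                      namespace: str = "default") -> List[str]:
    """Build the kubectl command that reads one init container's logs."""
    return [
        "kubectl", "logs", pod_name,
        "--namespace", namespace,
        "--container", init_container,
    ]


if __name__ == "__main__":
    # Hypothetical names, purely for illustration.
    print(shlex.join(pod_lookup_command("my-build-job")))
    print(shlex.join(step_logs_command("my-build-job-abc12", "build-step-compile")))
```

As noted above, the second command is only likely to help for failed steps; for successful steps the container may already be gone.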