kubernetes-sigs / prow

Prow is a Kubernetes based CI/CD system developed to serve the Kubernetes community. This repository contains Prow source code and Hugo sources for Prow documentation site.
https://docs.prow.k8s.io
Apache License 2.0
129 stars 99 forks source link

`deck`: don't serve the ready endpoint until immediately prior to starting the server #308

Closed smg247 closed 2 weeks ago

smg247 commented 4 weeks ago

We are noticing brief downtime when deck restarts. I am not convinced that the logic in between serving the ready endpoint and actually starting the server is long enough to fully account for this, but it certainly isn't helping.

netlify[bot] commented 4 weeks ago

Deploy Preview for k8s-prow ready!

Name Link
Latest commit a1838777831c1b4b589015c7c5d7adc7acc80d46
Latest deploy log https://app.netlify.com/sites/k8s-prow/deploys/6728f2d5243cb40007e67575
Deploy Preview https://deploy-preview-308--k8s-prow.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

smg247 commented 4 weeks ago

/retest

smg247 commented 3 weeks ago

/cc @petr-muller

smg247 commented 3 weeks ago

/retest

petr-muller commented 3 weeks ago

Wait, this is not a flake :thinking:

Even passing jobs have:

error: .status.conditions accessor error: <nil> is of the type <nil>, expected []interface{}

That's not the thing failing the job, I think. It's just highlighted. See this though:

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_prow/308/pull-prow-integration/1849815795387863040#1:build-log.txt%3A999

deck-tenanted failed to get ready
NAME                                       READY   STATUS             RESTARTS      AGE
...
deck-78964df6ff-tsl5c                      1/1     Running            0             18m
deck-tenanted-6c864f4c76-r5cnq             0/1     Running            0             18m 

/lgtm cancel

smg247 commented 3 weeks ago

ah, interesting. I had the integration test fail for me a few times in recent PRs and then suddenly pass. That highlighted line is definitely confusing.

I guess this is now failing due to deck-tenanted's cookie-secret. Maybe that is the reason that code was put after the ready signal, to make the integration test pass....

EDIT: I pushed a new commit to remove that, let's see if that fixes the test and doesn't break anything else..

EDIT 2: Still fails, I will pick this up later, but this failing integration test leads me to think that this is actually the cause of our issues after all

smg247 commented 2 weeks ago

/label tide/merge-method-squash

smg247 commented 2 weeks ago

The deck_deployment.yaml had its cookie-secret removed in https://github.com/kubernetes-sigs/prow/commit/7dd9cad2b5bcf0a5475ee17ea823acb08718a0c7, and allow-insecure was also added there. let's see if doing the same here fixes the test.

smg247 commented 2 weeks ago

/retest

smg247 commented 2 weeks ago

Fixed now, I had to remove the cookie-secret and the oauth-url (matching what the deck_deployment.yaml has) in order to get it to start up.

petr-muller commented 2 weeks ago

No mention about the removals in the original https://github.com/kubernetes/test-infra/pull/27038 that brought 7dd9cad2b5bcf0a5475ee17ea823acb08718a0c7

/shrug

k8s-ci-robot commented 2 weeks ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, smg247

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes-sigs/prow/blob/main/OWNERS)~~ [petr-muller] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
oliver-goetz commented 2 weeks ago

This PR breaks our deck deployment (deck_deployment.yaml). It is stuck here:

❯ k logs -n prow deck-85d46cf868-fr4lf -f
time="2024-11-05T16:08:13Z" level=info msg="Spyglass registered viewer buildlog with title Build Log."
time="2024-11-05T16:08:13Z" level=info msg="Spyglass registered viewer coverage with title Coverage."
time="2024-11-05T16:08:13Z" level=info msg="Spyglass registered viewer html with title HTML."
time="2024-11-05T16:08:13Z" level=info msg="Spyglass registered viewer junit with title JUnit."
time="2024-11-05T16:08:13Z" level=info msg="Spyglass registered viewer links with title Debugging links."
time="2024-11-05T16:08:13Z" level=info msg="Spyglass registered viewer metadata with title Metadata."
time="2024-11-05T16:08:13Z" level=info msg="Spyglass registered viewer podinfo with title Job Pod Info."
time="2024-11-05T16:08:13Z" level=info msg="Spyglass registered viewer restcoverage with title REST API coverage report."
{"component":"deck","file":"sigs.k8s.io/prow/pkg/kube/config.go:77","func":"sigs.k8s.io/prow/pkg/kube.LoadClusterConfigs","level":"info","msg":"Loading cluster contexts...","severity":"info","time":"2024-11-05T16:08:13Z"}
{"component":"deck","file":"sigs.k8s.io/prow/pkg/kube/config.go:47","func":"sigs.k8s.io/prow/pkg/kube.kubeConfigs","level":"info","msg":"Parsed kubeconfig context: gardener-prow-build","severity":"info","time":"2024-11-05T16:08:13Z"}
{"component":"deck","file":"sigs.k8s.io/prow/pkg/kube/config.go:47","func":"sigs.k8s.io/prow/pkg/kube.kubeConfigs","level":"info","msg":"Parsed kubeconfig context: gardener-prow-trusted","severity":"info","time":"2024-11-05T16:08:13Z"}
{"client":"github","component":"deck","file":"sigs.k8s.io/prow/pkg/github/client.go:801","func":"sigs.k8s.io/prow/pkg/github.(*client).log","level":"info","msg":"Throttle(0, 0, [])","severity":"info","time":"2024-11-05T16:08:15Z"}
{"Lens":"metadata","component":"deck","file":"sigs.k8s.io/prow/pkg/spyglass/lenses/common/common.go:66","func":"sigs.k8s.io/prow/pkg/spyglass/lenses/common.NewLensServer","level":"info","msg":"Adding handler for lens","severity":"info","time":"2024-11-05T16:08:15Z"}
{"Lens":"buildlog","component":"deck","file":"sigs.k8s.io/prow/pkg/spyglass/lenses/common/common.go:66","func":"sigs.k8s.io/prow/pkg/spyglass/lenses/common.NewLensServer","level":"info","msg":"Adding handler for lens","severity":"info","time":"2024-11-05T16:08:15Z"}
{"Lens":"junit","component":"deck","file":"sigs.k8s.io/prow/pkg/spyglass/lenses/common/common.go:66","func":"sigs.k8s.io/prow/pkg/spyglass/lenses/common.NewLensServer","level":"info","msg":"Adding handler for lens","severity":"info","time":"2024-11-05T16:08:15Z"}
{"Lens":"coverage","component":"deck","file":"sigs.k8s.io/prow/pkg/spyglass/lenses/common/common.go:66","func":"sigs.k8s.io/prow/pkg/spyglass/lenses/common.NewLensServer","level":"info","msg":"Adding handler for lens","severity":"info","time":"2024-11-05T16:08:15Z"}
{"Lens":"podinfo","component":"deck","file":"sigs.k8s.io/prow/pkg/spyglass/lenses/common/common.go:66","func":"sigs.k8s.io/prow/pkg/spyglass/lenses/common.NewLensServer","level":"info","msg":"Adding handler for lens","severity":"info","time":"2024-11-05T16:08:15Z"}
{"Lens":"links","component":"deck","file":"sigs.k8s.io/prow/pkg/spyglass/lenses/common/common.go:66","func":"sigs.k8s.io/prow/pkg/spyglass/lenses/common.NewLensServer","level":"info","msg":"Adding handler for lens","severity":"info","time":"2024-11-05T16:08:15Z"}

and the pod is never getting ready.

petr-muller commented 2 weeks ago

@oliver-goetz thanks for the report, and sorry for the breakage. I have opened a revert.

oliver-goetz commented 2 weeks ago

No worries, thanks @petr-muller.