project-codeflare / codeflare-operator

Operator for installation and lifecycle management of CodeFlare distributed workload stack
Apache License 2.0
7 stars 41 forks

Add sleep before creating Kueue resources #610

Closed jiripetrlik closed 3 weeks ago

jiripetrlik commented 3 weeks ago

Fixes the following error. The Service for the Kueue webhook is not available immediately, so creating Kueue resources can fail with:

Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "mresourceflavor.kb.io": failed to call webhook: Post "https://kueue-webhook-service.kueue-system.svc:443/mutate-kueue-x-k8s-io-v1beta1-resourceflavor?timeout=10s": dial tcp 10.96.115.38:443: connect: connection refused
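For illustration only: an alternative to a fixed sleep is to retry the resource creation until the webhook accepts the request. The sketch below is not the change made in this PR. The helper name `retry` and the manifest file name in the usage comment are hypothetical, and the attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# retry: run a command until it succeeds or the attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command...>
# (hypothetical helper, not part of codeflare-operator)
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (assumed file name): retry for up to about a minute while the
# webhook endpoint is still refusing connections.
# retry 12 5 kubectl apply -f kueue-resources.yaml
```

The manifest is read from a file rather than from STDIN here because STDIN would be consumed by the first failed attempt and be empty on the retries.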

openshift-ci[bot] commented 3 weeks ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sutaakar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/project-codeflare/codeflare-operator/blob/main/OWNERS)~~ [sutaakar]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

dgrove-oss commented 3 weeks ago

This is how we handle this in the appwrapper CI. The Kueue pod won't report ready until the webhooks are up.

https://github.com/project-codeflare/appwrapper/blob/main/hack/deploy-kueue.sh#L22-L28

sutaakar commented 3 weeks ago

We use the same code :) Though for some reason we still occasionally get the webhook error.