Open philomory opened 2 years ago
Looks like we are running into this error 2 years later with Rancher 2.7.1. Having one downed cluster block the whole process is exactly what we were trying to circumvent with Fleet. Any ideas or a timeline? Currently I have to change a selector to remove the cluster from the group.
This might be related to default values in the rollout strategy. The defaults are documented in the fleet.yaml reference.
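For readers unfamiliar with it, here is a rough sketch of the rollout strategy section of fleet.yaml. The values shown are the defaults as I understand them from the fleet.yaml reference; treat them as assumptions and check the reference for your Fleet version.

```yaml
# fleet.yaml (sketch only; the defaults below should be verified
# against the fleet.yaml reference for your Fleet version)
rolloutStrategy:
  # Number or percentage of clusters that may be unavailable during an update.
  maxUnavailable: 100%
  # Number or percentage of partitions that may be unavailable during an update.
  maxUnavailablePartitions: 0
  # Partition size used when no explicit partitions are configured.
  autoPartitionSize: 25%
```

If these defaults apply, a partition containing an offline cluster could plausibly hold back the rollout, which may be why the defaults are suspected here. That is a guess and has not been confirmed in this thread.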
Let's test if this still happens on 2.9.1
It seems to still be happening on 2.9.1-rc3.
Adding some notes about how it is observed on this version:
On step 7, the state is either Not Ready or Modified. Nevertheless, an error message is displayed:
Error log:
Modified(3) [Bundle repo-r-test-bundle]; deployment.apps test-bundle/test modified {"spec":{"template":{"spec":{"containers":[{"image":"paulbouwer/hello-kubernetes:1.10.1","imagePullPolicy":"IfNotPresent","lifecycle":{"preStart":{"exec":{"command":["sleep","2"]}}},"name":"test","resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}]}}}}
After step 10, the state is Wait Applied directly:
After the 'fix' from step 12, the state of the GitRepo is Wait Applied, yet a result similar to the one originally described occurs:
Error:
WaitApplied(1) [Bundle repo-r-test-bundle]; deployment.apps test-bundle/test modified {"spec":{"template":{"spec":{"containers":[{"image":"paulbouwer/hello-kubernetes:1.10.1","imagePullPolicy":"IfNotPresent","lifecycle":{"preStart":{"exec":{"command":["sleep","2"]}}},"name":"test","resources":{},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}]}}}}
When working on this we should
If any clusters are offline/unavailable, the status of Bundles that get deployed to those clusters can get stuck with misleading/confusing error messages.
Steps to reproduce:
Create a git repository containing the following code:
ErrApplied, with an error message similar to:
error validating "": error validating data: ValidationError(Deployment.spec.template.spec.containers[0].lifecycle): unknown field "preStart" in io.k8s.api.core.v1.Lifecycle
Error validating "": error validating data: ValidationError(Deployment.spec.template.spec.containers[0].lifecycle): unknown field "preStart" in io.k8s.api.core.v1.Lifecycle, even though the actual repository no longer contains any reference to a preStart field.
It is worth noting that, if step 10 is skipped - so that the commit in step 12 (which fixes the error) is the first commit to the repo after cluster A goes offline - then in step 12 the BundleDeployment for A will go to a "Wait Applied" state rather than being stuck in the error state.
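For context on the validation error quoted above: the Kubernetes container lifecycle block accepts postStart and preStop handlers, so a preStart key is rejected as an unknown field. The fragment below is a minimal sketch of a container spec that would pass validation. The name and image are taken from the error log in this thread. Using postStart is an assumption, since the actual fix commit is not shown here.

```yaml
# Illustrative fragment only. Name and image come from the error log
# in this thread; using postStart instead of preStart is an assumption.
containers:
  - name: test
    image: paulbouwer/hello-kubernetes:1.10.1
    lifecycle:
      postStart:
        exec:
          command: ["sleep", "2"]
```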