Open bo0ts opened 2 years ago
When a CSV fails, OLM can mark the error as either unrecoverable or recoverable. The list of unrecoverable failures is small; most failures are treated as recoverable. To solve this, the unrecoverable list should be updated to include cases where an upgrade attempts to change an immutable field. Unless OLM encounters an unrecoverable error when installing the CSV, it will keep retrying the installation indefinitely.
Updating an operator that includes a change to an immutable field would require one to remove the existing version of the operator before attempting to install the newer version. Since OLM does patch updates, it cannot successfully install the newer version.
@exdx I'm not sure I agree. The immutable field error is a classic and is even part of the troubleshooting documentation. Retrying here is perfectly fine for me, because it is an issue that has to be resolved manually during installation and can be resolved easily in most cases (just remove the offending object and let the operator installation recreate it, instead of removing the entire operator).
My problem is the way OLM actually retries: it does not back off after multiple failures.
Bug Report
What did you do?
Upgraded from 1.32.0 to 1.33.0 (using the community-operator-index:v4.9) on an OpenShift 4.9 cluster. The upgrade failed with:
install strategy failed: Deployment.apps "jaeger-operator" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/name":"jaeger-operator", "name":"jaeger-operator"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
What did you expect to see?
I expected the installation attempts to back off exponentially and the cluster to remain stable.
What did you see instead? Under which circumstances?
The flood of installation attempts led to etcd timeouts and failures during leader election, which in turn caused multiple restarts of other operators and further failures. The default OpenShift API Priority and Fairness rules did not prevent this from happening.
Environment
4.9-stable
4.9.0-0.okd-2022-02-12-140851
v1.22.1-1839+b93fd35dd03051-dirty