NiharDudam opened this issue 2 years ago
> Workaround is to delete and re-create the cluster
I think this should be: Workaround is to delete and recreate the subscription, not the whole cluster.
Thanks for pointing it out, just edited my comment.
any updates here?
@NiharDudam the error you are hitting comes from OLM's preflight checks while updating CRDs. It essentially checks the compatibility of the new CRD against existing CRs on the cluster to prevent data loss. The check failed in your case and that's why the upgrade was aborted. You'd either need to make these fields optional in your new CRD or adjust the CRs on the cluster.
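Purely for illustration, a minimal sketch of what "make these fields optional" could look like in a v1 CRD schema (the `newfield` name is a placeholder, not taken from the actual CRD):

```yaml
# Hypothetical v1 CRD schema fragment; "newfield" is a placeholder name.
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      properties:
        newfield:
          type: string
      # "newfield" is deliberately NOT listed in a "required:" array here,
      # so existing CRs that do not set it still pass the preflight schema check.
```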
> You'd either need to make these fields optional in your new CRD
The CRD fields are optional, and that is the same reason why `kubectl` allows my `kubectl apply -f new-crds.yaml`, afaik.
I could perform `kubectl apply -f new-crds.yaml`, but somehow OLM is not honoring that? Remember these CRDs are internal objects, so they are not part of the CSV.
@dmesser - I am assuming that the pre-flight checks are validating existing CRs against the new CRDs, as per the error message, so I would assume it would ignore those fields because I have manually set `preserveUnknownFields: false` in my new CRD.
Not exactly sure what validation the pre-flight checks are doing? Can you explain?
If it is basic schema validation, I would assume the `kubectl` client would also not ALLOW applying the new CRDs, which is not happening in my case.
@NiharDudam and I did some experimentation and found the issue, which we need to discuss.
Evidently, the REST API and older versions of kubectl (1.20 and previous) allow you to persist a `null` value in any field (object, array, primitive...).
In kube 1.19, when you upgraded a CRD from `v1beta1` to `v1`, these `null` values would be preserved when you attempt to retrieve those objects, even for fields declared in the `v1` schema.
Example:

```yaml
spec:
  myvalue: null
```
In kube 1.20 and later, when the CRD is upgraded from `v1beta1` to `v1`, ALL `null` values are removed in favor of empty objects:
Example: v1 CRD:
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
...
spec:
  preserveUnknownFields: false
  ...
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        type: object
        properties:
          ...
          spec:
            type: object
            properties:
              myvalue:
                type: object
  ...
```
Version persisted with the v1beta1 CRD via the API or kubectl 1.19:

```yaml
spec:
  myvalue: null
```
When the v1 CRD is applied, this value now becomes:

```yaml
spec:
  myvalue: {}
```
So it appears there is a toleration for converting null values to empty objects in the Kubernetes conversion logic; however, the validator that OLM is using does not tolerate this.
The crunchydata operator, unfortunately, does this by default. It uses `null` instead of default values (e.g. empty object, false, empty string), which is tolerated by kubectl and the API server, but not by the validator code.
The workaround is to remove any null values from the CRs prior to upgrading the operator (the customer would need to do this, or there would need to be an intermediate version of the operator that repairs any CRs prior to the version that upgrades the CRD to v1). It also MAY be a workaround to add `nullable: true` to the schema, as long as the operator truly can tolerate null values and isn't assuming that Kube is converting them (which it doesn't in 1.19 / OCP 4.6).
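To make the second workaround concrete, here is a minimal sketch of what adding `nullable: true` could look like, using the placeholder `myvalue` field from the example above (this is not the actual crunchydata schema):

```yaml
# Hypothetical fragment of the new v1 CRD schema; "myvalue" is the placeholder
# field from the earlier example, not a real crunchydata field.
versions:
- name: v1
  schema:
    openAPIV3Schema:
      type: object
      properties:
        spec:
          type: object
          properties:
            myvalue:
              type: object
              nullable: true   # lets CRs that persisted "myvalue: null" pass schema validation
```

The first workaround would instead mean editing the stored CRs so that `myvalue: null` is removed (or becomes `myvalue: {}`) before the upgrade is attempted.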
An alternative we should discuss is
@cdjohnson interesting find. @NiharDudam the reason you don't get any errors with `kubectl` is that there is no validation of existing CRs taking place. This is an OLM feature that is meant to protect you from data loss. In this particular case it seems like we could do something to automatically remediate incompatibilities without data loss, but it also seems the validation could just be enhanced to tolerate null values.
The code in question here is in the Catalog Operator: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/catalog/operator.go#L1829
It invokes the Kubernetes validation Go library:
https://github.com/operator-framework/operator-lifecycle-manager/blob/40b8b7104e3ce577c1c0f0f08e6854d62978169c/vendor/k8s.io/apiextensions-apiserver/pkg/apiserver/validation/validation.go#L47
I will try to reproduce this and investigate how k8s apiserver is handling this particular situation.
Bug Report
What did you do?
Hi everyone, I am facing an issue with operator upgrades where my operator upgrade is stuck with an error in my Subscription UI page as shown here.
Quick context (see the CRD sketch below):

- My old CRDs used `apiVersion: v1beta1` and had `preserveUnknownFields: true`
- My new CRDs use `apiVersion: v1` and I manually set `preserveUnknownFields: false` in my bundle due to https://github.com/kubernetes-sigs/controller-tools/issues/476

I am thinking the problem is because OLM is doing some checks during upgrades which are different than fresh installs, causing upgrades to get stuck.
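For reference, a minimal sketch of the two CRD headers described in the list above (the CRD name is a placeholder, not the actual one from my bundle):

```yaml
# Old CRD, as previously installed (placeholder name)
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: examples.mygroup.example.com
spec:
  preserveUnknownFields: true
  ...
---
# New CRD in the upgraded bundle
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: examples.mygroup.example.com
spec:
  preserveUnknownFields: false
  ...
```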
I tested basic upgrades in base Kubernetes and it works as expected (install old CRDs + install old operator + update CRDs + update operator).
I am hoping someone can shed some light on whether this is a known issue with OLM.
What did you expect to see? Upgrade completes
What did you see instead? Under which circumstances? Upgrades are stuck.
Environment OpenShift 4.7 and OpenShift 4.8
Possible Solution Workaround is to delete and re-create the operator
Additional context I already raised this issue in #olm-dev channel at https://kubernetes.slack.com/archives/C0181L6JYQ2/p1637004932140400 but did not get precise answers.