stolostron / multicloud-operators-subscription

Fork of https://github.com/open-cluster-management-io/multicloud-operators-subscription
Apache License 2.0
47 stars 41 forks source link

Appsubs fail to apply CRDs: schemas are required #170

Closed ahadas closed 4 years ago

ahadas commented 4 years ago

We've defined an appsub that applies CRD with:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  labels:
    operator.kubevirt.io: ""
  name: kubevirts.kubevirt.io
spec:
  additionalPrinterColumns:
  - JSONPath: .metadata.creationTimestamp
    name: Age
    type: date
  - JSONPath: .status.phase
    name: Phase
    type: string
  group: kubevirt.io
  names:
    categories:
    - all
    kind: KubeVirt
    plural: kubevirts
    shortNames:
    - kv
    - kvs
    singular: kubevirt
  scope: Namespaced
  version: v1alpha3
  versions:
  - name: v1alpha3
    served: true
    storage: true

It failed due to: E0421 17:05:01.346887 1 synchronizer.go:461] Failed to apply kind template&{map[apiVersion:apiextensions.k8s.io/v1 kind:CustomReso urceDefinition metadata:map[annotations:map[apps.open-cluster-management.io/hosting-deployable:ec/ec-channel-CustomResourceDefinition-ku bevirts.kubevirt.io apps.open-cluster-management.io/hosting-subscription:ec/cnv-appsub apps.open-cluster-management.io/sync-source:subgb k8s-ec/cnv-appsub] labels:map[operator.kubevirt.io:] name:kubevirts.kubevirt.io] spec:map[additionalPrinterColumns:[map[JSONPath:.metada ta.creationTimestamp name:Age type:date] map[JSONPath:.status.phase name:Phase type:string]] group:kubevirt.io names:map[categories:[all ] kind:KubeVirt plural:kubevirts shortNames:[kv kvs] singular:kubevirt] scope:Namespaced version:v1alpha3 versions:[map[name:v1alpha3 se rved:true storage:true]]]]}with error:CustomResourceDefinition.apiextensions.k8s.io "kubevirts.kubevirt.io" is invalid: spec.versions[0] .schema.openAPIV3Schema: Required value: schemas are required

by changing the versions to:

versions:
  - name: v1alpha3
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              cronSpec:
                type: string
              image:
                type: string
              replicas:
                type: integer

It works but it doesn't sound right to require this as k8s doesn't require it.

ianzhang366 commented 4 years ago

Hi @ahadas I believe this issue is related to the following,

https://www.openshift.com/blog/a-look-into-the-technical-details-of-kubernetes-1-16

It seems on k8s 1.16, the apiextensions.k8s.io/v1beta1 is promoted to apiextensions.k8s.io/v1. At v1, the k8s will request the schema field, I guess when doing the conversion from the old version to the newer ones, the schema field is required.

Checking the k8s documentation, it seems, at the v1beta1 example, the schema field was added to as well.

https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/

I guess we will have to make the migration following the k8s examples since we updated our k8s client to 1.16.2, https://github.com/open-cluster-management/multicloud-operators-subscription/blob/5240b7a3cd525a7eafc92d880ea656c88fcd0ceb/go.mod#L36

One more thing, I noticed that the CRD, you are showing, is at apiextensions.k8s.io/v1beta1 but the log is complaining at apiVersion:apiextensions.k8s.io/v1.

Am I missing something over here? I mean why there's version conversion, do you have any clue, or maybe this conversion is done by the k8s client?

ahadas commented 4 years ago

@ianzhang366 thanks for looking into this!

Unless I'm missing something, I don't see that the schema was added to the CRD of v1beta1 in the k8s documentation you pointed to:

# Deprecated in v1.16 in favor of apiextensions.k8s.io/v1
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # name must match the spec fields below, and be in the form: <plural>.<group>
  name: crontabs.stable.example.com
spec:
  # group name to use for REST API: /apis/<group>/<version>
  group: stable.example.com
  # list of versions supported by this CustomResourceDefinition
  versions:
    - name: v1
      # Each version can be enabled/disabled by Served flag.
      served: true
      # One and only one version must be marked as the storage version.
      storage: true
  # either Namespaced or Cluster
  scope: Namespaced
...

And yeah, it seems like the version was converted by k8s (unrelated to OCM) as it happened also when it was defined with kubectl:

[admin@spoke1 deploy]$ kubectl create -f /tmp/crd.yaml 
customresourcedefinition.apiextensions.k8s.io/kubevirts.kubevirt.io created
[admin@spoke1 deploy]$ kubectl get crd kubevirts.kubevirt.io -o yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  creationTimestamp: "2020-04-21T18:16:11Z"
  generation: 1
  labels:
    operator.kubevirt.io: ""
  name: kubevirts.kubevirt.io
  resourceVersion: "812569"
  selfLink: /apis/apiextensions.k8s.io/v1/customresourcedefinitions/kubevirts.kubevirt.io
  uid: 3519ce49-0444-405e-83d0-b85a427f2dae
spec:
  conversion:
    strategy: None
  group: kubevirt.io
  names:
    categories:
    - all
    kind: KubeVirt
    listKind: KubeVirtList
    plural: kubevirts
    shortNames:
    - kv
    - kvs
    singular: kubevirt
  preserveUnknownFields: true
  scope: Namespaced
  versions:
  - additionalPrinterColumns:
    - jsonPath: .metadata.creationTimestamp
      name: Age
      type: date
    - jsonPath: .status.phase
      name: Phase
      type: string
    name: v1alpha3
    served: true
    storage: true
status:
...
ianzhang366 commented 4 years ago

I was looking at the field, validation under the CRD spec.

The validation field is added on the k8s 1.16, under the validation field it has the openAPIV3Schema, I guess, for k8s, it leverages the spec.validation for the CRD version conversion.

  validation:
    openAPIV3Schema:
      type: object
      properties:
        host:
          type: string
        port:
          type: string

https://v1-16.docs.kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definition-versioning/

To our case, I believe the flow is, When we apply an apiextensions.k8s.io/v1beta1 CRD via subscription, the CRD will be converted to apiextensions.k8s.io/v1 CRD by k8s, for the conversion to happen, the k8s will checking the spec.versions and failed us on the missing openAPIV3Schema.

However, on k8s 1.16 documentation, I'm seeing the following,

Note: In apiextensions.k8s.io/v1beta1, there was a version field instead of versions. The version field is deprecated and optional, but if it is not empty, it must match the first item in the versions field.

It seems we can only use the version field at apiextensions.k8s.io/v1beta1, however, if we decide to version and versions, the schema of versions[0] can be empty though.

    // +optional
    Schema *CustomResourceValidation

https://github.com/kubernetes/apiextensions-apiserver/blob/release-1.16/pkg/apis/apiextensions/types.go#L184-L185

Our code is basically using client-go to apply/create the CRD, not sure why it's somehow running the schema checking logic. It could be the case, our code is doing the version conversion before calling the apply/create.

Can you please try to moving forward via the workaround, while I'm figuring out the odd validation logic on the schema field?

Also, if we use `apiextensions.k8s.io/v1, would the problem persist in your case?

ahadas commented 4 years ago

Our code is basically using client-go to apply/create the CRD, not sure why it's somehow running the schema checking logic. It could be the case, our code is doing the version conversion before calling the apply/create.

Yeah, could be.

Can you please try to moving forward via the workaround, while I'm figuring out the odd validation logic on the schema field?

We tried to move forward with that workaround but then we faced #166 again for which we don't have a workaround... Btw, are there functional tests where this flow can be tested with our CRDs to make sure it doesn't break on them?

Also, if we use `apiextensions.k8s.io/v1, would the problem persist in your case?

Yes, got the same error.

ianzhang366 commented 4 years ago

From my env with verbose log, I'm seeing that we are changing the GVK before we create the CRD via client-go.

The flow is the following,

  1. subscription controller will build up with a list of GVK(which reflecting the update cluster GVK) via the discovery code.
  2. then we will use this GVK list for GVK conversion if there's out of date GVK version

For our case, the GVK list has I0422 10:56:07.356986 96541 discovery.go:239] Start watching kind: CustomResourceDefinition, resource: apiextensions.k8s.io/v1, Resource=customresourcedefinitions objects in it: 0

Due to the above, we can tell that v1beta1 is out of date, so there will be a conversion.

I0422 10:56:16.277761   96541 discovery.go:116] gk: schema.GroupKind{Group:"apiextensions.k8s.io", Kind:"CustomResourceDefinition"}, valid: &schema.GroupVersionKind{Group:"apiextensions.k8s.io", Version:"v1beta1", Kind:"CustomResourceDefinition"} 
I0422 10:56:16.277784   96541 synchronizer.go:625] the validgvk -------------------------> apiextensions.k8s.io/v1, Kind=CustomResourceDefinition

At this point, we just set the new GVK to our to-be-created CRD,

https://github.com/open-cluster-management/multicloud-operators-subscription/blob/a6652d5c409a4f81d7285793e35ab558480cabbd/pkg/synchronizer/kubernetes/synchronizer.go#L625

I0422 10:56:16.277943   96541 synchronizer.go:733] Registered template &{map[apiVersion:apiextensions.k8s.io/v1 kind:CustomResourceDefinition metadata:map[annotations:map[apps.open-cluster-management.io/hosting-deployable:crd-deploy/github-crd-CustomResourceDefinition-kubevirts.kubevirt.io apps.open-cluster-management.io/hosting-subscription:crd-deploy/crd-github apps.open-cluster-management.io/sync-source:subgbk8s-crd-deploy/crd-github] labels:map[operator.kubevirt.io:] name:kubevirts.kubevirt.io] spec:map[additionalPrinterColumns:[map[JSONPath:.metadata.creationTimestamp name:Age type:date] map[JSONPath:.status.phase name:Phase type:string]] group:kubevirt.io names:map[categories:[all] kind:KubeVirt plural:kubevirts shortNames:[kv kvs] singular:kubevirt] scope:Namespaced version:v1alpha3 versions:[map[name:v1alpha3 served:true]]]]}to KubeResource map:apiextensions.k8s.io/v1, Kind=CustomResourceDefinitionfor source: subgbk8s-crd-deploy/crd-github

These conversion causes the issue for us, since or apiextensions.k8s.io/v1 it requires the schema field and the to-be-created CRD doesn't have such.

This conversion approach needs to handle bettter. what we want might be

  1. we should reuse the conversion logic from the k8s API server... or
  2. skip these conversions and let k8s API server handle it...

For which way to go, I will need to check up with the team.

As a workaround, I tried to use v1 to deploy your CRD, I don't see the error you mentioned at the other issue tho.

Here's my yaml samples, https://github.com/ianzhang366/acm-applifecycle-samples/tree/master/crd-deploy (FYI, the app-deploys is the deploy related yaml, payload is your CRD)

I deploy the app-deploys yamls on my hub, then checking at my spoke(managed) cluster, I can see the target CRD, which is under the payload folder of the github)

Can you please try out the workaround, while I'm working on improving the conversion logic?

ahadas commented 4 years ago

This conversion approach needs to handle bettter. what we want might be

  1. we should reuse the conversion logic from the k8s API server... or
  2. skip these conversions and let k8s API server handle it...

+1

As a workaround, I tried to use v1 to deploy your CRD, I don't see the error you mentioned at the other issue tho.

Oh, sorry - I didn't provide the full details on the other issue. So yeah, we can workaround this issue as mentioned before and then the CRD is applied successfully along with all other entities we need for the operator. The problem mentioned in #166 (i.e., Not a Kubernetes resource) happens on the CR. I see your repo include the resources that we managed to apply as well but doesn't include the CR

ianzhang366 commented 4 years ago
  1. CRD version issue

It's odd to me that the k8s API server is able to serve both v1 and v1beta1 version, however our code only server v1 version. To me, our code should reflect the full capacity of the k8s API server on a managed cluster.

Reading the code, I found the following, https://github.com/open-cluster-management/multicloud-operators-subscription/blob/a6652d5c409a4f81d7285793e35ab558480cabbd/pkg/synchronizer/kubernetes/discovery.go#L143

It seems we are using ServerPreferredResources to list the apiservice, ServerPreferredResources would only give us a single version per gk... that why we have to make the conversion.

I would prefer to use ServerResources to list all apiservice all versions of each GK.

  1. supported kind issue This is one ties to the CRD issue. When you use the workaround get the CRD deployed, it's the GVK list is not updated... so to our code, it doesn't know the new CRD is able to be served, till next GVK update cycle kicks in and update the list...

Currently, it's about 10mins... Not ideal, I found there's a way to force the GVK update.

I'm not sure why we didn't use the ServerResources to list GVK in the first place... so I will have to wait for the team's feedback before merge PR.

ianzhang366 commented 4 years ago

Hi @ahadas

Consulting with the team, @kuanf pointed out that, if we use the ServerResources, it will add more watch on the controller... which could be a very bad idea for caching.

He suggested, in the long run, we should still use the ServerPreferredResources but need to redo some caching logic. So I'm going to hold on to the PR for this issue.

For the workaround, let's use apiextensions.k8s.io/v1 and sort out the #166

I figure out #166 due to the CR is not retired/requeue. I posted my finding over #166

I was wrong about the CRD cache update interval, it's not 10mins, it's the sync-interval by default it's 1 minute.

https://github.com/open-cluster-management/multicloud-operators-subscription/blob/4c146908ef91cf8fb3f9da866c1c539f9f5334dd/cmd/manager/exec/options.go#L35

Let's try to focus on #166 for our current cases, I will keep working on this one to make the cache logic changes, to make sure the old and new version of the same resource is handled properly.

ianzhang366 commented 4 years ago

Hi @ahadas

With the fix we pushed last night, the old CRD version should also be able to apply to the managed cluster via subscription, please give it a try and let me know if you have any issues.

ahadas commented 4 years ago

Hi @ahadas

With the fix we pushed last night, the old CRD version should also be able to apply to the managed cluster via subscription, please give it a try and let me know if you have any issues.

We now get a different error:

E0428 12:25:14.402006       1 synchronizer.go:354] Failed to apply resource with error: CustomResourceDefinition.apiextensions.k8s.io "kubevirts.kubevirt.io" is invalid: spec.versions: Invalid value: []apiextensions.CustomResourceDefinitionVersion{apiextensions.CustomResourceDefinitionVersion{Name:"v1alpha3", Served:true, Storage:true, Schema:(*apiextensions.CustomResourceValidation)(0xc017bcc4b0), Subresources:(*apiextensions.CustomResourceSubresources)(nil), AdditionalPrinterColumns:[]apiextensions.CustomResourceColumnDefinition(nil)}}: per-version schemas may not all be set to identical values (top-level validation should be used instead)
E0428 12:25:14.402086       1 synchronizer.go:461] Failed to apply kind template&{map[apiVersion:apiextensions.k8s.io/v1beta1 kind:CustomResourceDefinition metadata:map[annotations:map[apps.open-cluster-management.io/hosting-deployable:ec/ec-channel-CustomResourceDefinition-kubevirts.kubevirt.io apps.open-cluster-management.io/hosting-subscription:kubevirt/cnv-appsub apps.open-cluster-management.io/sync-source:subgbk8s-kubevirt/cnv-appsub] labels:map[operator.kubevirt.io:] name:kubevirts.kubevirt.io] spec:map[additionalPrinterColumns:[map[JSONPath:.metadata.creationTimestamp name:Age type:date] map[JSONPath:.status.phase name:Phase type:string]] group:kubevirt.io names:map[categories:[all] kind:KubeVirt plural:kubevirts shortNames:[kv kvs] singular:kubevirt] scope:Namespaced version:v1alpha3 versions:[map[name:v1alpha3 schema:map[openAPIV3Schema:map[properties:map[spec:map[properties:map[cronSpec:map[type:string] image:map[type:string] replicas:map[type:integer]] type:object]] type:object]] served:true storage:true]]]]}with error:CustomResourceDefinition.apiextensions.k8s.io "kubevirts.kubevirt.io" is invalid: spec.versions: Invalid value: []apiextensions.CustomResourceDefinitionVersion{apiextensions.CustomResourceDefinitionVersion{Name:"v1alpha3", Served:true, Storage:true, Schema:(*apiextensions.CustomResourceValidation)(0xc017bcc4b0), Subresources:(*apiextensions.CustomResourceSubresources)(nil), AdditionalPrinterColumns:[]apiextensions.CustomResourceColumnDefinition(nil)}}: per-version schemas may not all be set to identical values (top-level validation should be used instead)

Would it be possible to add a test that tries to apply the KubeVirt resources? they are available here

ianzhang366 commented 4 years ago

Hi @ahadas

Just a follow-up, with fixes provided by @rokej , can we close this issue?