kubernetes-sigs / cluster-api-provider-nested

Cluster API Provider for Nested Clusters

šŸ› Unable to create a VirtualCluster on k8 v1.20.2 #63

Open mazocode opened 3 years ago

mazocode commented 3 years ago

Problem

The virtual cluster does not deploy on Kubernetes v1.20.2. Output from vc-manager:

{"level":"info","ts":1621457612.1333222,"logger":"clusterversion-controller","msg":"reconciling ClusterVersion..."} {"level":"info","ts":1621457612.1334903,"logger":"clusterversion-controller","msg":"new ClusterVersion event","ClusterVersionName":"cv-sample-np"} {"level":"info","ts":1621457635.4177175,"logger":"virtualcluster-webhook","msg":"validate create","vc-name":"vc-sample-1"} {"level":"info","ts":1621457635.4421399,"logger":"virtualcluster-controller","msg":"reconciling VirtualCluster..."} {"level":"info","ts":1621457635.4824774,"logger":"virtualcluster-webhook","msg":"validate update","vc-name":"vc-sample-1"} {"level":"info","ts":1621457635.511791,"logger":"virtualcluster-controller","msg":"a finalizer has been registered for the VirtualCluster CRD","finalizer":"virtualcluster.finalizer.native"} {"level":"info","ts":1621457635.5118568,"logger":"virtualcluster-controller","msg":"will create a VirtualCluster","vc":"vc-sample-1"} {"level":"info","ts":1621457635.53576,"logger":"virtualcluster-webhook","msg":"validate update","vc-name":"vc-sample-1"} {"level":"info","ts":1621457635.556264,"logger":"virtualcluster-controller","msg":"reconciling VirtualCluster..."} {"level":"info","ts":1621457635.5563915,"logger":"virtualcluster-controller","msg":"VirtualCluster is pending","vc":"vc-sample-1"} {"level":"info","ts":1621457638.3632772,"logger":"virtualcluster-controller","msg":"creating secret","name":"root-ca","namespace":"default-a4a766-vc-sample-1"} {"level":"info","ts":1621457638.400915,"logger":"virtualcluster-controller","msg":"creating secret","name":"apiserver-ca","namespace":"default-a4a766-vc-sample-1"} {"level":"info","ts":1621457638.4276915,"logger":"virtualcluster-controller","msg":"creating secret","name":"etcd-ca","namespace":"default-a4a766-vc-sample-1"} {"level":"info","ts":1621457638.4523375,"logger":"virtualcluster-controller","msg":"creating secret","name":"controller-manager-kubeconfig","namespace":"default-a4a766-vc-sample-1"} {"level":"info","ts":1621457638.485505,"logger":"virtualcluster-controller","msg":"creating secret","name":"admin-kubeconfig","namespace":"default-a4a766-vc-sample-1"} {"level":"info","ts":1621457638.5329306,"logger":"virtualcluster-controller","msg":"creating secret","name":"serviceaccount-rsa","namespace":"default-a4a766-vc-sample-1"} {"level":"info","ts":1621457638.562718,"logger":"virtualcluster-controller","msg":"deploying StatefulSet for master component","component":""} {"level":"error","ts":1621457638.5628488,"logger":"virtualcluster-controller","msg":"fail to create virtualcluster","vc":"vc-sample-1","retrytimes":3,"error":"try to deploy unknwon component: "} {"level":"info","ts":1621457638.5843189,"logger":"virtualcluster-webhook","msg":"validate update","vc-name":"vc-sample-1"} {"level":"info","ts":1621457638.6019728,"logger":"virtualcluster-controller","msg":"reconciling VirtualCluster..."} {"level":"info","ts":1621457638.6020927,"logger":"virtualcluster-controller","msg":"VirtualCluster is pending","vc":"vc-sample-1"}

The namespace and secrets were created, but none of the StatefulSets from the ClusterVersion were.

What I did

git clone https://github.com/kubernetes-sigs/cluster-api-provider-nested.git
cd cluster-api-provider-nested/virtualcluster

Build kubectl-vc

make build WHAT=cmd/kubectl-vc
sudo cp -f _output/bin/kubectl-vc /usr/local/bin

Create new CRDs

(see https://github.com/kubernetes-sigs/cluster-api-provider-nested/issues/62)

cd pkg
controller-gen "crd:trivialVersions=true,maxDescLen=0" rbac:roleName=manager-role paths="./..." output:crd:artifacts:config=config/crds

Install CRDs

kubectl create -f config/crds/cluster.x-k8s.io_clusters.yaml
kubectl create -f config/crds/tenancy.x-k8s.io_clusterversions.yaml
kubectl create -f config/crds/tenancy.x-k8s.io_virtualclusters.yaml

Create ns, rbac, deployment, ...

kubectl create -f config/setup/all_in_one.yaml

I've added events to the RBAC because of this:

{"level":"info","ts":1621388803.9796872,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"clusterversion-controller","source":"kind source: /, Kind="} E0519 01:46:43.981421 1 event.go:260] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"vc-manager-leaderelection-lock.16805486d7f96288", GenerateName:"", Namespace:"vc-manager", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(time.Location)(nil)}}, DeletionTimestamp:(v1.Time)(nil), DeletionGracePeriodSeconds:(int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"ConfigMap", Namespace:"vc-manager", Name:"vc-manager-leaderelection-lock", UID:"5c94eb36-66a2-437a-a10f-6fc651533e96", APIVersion:"v1", ResourceVersion:"96800211", FieldPath:""}, Reason:"LeaderElection", Message:"vc-manager-76c5878465-6tq8f_e49ead0e-85c4-43f6-bb44-e4f0820e8ee8 became leader", Source:v1.EventSource{Component:"vc-manager-76c5878465-6tq8f_e49ead0e-85c4-43f6-bb44-e4f0820e8ee8", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0213960fa5d0488, ext:18231381017, loc:(time.Location)(0x23049a0)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0213960fa5d0488, ext:18231381017, loc:(time.Location)(0x23049a0)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(time.Location)(nil)}}, Series:(v1.EventSeries)(nil), Action:"", Related:(v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:serviceaccount:vc-manager:vc-manager" cannot create resource "events" in API group "" in the namespace "vc-manager"' (will not retry!)

Create a new ClusterVersion

kubectl create -f config/sampleswithspec/clusterversion_v1_nodeport.yaml

I had to remove kind and apiVersion under controllerManager: to match the schema:

error: error validating "cv-sample-nb.yaml": error validating data: [ValidationError(ClusterVersion.spec.controllerManager): unknown field "apiVersion" in io.x-k8s.tenancy.v1alpha1.ClusterVersion.spec.controllerManager, ValidationError(ClusterVersion.spec.controllerManager): unknown field "kind" in io.x-k8s.tenancy.v1alpha1.ClusterVersion.spec.controllerManager]; if you choose to ignore these errors, turn validation off with --validate=false
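In other words, the edit to clusterversion_v1_nodeport.yaml boils down to this (a sketch only; the removed values and remaining fields are elided, only the shape of the change is shown):

spec:
  controllerManager:
    # kind: ...        <- removed, unknown field per the CRD schema
    # apiVersion: ...  <- removed, unknown field per the CRD schema
    statefulset:
      ...              # rest of the sample left unchanged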

Create a new VirtualCluster

kubectl vc create -f config/sampleswithspec/virtualcluster_1_nodeport.yaml -o vc.kubeconfig
Fei-Guo commented 3 years ago

I think the ClusterVersion CRD schema introduces the problem. It is not a good idea to put the StatefulSet schema directly in the CRD spec; we should define a Pod spec instead.

To quickly work around the problem, please change the ClusterVersion CRD schema to the much simpler version shown below. I hope this can unblock you at least for now.

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  creationTimestamp: null
  labels:
    controller-tools.k8s.io: "1.0"
  name: clusterversions.tenancy.x-k8s.io
spec:
  group: tenancy.x-k8s.io
  names:
    kind: ClusterVersion
    plural: clusterversions
  scope: Cluster
  validation:
    openAPIV3Schema:
      properties:
        apiVersion:
          type: string
        kind:
          type: string
        metadata:
          type: object
        spec:
          properties:
            apiServer:
              properties:
                metadata:
                  type: object
                service:
                  type: object
                statefulset:
                  type: object
              type: object
            controllerManager:
              properties:
                metadata:
                  type: object
                service:
                  type: object
                statefulset:
                  type: object
              type: object
            etcd:
              properties:
                metadata:
                  type: object
                service:
                  type: object
                statefulset:
                  type: object
              type: object
          type: object
        status:
          type: object
      type: object
  version: v1alpha1
status:
  acceptedNames:
    kind: ""
    plural: ""
  conditions: []
  storedVersions: []
Fei-Guo commented 3 years ago

Another thing to try is removing your controller-gen; the make script will download controller-gen 0.3.0, which seemed to work fine previously.

You can check the Makefile for more tricks for manipulating the CRD, e.g.:

# To work around a known controller gen issue
# https://github.com/kubernetes-sigs/kubebuilder/issues/1544
ifeq (, $(shell which yq))
    @echo "Please install yq for yaml patching. Get it from here: https://github.com/mikefarah/yq"
    @exit
else
    @{ \
    yq w -i config/crds/tenancy.x-k8s.io_clusterversions.yaml "spec.validation.openAPIV3Schema.properties.spec.properties.apiServer.properties.statefulset.properties.spec.properties.template.properties.spec.properties.containers.items.properties.ports.items.required[1]" protocol;\
    yq w -i config/crds/tenancy.x-k8s.io_clusterversions.yaml "spec.validation.openAPIV3Schema.properties.spec.properties.controllerManager.properties.statefulset.properties.spec.properties.template.properties.spec.properties.containers.items.properties.ports.items.required[1]" protocol;\
    yq w -i config/crds/tenancy.x-k8s.io_clusterversions.yaml "spec.validation.openAPIV3Schema.properties.spec.properties.etcd.properties.statefulset.properties.spec.properties.template.properties.spec.properties.containers.items.properties.ports.items.required[1]" protocol;\
    yq w -i config/crds/tenancy.x-k8s.io_clusterversions.yaml "spec.validation.openAPIV3Schema.properties.spec.properties.apiServer.properties.statefulset.properties.spec.properties.template.properties.spec.properties.initContainers.items.properties.ports.items.required[1]" protocol;\
    yq w -i config/crds/tenancy.x-k8s.io_clusterversions.yaml "spec.validation.openAPIV3Schema.properties.spec.properties.controllerManager.properties.statefulset.properties.spec.properties.template.properties.spec.properties.initContainers.items.properties.ports.items.required[1]" protocol;\
    yq w -i config/crds/tenancy.x-k8s.io_clusterversions.yaml "spec.validation.openAPIV3Schema.properties.spec.properties.etcd.properties.statefulset.properties.spec.properties.template.properties.spec.properties.initContainers.items.properties.ports.items.required[1]" protocol;\
    yq w -i config/crds/tenancy.x-k8s.io_clusterversions.yaml "spec.validation.openAPIV3Schema.properties.spec.properties.apiServer.properties.service.properties.spec.properties.ports.items.required[1]" protocol;\
    yq w -i config/crds/tenancy.x-k8s.io_clusterversions.yaml "spec.validation.openAPIV3Schema.properties.spec.properties.controllerManager.properties.service.properties.spec.properties.ports.items.required[1]" protocol;\
    yq w -i config/crds/tenancy.x-k8s.io_clusterversions.yaml "spec.validation.openAPIV3Schema.properties.spec.properties.etcd.properties.service.properties.spec.properties.ports.items.required[1]" protocol;\
    }
endif
mazocode commented 3 years ago

Same result with controller-gen 0.3.0. I think x-kubernetes-list-map-keys was added in 0.3.0, but at that time there was no validation in place. However, your workaround fixed the issue. Here is my first virtual cluster:

$ kubectl -n default-c16bb7-vc-sample-1 get all
NAME                       READY   STATUS    RESTARTS   AGE
pod/apiserver-0            1/1     Running   0          5m26s
pod/controller-manager-0   1/1     Running   0          4m59s
pod/etcd-0                 1/1     Running   0          5m50s

NAME                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
service/apiserver-svc   NodePort    10.90.147.83   <none>        6443:30133/TCP   5m26s
service/etcd            ClusterIP   None           <none>        <none>           5m50s

NAME                                  READY   AGE
statefulset.apps/apiserver            1/1     5m26s
statefulset.apps/controller-manager   1/1     5m
statefulset.apps/etcd                 1/1     5m50s

Is there a way to enforce a specific runtimeClassName for pods with the syncer? This would be great for enforcing tolerations and a container runtime like Kata for pods running on the super cluster.

mazocode commented 3 years ago

I had forgotten a make manifests... it works fine with controller-gen 0.3.0 and the workaround too :)

Fei-Guo commented 3 years ago

Is there a way to enforce a specific runtimeClassName for pods with the syncer? This would be great for enforcing tolerations and a container runtime like Kata for pods running on the super cluster.

If the vPod specifies runtimeClassName as kata, it should work. If you want to enforce/overwrite the vPod runtimeClassName so it is always kata, you need to change the syncer code.
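For the first case, a minimal sketch of a vPod that requests Kata, assuming a RuntimeClass named kata is installed on the super cluster:

apiVersion: v1
kind: Pod
metadata:
  name: kata-test        # hypothetical example pod created in the virtual cluster
spec:
  # Assumes a RuntimeClass named "kata" exists on the super cluster; per the
  # comment above, the syncer should carry this field over when it creates the
  # corresponding pPod.
  runtimeClassName: kata
  containers:
  - name: nginx
    image: nginx:1.21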

christopherhein commented 3 years ago

/retitle šŸ› Unable to create a VirtualCluster on k8 v1.20.2

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

christopherhein commented 3 years ago

/remove-lifecycle stale
/lifecycle frozen

m-messiah commented 1 year ago

We have another issue creating a VirtualCluster in 1.20, where we have apiserver v1.19:

{"level":"error","ts":1664969234.2211943,"logger":"controller-runtime.manager.controller.virtualcluster","msg":"Reconciler error","reconciler group":"tenancy.x-k8s.io","reconciler kind":"VirtualCluster","name":"test","namespace":"default","error":"VirtualCluster.tenancy.x-k8s.io \"test\" is invalid: [status.reason: Invalid value: \"null\": status.reason in body must be of type string: \"null\", status.message: Invalid value: \"null\": status.message in body must be of type string: \"null\", status.phase: Invalid value: \"null\": status.phase in body must be of type string: \"null\"]"}

It is fixed in https://github.com/kubernetes/kubernetes/pull/95423, and I will test converting the fields to pointers shortly, to be compatible with 1.19 too (https://github.com/fluid-cloudnative/fluid/issues/1551#issuecomment-1072996131).