kubernetes-sigs / cluster-api-provider-nested

Cluster API Provider for Nested Clusters

The syncer reports different pod.status regularly #154

Closed vincent-pli closed 3 years ago

vincent-pli commented 3 years ago

What steps did you take and what happened: I created a deployment in the tenant cluster and the pod works as expected, but this message shows up regularly (every 5 minutes) in the syncer's log:

1 checker.go:261] status of pod default-a32f71-vc-sample-1-default/test-deploy-1-696d8b7c77-clz6m diff in super&tenant master

After some research, it seems:

  1. The nodelifecycle controller of the kube-controller-manager in the tenant cluster cannot contact the virtual node, so it marks the node NotReady, adds a taint to it, and sets the node's Ready condition to status: "False"; the eviction manager then tries to evict the pods on that virtual node.

  2. The pod syncer then diffs the pod status, finds the difference, raises the message above, and syncs the pod status.

  3. After a while the node syncer finds the node difference and syncs the node status back to Ready.

and this cycle repeats regularly (see the sketch below).
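
For reference, a minimal sketch of what step 1 leaves on the vNode, based on the standard taints the upstream nodelifecycle controller applies (the node name is hypothetical, and depending on whether the Ready condition goes False or Unknown the taint key is node.kubernetes.io/not-ready or node.kubernetes.io/unreachable):

    apiVersion: v1
    kind: Node
    metadata:
      name: vnode-sample              # hypothetical vNode name
    spec:
      taints:
      # added by the tenant nodelifecycle controller once heartbeats
      # stop arriving; NoExecute triggers pod eviction on this node
      - key: node.kubernetes.io/not-ready
        effect: NoExecute
    status:
      conditions:
      - type: Ready
        status: "False"               # later synced back to "True" by the node syncer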

So I tried disabling the nodelifecycle controller in the tenant cluster's kube-controller-manager, and then everything works as expected.

So could we disable the nodelifecycle controller?

/kind bug

vincent-pli commented 3 years ago

@Fei-Guo @christopherhein @charleszheng44

Fei-Guo commented 3 years ago

This is a known issue and is mentioned in the project README:

The syncer controller manages the lifecycle of the node objects in tenant control plane but it does not update the node lease objects in order to reduce network traffic. As a result, it is recommended to increase the tenant control plane node controller --node-monitor-grace-period parameter to a larger value ( >60 seconds, done in the sample clusterversion yaml already).

We have applied this workaround in the clusterversion yaml, but not in the CAPI implementation.

@vincent-pli Are you using the CAPN provisioner? Can you double-check the setting of node-monitor-grace-period?

In https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/main/controlplane/nested/component-templates/nested-controllermanager/nested-controllermanager-statefulset-template.yaml, it is set to 200s.
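
One quick way to verify what the running tenant control plane actually uses (the namespace and statefulset names below are placeholders; adjust them to your setup):

    kubectl -n <tenant-namespace> get statefulset <tenant-cluster>-controller-manager -o yaml \
      | grep node-monitor-grace-period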

vincent-pli commented 3 years ago

@Fei-Guo Actually I followed the demo document.

But my node-monitor-grace-period is 200s:

    - --bind-address=0.0.0.0
    - --cluster-cidr=10.200.0.0/16
    - --cluster-signing-cert-file=/etc/kubernetes/pki/root/tls.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/root/tls.key
    - --kubeconfig=/etc/kubernetes/kubeconfig/controller-manager-kubeconfig
    - --authorization-kubeconfig=/etc/kubernetes/kubeconfig/controller-manager-kubeconfig
    - --authentication-kubeconfig=/etc/kubernetes/kubeconfig/controller-manager-kubeconfig
    - --leader-elect=false
    - --root-ca-file=/etc/kubernetes/pki/root/tls.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/service-account/tls.key
    - --service-cluster-ip-range=10.32.0.0/24
    - --use-service-account-credentials=true
    - --experimental-cluster-signing-duration=87600h
    - --node-monitor-grace-period=200s
    - --v=2

BTW, I haven't dug too much into the code. Why do we still need the nodelifecycle controller in the tenant cluster? The node is not a real one.

Fei-Guo commented 3 years ago

Can you double-check whether the vNode in the tenant cluster gets its heartbeat updated every minute? If not, the syncer has a problem. If the heartbeat is updated, maybe 1.20 changed the nodelifecycle controller behavior. My testbed is still on 1.18 and I will take a look at 1.20.

We don't need the nodelifecycle controller, we just don't have a simple way to disable it. One hacky way is to delete the node-controller serviceaccount from the kube-system namespace, which is not recommended.
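
For completeness, that hack would look roughly like this; the node-controller serviceaccount exists because the tenant kube-controller-manager runs with --use-service-account-credentials=true (again, not recommended):

    # run against the tenant cluster; removing the serviceaccount strips the
    # nodelifecycle controller of its credentials, effectively disabling it
    kubectl -n kube-system delete serviceaccount node-controller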

vincent-pli commented 3 years ago

I think the syncer works, since NotReady is corrected back to Ready after a while; I believe that's the syncer doing its job. About disabling the nodelifecycle controller, I'm a little confused: why not use a flag like this: --controllers=*,-nodelifecycle

And I'm not sure how to check the heartbeat. In the log of the kube-controller-manager in the tenant cluster? @Fei-Guo

Fei-Guo commented 3 years ago

Good call. We can try --controllers=*,-nodelifecycle
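
A minimal sketch of that change against the tenant kube-controller-manager args (context lines taken from the flags pasted above):

    - --node-monitor-grace-period=200s
    # run every controller except nodelifecycle
    - --controllers=*,-nodelifecycle
    - --v=2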

vNode status should have the heartbeat information.
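
For example, assuming kubectl access to the tenant cluster (the node name is a placeholder):

    kubectl get node <vnode-name> -o yaml

and look for the Ready condition in the output: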

    conditions:
    - lastHeartbeatTime: "2021-07-03T05:28:46Z"
      lastTransitionTime: "2020-12-16T23:03:59Z"
      message: kubelet is posting ready status
      reason: KubeletReady
      status: "True"
      type: Ready

vincent-pli commented 3 years ago

Thanks @Fei-Guo, the heartbeat is good:

  - lastHeartbeatTime: "2021-07-03T06:53:43Z"
    lastTransitionTime: "2021-07-01T00:17:55Z"
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready