Well that's certainly bizarre. Not sure what has changed in that area that might cause this issue, especially given the RBAC seems to have the correct permissions present! Will see what I can find.
@dghubble could you confirm that the calico-kubeconfig file is actually the one referenced in the Calico CNI configuration json? Just to make sure the plugin is actually using the file.
One thing that might be relevant here is that Kubernetes recently enabled service account projection by default: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume
You may need manifest changes so that Calico can maintain the credentials on disk as they are rotated. I see that you have the correct volume mount here: https://github.com/poseidon/terraform-render-bootstrap/blob/master/resources/calico/daemonset.yaml#L174-L177
You may also need to set CALICO_MANAGE_CNI=true in the calico/node env to enable the right logic, though.
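For reference, the projected (bound) service account token that kubelet now mounts has roughly this shape (an illustrative volume definition, not the exact one kubelet generates):

volumes:
  - name: kube-api-access
    projected:
      sources:
        - serviceAccountToken:
            path: token
            expirationSeconds: 3607   # kubelet refreshes the token well before this expires
        - configMap:
            name: kube-root-ca.crt
            items:
              - key: ca.crt
                path: ca.crt

Because these tokens are bound to the pod and rotated, anything that copies a token onto disk (like the CNI kubeconfig) has to keep refreshing it.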
When 10-calico.conflist is written out, __KUBECONFIG_FILEPATH__ is replaced with /etc/cni/net.d/calico-kubeconfig. Within the calico-node container, the location of the mounted file is /host/cni/net.d/calico-kubeconfig. Maybe that's not what calico-node wants, but it's been this way a while.

That seems to match what Calico's release calico.yaml shows here, and Calico reports no troubles finding a kubeconfig either.
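For context, the kubeconfig reference in the rendered CNI config sits in the calico plugin's kubernetes section; a trimmed, illustrative sketch of 10-calico.conflist (not the full rendered file):

{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "datastore_type": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    }
  ]
}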
Setting calico-node's env didn't seem to alter the result.
- name: CALICO_MANAGE_CNI
value: "true"
We're facing the same issue here. Are there any suggested solutions?

Our environment:
Calico version: v3.19.3
Orchestrator version: Kubernetes v1.21.5
Operating system and version: Ubuntu 20.04
Provisioning tool: kops v1.21.1

The error we see is "getting ClusterInformation: connection is unauthorized: Unauthorized"
The error is certainly unusual. I haven't been able to reproduce this at all in my own rigs. Generally, an RBAC issue would show something more precise - e.g., "serviceaccount X is not allowed to Y resource Z".
Issues with bad TLS credentials would also show up more clearly. I'm not really sure of the root cause of this, and I think to figure it out we probably need to dig into the API server logs to see why it is rejecting the request as unauthorized.
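One way to tell an RBAC denial apart from an authentication failure is to exercise the CNI kubeconfig directly from an affected node (path assumed from the default manifest):

kubectl --kubeconfig /etc/cni/net.d/calico-kubeconfig \
  auth can-i get clusterinformations.crd.projectcalico.org

A missing RBAC rule would answer "no", whereas a revoked or invalid token fails with an Unauthorized error, which is what the symptoms here point to.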
@dghubble @ilyesAj it's been a long time on this one... did you ever make any headway on it? I haven't seen this anywhere else.
To pass Kubernetes v1.22 conformance testing, Typhoon used flannel instead of Calico. I'll re-run with Calico during the v1.23 cycle.
@caseydavenport we have rolled back to k8s 1.20.11; it seems to be related to the k8s version. We haven't had any problems since.
I had a go at reproing this.
sonobuoy run --e2e-focus="evicts.pods.with.minTolerationSeconds" --e2e-skip=""
That hit the same issue, I think:
Mar 18 10:28:43.496: FAIL: Failed to evict all Pods. 2 pod(s) is not evicted.
Full Stack Trace
k8s.io/kubernetes/test/e2e.RunE2ETests(0x23f7fb7)
_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:133 +0x697
k8s.io/kubernetes/test/e2e.TestE2E(0x2371919)
_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e_test.go:136 +0x19
testing.tRunner(0xc000a1c680, 0x71553e0)
/usr/local/go/src/testing/testing.go:1259 +0x102
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:1306 +0x35a
STEP: verifying the node doesn't have the taint kubernetes.io/e2e-evict-taint-key=evictTaintVal:NoExecute
[AfterEach] [sig-node] NoExecuteTaintManager Multiple Pods [Serial]
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:186
STEP: Collecting events from namespace "taint-multiple-pods-5037".
STEP: Found 17 events.
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:26:50 +0000 UTC - event for taint-eviction-b1: {default-scheduler } Scheduled: Successfully assigned taint-multiple-pods-5037/taint-eviction-b1 to ip-10-0-58-124
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:26:50 +0000 UTC - event for taint-eviction-b2: {default-scheduler } Scheduled: Successfully assigned taint-multiple-pods-5037/taint-eviction-b2 to ip-10-0-58-124
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:26:51 +0000 UTC - event for taint-eviction-b1: {kubelet ip-10-0-58-124} Pulled: Container image "k8s.gcr.io/pause:3.6" already present on machine
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:26:51 +0000 UTC - event for taint-eviction-b1: {kubelet ip-10-0-58-124} Created: Created container pause
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:26:51 +0000 UTC - event for taint-eviction-b1: {kubelet ip-10-0-58-124} Started: Started container pause
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:26:51 +0000 UTC - event for taint-eviction-b2: {kubelet ip-10-0-58-124} Started: Started container pause
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:26:51 +0000 UTC - event for taint-eviction-b2: {kubelet ip-10-0-58-124} Pulled: Container image "k8s.gcr.io/pause:3.6" already present on machine
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:26:51 +0000 UTC - event for taint-eviction-b2: {kubelet ip-10-0-58-124} Created: Created container pause
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:26:58 +0000 UTC - event for taint-eviction-b1: {taint-controller } TaintManagerEviction: Marking for deletion Pod taint-multiple-pods-5037/taint-eviction-b1
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:26:58 +0000 UTC - event for taint-eviction-b1: {kubelet ip-10-0-58-124} Killing: Stopping container pause
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:27:03 +0000 UTC - event for taint-eviction-b1: {kubelet ip-10-0-58-124} FailedKillPod: error killing pod: failed to "KillPodSandbox" for "a7fa4a6f-f477-43f8-af43-821734900ee8" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"9676827c763d10f18114d0666c597de32f3a9c1d1efd1741cf61a901e3a74f2b\": connection is unauthorized: Unauthorized"
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:27:04 +0000 UTC - event for taint-eviction-b1: {kubelet ip-10-0-58-124} FailedKillPod: error killing pod: failed to "KillPodSandbox" for "a7fa4a6f-f477-43f8-af43-821734900ee8" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"9676827c763d10f18114d0666c597de32f3a9c1d1efd1741cf61a901e3a74f2b\": error getting ClusterInformation: connection is unauthorized: Unauthorized"
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:27:18 +0000 UTC - event for taint-eviction-b2: {kubelet ip-10-0-58-124} Killing: Stopping container pause
Mar 18 10:28:45.166: INFO: At 2022-03-18 10:27:18 +0000 UTC - event for taint-eviction-b2: {taint-controller } TaintManagerEviction: Marking for deletion Pod taint-multiple-pods-5037/taint-eviction-b2
Mar 18 10:28:45.167: INFO: At 2022-03-18 10:27:19 +0000 UTC - event for taint-eviction-b2: {kubelet ip-10-0-58-124} FailedKillPod: error killing pod: failed to "KillPodSandbox" for "e9a4ef09-2940-43b0-bf18-368d4d9bd77e" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"06daa4725d4984da00dbc56b4ebc9695897545b15fabf31569267b5fd22d5a8d\": error getting ClusterInformation: connection is unauthorized: Unauthorized"
Mar 18 10:28:45.167: INFO: At 2022-03-18 10:28:41 +0000 UTC - event for taint-eviction-b1: {taint-controller } TaintManagerEviction: Marking for deletion Pod taint-multiple-pods-5037/taint-eviction-b1
Mar 18 10:28:45.167: INFO: At 2022-03-18 10:28:44 +0000 UTC - event for taint-eviction-b2: {taint-controller } TaintManagerEviction: Cancelling deletion of Pod taint-multiple-pods-5037/taint-eviction-b2
Mar 18 10:28:45.340: INFO: POD NODE PHASE GRACE CONDITIONS
Mar 18 10:28:45.341: INFO: taint-eviction-b1 ip-10-0-58-124 Running 30s [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-03-18 10:26:50 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2022-03-18 10:27:04 +0000 UTC ContainersNotReady containers with unready status: [pause]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2022-03-18 10:27:04 +0000 UTC ContainersNotReady containers with unready status: [pause]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-03-18 10:26:50 +0000 UTC }]
Mar 18 10:28:45.342: INFO: taint-eviction-b2 ip-10-0-58-124 Running 30s [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-03-18 10:26:50 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2022-03-18 10:27:20 +0000 UTC ContainersNotReady containers with unready status: [pause]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2022-03-18 10:27:20 +0000 UTC ContainersNotReady containers with unready status: [pause]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-03-18 10:26:50 +0000 UTC }]
Looking at calico-node logs from that node, I see that the calico-node pod has a start time that is after the end of the test. I wonder if the problem is that calico-node doesn't tolerate the taint that's being added and is killed?
If I manually add the same taint that the test uses (kubernetes.io/e2e-evict-taint-key=evictTaintVal:NoExecute), I do indeed see the calico-node pod on that node disappear.
If I add a "tolerate everything" toleration to the calico-node daemonset:
tolerations:
- operator: "Exists"
and re-run the e2e test, I see the test pass:
------------------------------
[sig-node] NoExecuteTaintManager Multiple Pods [Serial]
evicts pods with minTolerationSeconds [Disruptive] [Conformance]
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:630
[BeforeEach] [sig-node] NoExecuteTaintManager Multiple Pods [Serial]
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:185
STEP: Creating a kubernetes client
Mar 18 11:33:14.984: INFO: >>> kubeConfig: /tmp/kubeconfig-845565767
STEP: Building a namespace api object, basename taint-multiple-pods
W0318 11:33:15.041870 19 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
Mar 18 11:33:15.042: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-node] NoExecuteTaintManager Multiple Pods [Serial]
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/node/taints.go:345
Mar 18 11:33:15.064: INFO: Waiting up to 1m0s for all nodes to be ready
Mar 18 11:34:15.121: INFO: Waiting for terminating namespaces to be deleted...
[It] evicts pods with minTolerationSeconds [Disruptive] [Conformance]
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:630
Mar 18 11:34:15.148: INFO: Starting informer...
STEP: Starting pods...
Mar 18 11:34:15.454: INFO: Pod1 is running on ip-10-0-58-124. Tainting Node
Mar 18 11:34:19.703: INFO: Pod2 is running on ip-10-0-58-124. Tainting Node
STEP: Trying to apply a taint on the Node
STEP: verifying the node has the taint kubernetes.io/e2e-evict-taint-key=evictTaintVal:NoExecute
STEP: Waiting for Pod1 and Pod2 to be deleted
Mar 18 11:34:26.063: INFO: Noticed Pod "taint-eviction-b1" gets evicted.
Mar 18 11:34:46.172: INFO: Noticed Pod "taint-eviction-b2" gets evicted.
STEP: verifying the node doesn't have the taint kubernetes.io/e2e-evict-taint-key=evictTaintVal:NoExecute
[AfterEach] [sig-node] NoExecuteTaintManager Multiple Pods [Serial]
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:186
Mar 18 11:34:46.206: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "taint-multiple-pods-7930" for this suite.
• [SLOW TEST:91.247 seconds]
[sig-node] NoExecuteTaintManager Multiple Pods [Serial]
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/node/framework.go:23
evicts pods with minTolerationSeconds [Disruptive] [Conformance]
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:630
------------------------------
{"msg":"PASSED [sig-node] NoExecuteTaintManager Multiple Pods [Serial] evicts pods with minTolerationSeconds [Disruptive] [Conformance]","total":1,"completed":1,"skipped":31,"failed":0}
@dghubble So I think the problem lies with the calico manifest - not tolerating the taint that the test uses.
Where does Typhoon get the calico manifest from? Does it vendor the manifest or get it from the calico docs?
Oh - unless there's a way to ensure that the CNI's kubeconfig doesn't get deleted when calico-node gets killed? @caseydavenport ?
Just realised that my sonobuoy run doesn't use the same settings as the OP provided.
sonobuoy run --e2e-focus="NoExecuteTaintManager Multiple Pods" --e2e-skip="" --plugin-env=e2e.E2E_EXTRA_ARGS="--non-blocking-taints=node-role.kubernetes.io/controller"
Running that with the "tolerate everything" setting - both tests pass.
that the CNI's kubeconfig doesn't get deleted when calico-node gets killed?
Hm, I didn't think that deleting the calico/node pod would remove the CNI kubeconfig. At least I don't think anything in Calico does that.
It might be that the token is invalidated and since calico/node isn't running it can't update the config on the host?
It might be that the token is invalidated and since calico/node isn't running it can't update the config on the host?
How often does the token get cycled? Because we hit this every time I run the test (without the toleration). For that behaviour, the token cycling time would have to be ~1 min.
From https://kubernetes.slack.com/archives/C0EN96KUY/p1647856984220609, it appears that kubernetes actively revokes service account credentials when pods are deleted.
Ah interesting. I found the vendored calico manifests in https://github.com/poseidon/terraform-render-bootstrap/blob/master/resources/calico/daemonset.yaml
They have:
tolerations:
- key: node-role.kubernetes.io/controller
operator: Exists
- key: node.kubernetes.io/not-ready
operator: Exists
%{~ for key in daemonset_tolerations ~}
- key: ${key}
operator: Exists
%{~ endfor ~}
Whereas https://docs.projectcalico.org/manifests/calico.yaml has:
tolerations:
# Make sure calico-node gets scheduled on all nodes.
- effect: NoSchedule
operator: Exists
# Mark the pod as a critical add-on for rescheduling.
- key: CriticalAddonsOnly
operator: Exists
- effect: NoExecute
operator: Exists
@dghubble Perhaps the Typhoon manifest needs to add
- effect: NoSchedule
operator: Exists
Thanks for looking into this folks!
I know CNIs' docs often show an "allow everywhere" toleration (i.e. operator: Exists without a key). However, we can't ship that. Clusters support many platforms (clouds, bare-metal) and support heterogeneous nodes with different properties (e.g. worker pools with different OSes, architectures, resources, hardware, etc).
Choosing on behalf of users that a Calico DaemonSet should be on ALL nodes would limit use cases. For example,
tolerations:
# Make sure calico-node gets scheduled on all nodes. <- Good for simple clusters (90% use case)
- effect: NoSchedule
operator: Exists
# Mark the pod as a critical add-on for rescheduling. <- Deprecated
- key: CriticalAddonsOnly
operator: Exists
- effect: NoExecute <- Good for simple clusters
operator: Exists
Instead, Typhoon allows kube-system DaemonSet tolerations to be configured, to support those more advanced cases. Here's one example (though Typhoon doesn't support ARM64 if Calico is chosen).
tolerations:
- key: node-role.kubernetes.io/controller
operator: Exists
- key: node.kubernetes.io/not-ready
operator: Exists
%{~ for key in daemonset_tolerations ~}
- key: ${key}
operator: Exists
%{~ endfor ~}
From your investigation, it sounds like having this conformance test pass will require listing what those expected taints are, and provisioning the cluster so that Calico tolerates them. I suppose the reason Cilium and flannel don't hit this is because they're not relying on credentials in the same way.
Clusters with x86 and arm64 nodes: Calico does not ship a typical multi-arch container image (it ships an image per architecture, which is different and requires DaemonSets matching a subset of nodes)
I don't think that's true any more? @caseydavenport could probably confirm.
Calico does not ship a typical multi-arch container image (it ships an image per architecture, which is different and requires DaemonSets matching a subset of nodes)
Yep, this one at least is no longer the case (manifests are multi-arch now).
I suppose the reason Cilium and flannel don't hit this is because they're not relying on credentials in the same way.
Yeah, this would only be hit if the CNI plugin on the host needs to make API calls and is doing so using the serviceaccount token of the daemonset pod that installed it.
One option here might be to use the TokenRequest API directly to provision a separate token not bound to the life of the calico/node pod.
In general I agree that we can't expect "Tolerate all" to be acceptable for every single cluster in existence, but I do think it is the correct default for what we ship because it will be right for the vast majority of clusters.
Certain cases where you don't want a CNI provider on a set of nodes at all
I believe for use cases like this we have switched to using node affinities rather than taints/tolerations. For example, this node affinity prevents us from running on fargate nodes: https://github.com/tigera/operator/blob/master/pkg/render/node.go#L711
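For illustration, the affinity rendered by that operator code is roughly the following (a sketch based on the linked line; the EKS Fargate label is the example used there):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: eks.amazonaws.com/compute-type
              operator: NotIn
              values:
                - fargate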
Awesome to see the multi-arch manifest images. I'll check those out, that'll help remove one case.
I agree, for the vast majority of clusters your example seems great. I wouldn't advocate changing it either.
I may look at node affinities, but having those be conditional is a lot more tricky just with the Terraform logic available to us. And taints do seem to express the situation fairly clearly. I'm not sure affinities would have a different end result (e.g. Kubernetes E2E might have an equivalent test there that gets thrown off in the same way).
Thanks to your investigation @lwr20, this looks like a detail of how CNCF conformance tests run, you'd agree? That wouldn't affect real clusters as far as I can tell. We could just say in our docs: if you need to pass conformance, you need to tolerate kubernetes.io/e2e-evict-taint-key (https://github.com/kubernetes/kubernetes/blob/master/test/e2e/node/taints.go#L48).
module "typhoon-cluster" {
...
networking = "calico"
daemonset_tolerations = ["kubernetes.io/e2e-evict-taint-key"]
}
this looks like a detail of how CNCF conformance tests run, you'd agree?
I agree. I certainly don't think that particular conformance test is intended to mandate that "CNIs must work without talking to the kubernetes apiserver" for example.
One option here might be to use the TokenRequest API directly to provision a separate token not bound to the life of the calico/node pod.
Of course if we could do this, that would be ideal. But will need some prototyping and testing of course.
Adding the DaemonSet toleration for kubernetes.io/e2e-evict-taint-key gets this conformance test passing for me as well. I can update conformance testing nodes and go ahead and close this issue if that's alright.
One option here might be to use the TokenRequest API directly to provision a separate token not bound to the life of the calico/node pod.
It would be nice to not hit this if that's a reasonable thing to do. Presumably that would be a separate issue if it's desired.
@caseydavenport We have run into this issue in one of our clusters in a slightly different scenario. For bin packing reasons, we scale the resource requests of calico-node vertically in a cluster-proportional manner (pretty similar to https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/calico-policy-controller).

As the cluster grew in size, calico-node was supposed to be recreated with bigger memory requests, i.e. the existing pod was deleted and a new one with higher memory requests was created. However, as the node was nearly fully loaded, there was not enough space for the new pod. Thanks to the priority class of calico-node, pre-emption occurred and the kube-scheduler tried to get rid of a lower priority pod on the node.

However, now we ran into the problem that the lower priority pod could not be deleted, as network sandbox deletion via CNI fails with this error (error getting ClusterInformation: connection is unauthorized: Unauthorized) because the token in Calico's kubeconfig belongs to a deleted pod.

The node cannot automatically recover from this, as no pod can be completely removed due to the CNI error and calico-node cannot be scheduled because its memory requirements are not fulfilled.

Is there a plan to resolve this issue, for example by using the token API directly or otherwise decoupling the validity of the token used for CNI from the calico-node pod lifecycle?
As we would like to have this issue fixed properly, we would like to contribute a solution. It seems like the logic is scattered across two places:

Instead of using the token of the calico-node pod, we would propose creating a separate token via https://kubernetes.io/docs/reference/kubernetes-api/authentication-resources/token-request-v1/, either bound to no object or bound to the node object. The validity period can be rather small, e.g. 1h. The token would then be replaced with a simple timer-based approach.
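A minimal client-go sketch of that idea (namespace, service account name, and expiry are illustrative assumptions, not settled values):

package main

import (
	"context"
	"fmt"

	authenticationv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes this runs in-cluster, e.g. from calico/node.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	expiry := int64(3600) // e.g. 1h; refresh on a timer well before expiry
	req := &authenticationv1.TokenRequest{
		Spec: authenticationv1.TokenRequestSpec{
			ExpirationSeconds: &expiry,
			// No BoundObjectRef: the token is not tied to the calico-node pod,
			// so deleting that pod does not invalidate the CNI credentials.
		},
	}

	// Namespace and service account name are assumptions for illustration.
	tok, err := clientset.CoreV1().ServiceAccounts("kube-system").
		CreateToken(context.TODO(), "calico-node", req, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}

	fmt.Println("token expires at:", tok.Status.ExpirationTimestamp)
	// tok.Status.Token would then be written into the kubeconfig at
	// /etc/cni/net.d/calico-kubeconfig on the host.
}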
@caseydavenport Would you be open to such a contribution?
Hi, in our case the setting is slightly different: even new nodes with comparatively few or no pods have this issue. Even when I formatted a machine and added it back, we still get this CNI issue.
You can find more about this issue here
Instead of using the token of the calico-node pod, we would propose create a separate token via https://kubernetes.io/docs/reference/kubernetes-api/authentication-resources/token-request-v1/ either bound to no object or bound to the node object. The validity period can be rather small, e.g. 1h. The token would then be replaced with a simple timer based approach.
Yes, this is the approach that I was musing on as well. I think it is worth exploring this to see what it would look like and what limitations it might have (hopefully none)
Yes, this is the approach that I was musing on as well. I think it is worth exploring this to see what it would look like and what limitations it might have (hopefully none)
@caseydavenport Should I create a corresponding pull request or do you plan to explore it yourself?
@ScheererJ I'd be happy to review a PR for this if you have one. Otherwise I will take a look at it myself once v3.23 is out the door (so in a couple of weeks).
@caseydavenport Feel free to take a look at #5910 if you have some time to spare. I will be on vacation next week, though. Hence, there is no hurry from my side.
Expected Behavior
Calico CNI plugin tears down Pods in a timely manner.
Current Behavior
Calico CNI plugin shows errors terminating Pods, and therefore eviction takes too long. Especially relevant in Kubernetes conformance testing.
The natural things to check are RBAC permissions, which match recommendations:
To be certain, we can use the actual kubeconfig Calico writes to the host's /etc/cni/net.d. It does indeed seem to have permission to get clusterinformations. The error above is unusual.

Steps to Reproduce (for bugs)
Context
This issue affects Kubernetes Conformance tests:
The test in question creates two Pods that don't tolerate a taint, and expects them to be terminated within certain times. In Kubelet logs, the Calico CNI plugin is complaining with the logs above and termination takes too long.
Your Environment