Closed: sysnet4admin closed this issue 1 year ago.
@sysnet4admin could you please fill in the template with more information about your issue?
@coutinhop Oh..? I am so sorry, I didn't mean to upload it without any comment. (Did my cat push the button? Something like that? Anyhow, OMG.) I have now updated the issue with everything I know so far. The trigger and reproduction procedure are not clear yet, so I will clarify the reproduction steps as soon as possible.
Thank you for letting me know about the empty issue that I uploaded.
[ Pods in all namespaces ]
[root@m-k8s ~]# kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default new-nginx-d8b84d87b-jpzr9 1/1 Running 0 21h
default new-nginx-d8b84d87b-r245z 1/1 Running 0 21h
default new-nginx-d8b84d87b-xjc8k 1/1 Running 0 21h
default nfs-client-provisioner-7596fb9c9c-jvmnm 1/1 Running 1 (28h ago) 2d22h
default synthetic-load-generator-554f846686-fxgms 1/1 Running 0 3h45m
example-hotrod example-hotrod-6c5d878866-bbt7l 1/1 Running 0 4h47m
ingress-nginx ingress-nginx-admission-create-bqvnp 0/1 Completed 0 5h3m
ingress-nginx ingress-nginx-admission-patch-sdjbr 0/1 Completed 1 5h3m
ingress-nginx ingress-nginx-controller-64f79ddbcc-7wltw 1/1 Running 0 5h1m
kube-system calico-kube-controllers-57b57c56f-96j5s 1/1 Running 0 3d3h
kube-system calico-node-79rvm 1/1 Running 0 25h
kube-system calico-node-bc54v 1/1 Running 0 25h
kube-system calico-node-xx5c4 1/1 Running 0 25h
kube-system calico-node-zlk6h 1/1 Running 0 25h
kube-system coredns-787d4945fb-n5z6g 1/1 Running 0 3d3h
kube-system coredns-787d4945fb-q6zj8 1/1 Running 0 3d3h
kube-system etcd-m-k8s 1/1 Running 0 3d3h
kube-system kube-apiserver-m-k8s 1/1 Running 0 3d3h
kube-system kube-controller-manager-m-k8s 1/1 Running 0 3d3h
kube-system kube-proxy-6wrc9 1/1 Running 0 3d3h
kube-system kube-proxy-drtcr 1/1 Running 1 (27h ago) 3d1h
kube-system kube-proxy-hmp89 1/1 Running 0 3d3h
kube-system kube-proxy-hnxrh 1/1 Running 0 3d3h
kube-system kube-scheduler-m-k8s 1/1 Running 0 3d3h
kube-system metrics-server-7948965fbb-56tct 1/1 Running 0 27h
metallb-system controller-577b5bdfcc-tj6nq 1/1 Running 0 27h
metallb-system speaker-8szsl 1/1 Running 0 3d3h
metallb-system speaker-j4hsp 1/1 Running 0 3d3h
metallb-system speaker-pm9jj 1/1 Running 0 3d3h
metallb-system speaker-rg9wk 1/1 Running 2 (27h ago) 3d1h
monitoring grafana-5d9c96fc4c-x4sm8 0/1 Terminating 0 3d1h
monitoring jaeger-5dc997d86c-trhnb 1/1 Running 0 4h25m
monitoring prometheus-kube-state-metrics-5f69cf9d49-tr24p 0/1 Terminating 0 3d1h
monitoring tempo-0 2/2 Running 0 3h45m
[ Describe output for the terminating pod ]
[root@m-k8s ~]# k describe po -n monitoring prometheus-kube-state-metrics-5f69cf9d49-tr24p
Name: prometheus-kube-state-metrics-5f69cf9d49-tr24p
Namespace: monitoring
Priority: 0
Service Account: prometheus-kube-state-metrics
Node: w2-k8s/192.168.1.102
Start Time: Sat, 21 Jan 2023 13:35:29 +0900
Labels: app.kubernetes.io/component=metrics
app.kubernetes.io/instance=prometheus
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kube-state-metrics
app.kubernetes.io/part-of=kube-state-metrics
app.kubernetes.io/version=2.4.1
helm.sh/chart=kube-state-metrics-4.7.0
pod-template-hash=5f69cf9d49
Annotations: cni.projectcalico.org/containerID: 00a3ada685842372182116b86e1dd5de8ceb0eca50ba212dd8e4c046d95fa193
cni.projectcalico.org/podIP: 172.16.103.129/32
cni.projectcalico.org/podIPs: 172.16.103.129/32
Status: Terminating (lasts 10m)
Termination Grace Period: 30s
IP: 172.16.103.129
IPs:
IP: 172.16.103.129
Controlled By: ReplicaSet/prometheus-kube-state-metrics-5f69cf9d49
Containers:
kube-state-metrics:
Container ID: containerd://73384c61d39eadaa7a20794631b1b7c31ff268b46cb12e360917939000a781c4
Image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.4.1
Image ID: k8s.gcr.io/kube-state-metrics/kube-state-metrics@sha256:69a18fa1e0d0c9f972a64e69ca13b65451b8c5e79ae8dccf3a77968be4a301df
Port: 8080/TCP
Host Port: 0/TCP
Args:
--port=8080
--resources=certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
--telemetry-port=8081
State: Terminated
Reason: Error
Exit Code: 2
Started: Sat, 21 Jan 2023 13:35:39 +0900
Finished: Tue, 24 Jan 2023 14:42:56 +0900
Ready: False
Restart Count: 0
Liveness: http-get http://:8080/healthz delay=5s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:8080/ delay=5s timeout=5s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5d4nv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-5d4nv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 11m kubelet Stopping container kube-state-metrics
Warning FailedKillPod 85s (x50 over 11m) kubelet error killing pod: failed to "KillPodSandbox" for "8f47d16a-70e9-49cc-be1c-4bf5911cdc57" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"00a3ada685842372182116b86e1dd5de8ceb0eca50ba212dd8e4c046d95fa193\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized"
Attachment: calico-7220-cluster-info.dump.zip
Also, the workaround is effective:
[root@m-k8s ~]# kubectl rollout restart ds -n kube-system calico-node
daemonset.apps/calico-node restarted
[root@m-k8s ~]# k get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default new-nginx-d8b84d87b-jpzr9 1/1 Running 0 21h
default new-nginx-d8b84d87b-r245z 1/1 Running 0 21h
default new-nginx-d8b84d87b-xjc8k 1/1 Running 0 21h
default nfs-client-provisioner-7596fb9c9c-jvmnm 1/1 Running 1 (29h ago) 2d22h
default synthetic-load-generator-554f846686-fxgms 1/1 Running 0 4h
example-hotrod example-hotrod-6c5d878866-bbt7l 1/1 Running 0 5h2m
ingress-nginx ingress-nginx-admission-create-bqvnp 0/1 Completed 0 5h18m
ingress-nginx ingress-nginx-admission-patch-sdjbr 0/1 Completed 1 5h18m
ingress-nginx ingress-nginx-controller-64f79ddbcc-7wltw 1/1 Running 0 5h16m
kube-system calico-kube-controllers-57b57c56f-96j5s 1/1 Running 0 3d3h
kube-system calico-node-fpmtb 1/1 Running 0 77s
kube-system calico-node-gmksz 1/1 Running 0 66s
kube-system calico-node-hzk7k 1/1 Running 0 45s
kube-system calico-node-zqd24 1/1 Running 0 56s
kube-system coredns-787d4945fb-n5z6g 1/1 Running 0 3d3h
kube-system coredns-787d4945fb-q6zj8 1/1 Running 0 3d3h
kube-system etcd-m-k8s 1/1 Running 0 3d3h
kube-system kube-apiserver-m-k8s 1/1 Running 0 3d3h
kube-system kube-controller-manager-m-k8s 1/1 Running 0 3d3h
kube-system kube-proxy-6wrc9 1/1 Running 0 3d3h
kube-system kube-proxy-drtcr 1/1 Running 1 (27h ago) 3d2h
kube-system kube-proxy-hmp89 1/1 Running 0 3d3h
kube-system kube-proxy-hnxrh 1/1 Running 0 3d3h
kube-system kube-scheduler-m-k8s 1/1 Running 0 3d3h
kube-system metrics-server-7948965fbb-56tct 1/1 Running 0 28h
metallb-system controller-577b5bdfcc-tj6nq 1/1 Running 0 28h
metallb-system speaker-8szsl 1/1 Running 0 3d3h
metallb-system speaker-j4hsp 1/1 Running 0 3d3h
metallb-system speaker-pm9jj 1/1 Running 0 3d3h
metallb-system speaker-rg9wk 1/1 Running 2 (27h ago) 3d2h
monitoring jaeger-5dc997d86c-trhnb 1/1 Running 0 4h40m
monitoring tempo-0 2/2 Running 0 4h
Same behaviour:
8m32s Warning FailedCreatePodSandBox pod/hello-27927411-gk5nf (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c9cf89858e821ef4eb9502deb09725cf8e88be7675d9861fa1a2d25cc03a596f": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Cluster info:
k get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8sc1 Ready control-plane,etcd,master 88d v1.24.6+rke2r1 192.168.88.87 <none> Ubuntu 22.04.1 LTS 5.15.0-58-generic containerd://1.6.8-k3s1
k8sc2 Ready <none> 88d v1.24.6+rke2r1 192.168.88.88 <none> Ubuntu 22.04.1 LTS 5.15.0-58-generic containerd://1.6.8-k3s1
k8sc3 Ready <none> 88d v1.24.6+rke2r1 192.168.88.89 <none> Ubuntu 22.04.1 LTS 5.15.0-58-generic containerd://1.6.8-k3s1
k8sc4 Ready <none> 82d v1.24.6+rke2r1 192.168.88.90 <none> Ubuntu 22.04.1 LTS 5.15.0-58-generic containerd://1.6.8-k3s1
k8sc5 Ready <none> 71d v1.24.6+rke2r1 192.168.88.91 <none> Ubuntu 22.04.1 LTS 5.15.0-58-generic containerd://1.6.8-k3s1
k8sc6 Ready <none> 88d v1.24.6+rke2r1 192.168.88.92 <none> Ubuntu 22.04.1 LTS 5.15.0-58-generic containerd://1.6.8-k3s1
k8sc7 Ready <none> 88d v1.24.6+rke2r1 192.168.88.93 <none> Ubuntu 22.04.1 LTS 5.15.0-58-generic containerd://1.6.8-k3s1
FYI
k8s v1.26.1 + calico_v3.24.5 = reproduced
k8s v1.26.1 + calico_v3.25.0 = reproduced
k8s v1.25.6 + calico_v3.24.5 = reproduced
k8s v1.25.6 + calico_v3.25.0 = reproduced
k8s v1.26.1 + calico_v3.17.1 = NOT reproduced (i.e. the issue does not occur with this version)
service account token has been invalidated
Could there be something else in your cluster invalidating the tokens somehow?
Facing the same issue.
We refrain from using the workaround, so are there any updates on how to get rid of this issue? How can we tackle the service account policy changes in Kubernetes v1.26 mentioned in this issue's description?
I'm using - k8s v1.26.1 + calico_v3.25.0 + containerd 1.6.18
I have the same problem. Even if you think it's fine on the master, the problem may still occur on another node or worker.
@caseydavenport My lab is not running anymore due to limited resources, so I will set it up again and recheck within 1-2 weeks, and let you know if there are any invalidated tokens or evidence of it.
I am seeing the same issue, after storage was extended on the device. I am on Kubespray, k8s v1.24.6, calico v3.25.0, containerd v1.7.0. I re-executed the Ansible playbook; it did not help, even after restarting NetworkManager, containerd, and kubelet.
Also facing this issue with Canal. It is causing a lot of headaches in my production cluster. Any ideas on how to fix this?
EDIT: As suggested before, kubectl rollout restart ds -n kube-system canal
seemed to have fixed this for me. However, when I rebooted I had RBAC issues; the calico cluster role didn't have:
- apiGroups: [""]
  resources:
  - serviceaccounts/token
  verbs:
  - create
I will see after a few days whether the issue persists.
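If that rule is missing, it can be appended with a JSON patch along these lines (a sketch, not an official fix; the cluster role name is assumed to be calico-node and may differ in RKE/Canal installs):

kubectl patch clusterrole calico-node --type=json \
  -p='[{"op":"add","path":"/rules/-","value":{"apiGroups":[""],"resources":["serviceaccounts/token"],"verbs":["create"]}}]'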
seemed to have fixed this for me, however when I rebooted I had RBAC issues; the calico cluster role didn't have:
Was this missing from the manifest in our docs? Or just the manifest in your cluster? Make sure when upgrading that you are pulling the latest manifest from our release.
Sorry for not being more clear, I use Rancher RKE to bootstrap my cluster and it seems they didn't have the latest manifests.
Facing the same issue using k8s v1.27.0 & calico v3.25.1.
Installed Calico using the Calico manifest. kubectl rollout restart ds -n calico-system calico-node temporarily fixes the issue. I also verified that the calico-node cluster role has create perms for serviceaccounts/token.
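For what it's worth, that permission can be checked end-to-end by impersonating the service account (a sketch; the calico-system namespace is assumed from the manifest install above, and note that can-i in this form does not evaluate resourceNames restrictions):

kubectl auth can-i create serviceaccounts --subresource=token \
  --as=system:serviceaccount:calico-system:calico-node -n calico-system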
I found one ambiguity that can make this confusing. TokenWatcher ignores the KUBECONFIG variable and assumes that the location of the kubeconfig is /host/etc/cni/net.d/calico-kubeconfig (hardcoded).
I rebuilt calico 3.25.1 with a very low token TTL to understand what the flow is, and I think the way this works at the moment when KUBECONFIG is set is quite confusing.
The way calico-kubeconfig is handled in this case is the following:
- install-cni starts up and builds a clientset with the in-cluster service account token to lay down a calico-kubeconfig (with a token with a 24h TTL); if a calico-kubeconfig already exists, it tries to use that, but if the JWT is expired, it uses the in-cluster config and replaces it (which means it doesn't fail); if the existing token is valid for a long time, I think it won't rotate it (for example if you do a rollout restart in quick succession)
- calico-node starts after install-cni finishes; if the KUBECONFIG env variable is set, it will use that for Kubernetes access
- after less than 24h, calico-node attempts to rotate the token using the KUBECONFIG clientset; it succeeds, but the problem is that the clientset in use by Calico in token_watch will never reload it after it was rotated by itself; an easy fix would be to rebuild the clientset in the token watch loop here, or change the clientset build here to ignore the kubeconfig variable
- calico-node eventually fails with unauthorized errors: first in token renewal (Calico will still work), and after calico-kubeconfig expires, everything stops working; in my opinion this should be a hard failure, since calico-node is broken at this point and restarting is the only "fix"
Also noticed a "hard" failure in install-cni if KUBECONFIG is set and the file doesn't exist (such as when creating a new cluster): install-cni will rotate it if it's an empty file or if the JWT is expired, but it will fail hard if the file doesn't exist. I think it should just create the file.
I think there are some strange interactions between manually setting this variable and how #7106 would work in a future release.
Can someone with more context comment on what's the expected flow here?
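For anyone who wants to check this on their own nodes, the expiry of the token currently baked into the node-local kubeconfig can be inspected like this (a sketch; run on the node itself, where the host-side path is /etc/cni/net.d/calico-kubeconfig, and jq is assumed to be installed):

# extract the JWT from the CNI kubeconfig and decode its payload
TOKEN=$(awk '/token:/ {print $2}' /etc/cni/net.d/calico-kubeconfig | tr -d '"')
PAYLOAD=$(echo "$TOKEN" | cut -d. -f2)
# restore base64url padding so base64 -d accepts the payload
while [ $(( ${#PAYLOAD} % 4 )) -ne 0 ]; do PAYLOAD="${PAYLOAD}="; done
echo "$PAYLOAD" | tr '_-' '/+' | base64 -d | jq '{iat: (.iat | todate), exp: (.exp | todate)}'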
Facing the same issue using k8s v1.27.0 & calico v3.25.1. Installed Calico using the Calico manifest. kubectl rollout restart ds -n calico-system calico-node temporarily fixes the issue. I also verified that the calico-node cluster role has create perms for serviceaccounts/token.
This workaround didn't help me, but deleting the files from the folder /etc/cni/net.d/* worked for me.
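For reference, that recovery amounts to something like the following (a sketch; deleting the CNI config is disruptive to the node, and it assumes install-cni regenerates the files when the calico-node pods restart):

# on each affected node
rm /etc/cni/net.d/*
# then restart calico-node so install-cni lays the config down again
kubectl rollout restart ds -n calico-system calico-node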
FYI k8s v1.27.2 + calico_v3.26.0 = Looking good after AGE 42H
[root@cp-k8s ~]# k get node
NAME STATUS ROLES AGE VERSION
cp-k8s Ready control-plane 42h v1.27.2
w1-k8s Ready <none> 42h v1.27.2
w2-k8s Ready <none> 42h v1.27.2
w3-k8s Ready <none> 42h v1.27.2
[root@cp-k8s ~]# k get po,svc
NAME READY STATUS RESTARTS AGE
pod/deploy-nginx-66df7dc8d9-8r545 1/1 Running 0 42h
pod/deploy-nginx-66df7dc8d9-bc9f6 1/1 Running 0 42h
pod/deploy-nginx-66df7dc8d9-cqfj6 1/1 Running 0 42h
pod/deploy-nginx-66df7dc8d9-fkf99 1/1 Running 0 42h
pod/deploy-nginx-66df7dc8d9-mrrl6 1/1 Running 0 42h
pod/deploy-nginx-66df7dc8d9-q6zgn 1/1 Running 0 42h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/deploy-nginx LoadBalancer 10.101.73.62 192.168.1.11 80:31560/TCP 42h
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 42h
Last check
k8s v1.27.2 + calico_v3.26.0 = Looking good after AGE 5D
[root@cp-k8s ~]# k get po,svc
NAME READY STATUS RESTARTS AGE
pod/deploy-nginx-66df7dc8d9-6cd7l 1/1 Running 0 5d17h
pod/deploy-nginx-66df7dc8d9-77fnk 1/1 Running 0 5d17h
pod/deploy-nginx-66df7dc8d9-95mck 1/1 Running 0 5d17h
pod/deploy-nginx-66df7dc8d9-9fkzn 1/1 Running 0 5d17h
pod/deploy-nginx-66df7dc8d9-hnbh2 1/1 Running 0 5d17h
pod/deploy-nginx-66df7dc8d9-kh66b 1/1 Running 0 5d17h
pod/deploy-nginx-66df7dc8d9-q989q 1/1 Running 0 5d17h
pod/deploy-nginx-66df7dc8d9-qtvkq 1/1 Running 0 5d17h
pod/deploy-nginx-66df7dc8d9-xnvd8 1/1 Running 0 5d17h
pod/nfs-client-provisioner-597dbc5f74-7hw67 1/1 Running 0 5d18h
[root@cp-k8s ~]# k get ds -n kube-system -o yaml | grep -i image:
image: docker.io/calico/node:v3.26.0
image: docker.io/calico/cni:v3.26.0
image: docker.io/calico/cni:v3.26.0
image: docker.io/calico/node:v3.26.0
image: registry.k8s.io/kube-proxy:v1.27.2
For all who are still struggling with this issue: take a look at the logs of your calico-node pod. I had the same problem and found out that the ServiceAccount "calico-node" was not permitted to create a "serviceaccounts/token" resource because it was restricted to the resource name "calico-cni-plugin". I removed the restriction to "calico-cni-plugin" and it works now.
I removed the restriction to "calico-cni-plugin" and it works now.
Would you care to explain this and the steps, please?
"calico-node" was not permitted to create a "serviceaccounts/token" resource because it was restricted to the resource name "calico-cni-plugin". I removed the restriction to "calico-cni-plugin" and it works now.
As of Calico v3.26, the calico-node service account should not have permission to create any service account tokens except for the calico-cni-plugin token. This is done intentionally, so I'm curious if you could share the logs that clued you in to this change.
As of Calico v3.26, the calico-node service account should not have permission to create any service account tokens except for the calico-cni-plugin token. This is done intentionally, so I'm curious if you could share the logs that clued you in to this change.
A little bit of background information: at my company we are using IBM Cloud and their Kubernetes cluster, which we updated 13 days ago. Yesterday we noticed that CRUD operations on any pod are failing. Regarding cluster roles, only one - in fact calico-cni-plugin - was added. I don't know if IBM Cloud created this cluster role automatically or one of my admins did; our admin has not been able to answer this question yet.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-cni-plugin
  uid: bf8ebe3c-f491-4e63-bdb9-1801826917e5
  resourceVersion: '109270438'
  creationTimestamp: '2023-06-28T22:52:52Z'
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"name":"calico-cni-plugin"},"rules":[{"apiGroups":[""],"resources":["pods","nodes","namespaces"],"verbs":["get"]},{"apiGroups":[""],"resources":["pods/status"],"verbs":["patch"]},{"apiGroups":["crd.projectcalico.org"],"resources":["blockaffinities","ipamblocks","ipamhandles","clusterinformations","ippools","ipreservations","ipamconfigs"],"verbs":["get","list","create","update","delete"]}]}
  managedFields:
    - manager: kubectl-client-side-apply
      operation: Update
      apiVersion: rbac.authorization.k8s.io/v1
      time: '2023-06-28T22:52:52Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
    - manager: dashboard
      operation: Update
      apiVersion: rbac.authorization.k8s.io/v1
      time: '2023-07-12T17:12:23Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:rules: {}
rules:
  - verbs:
      - get
    apiGroups:
      - ''
    resources:
      - pods
      - nodes
      - namespaces
  - verbs:
      - patch
    apiGroups:
      - ''
    resources:
      - pods/status
  - verbs:
      - get
      - list
      - create
      - update
      - delete
    apiGroups:
      - crd.projectcalico.org
    resources:
      - blockaffinities
      - ipamblocks
      - ipamhandles
      - clusterinformations
      - ippools
      - ipreservations
      - ipamconfigs
Version:
Client Version: v3.26.1
Git commit: b1d192c95
Cluster Version: v3.25.1
Cluster Type: typha,kdd,k8s,bgp
Is the role calico-cni-plugin supposed to be allowed to create serviceaccount tokens?
What specific log do you want to take a look at?
Best regards and thank you for your help!
EDIT: unfortunately the logs of calico-node have been overwritten. But I can remember that it showed something like "service account 'calico-node:kube-system' has no permission to obtain a token".
EDIT2: just for a test I added the resourceName 'calico-cni-plugin' again to the service account token creation rule for the 'calico-node' cluster role, and it seems not to work. The calico-node pod's log:
2023-07-12T17:47:10.338Z | 2023-07-12 17:47:10.338 [ERROR][56] cni-config-monitor/token_watch.go 106: Unable to create token for CNI kubeconfig error=serviceaccounts "calico-node" is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot create resource "serviceaccounts/token" in API group "" in the namespace "kube-system"
2023-07-12T17:47:10.338Z | 2023-07-12 17:47:10.338 [ERROR][56] cni-config-monitor/token_watch.go 130: Failed to update CNI token, retrying... error=serviceaccounts "calico-node" is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot create resource "serviceaccounts/token" in API group "" in the namespace "kube-system"
2023-07-12T17:47:11.456Z | 2023-07-12 17:47:11.456 [INFO][30387] felix/summary.go 100: Summarising 26 dataplane reconciliation loops over 1m9s: avg=16ms longest=107ms ()
2023-07-12T17:47:16.986Z | 2023-07-12 17:47:16.986 [ERROR][56] cni-config-monitor/token_watch.go 106: Unable to create token for CNI kubeconfig error=serviceaccounts "calico-node" is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot create resource "serviceaccounts/token" in API group "" in the namespace "kube-system"
2023-07-12T17:47:16.986Z | 2023-07-12 17:47:16.986 [ERROR][56] cni-config-monitor/token_watch.go 130: Failed to update CNI token, retrying... error=serviceaccounts "calico-node" is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot create resource "serviceaccounts/token" in API group "" in the namespace "kube-system"
2023-07-12T17:47:22.143Z | 2023-07-12 17:47:22.142 [ERROR][56] cni-config-monitor/token_watch.go 106: Unable to create token for CNI kubeconfig error=serviceaccounts "calico-node" is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot create resource "serviceaccounts/token" in API group "" in the namespace "kube-system"
2023-07-12T17:47:22.143Z | 2023-07-12 17:47:22.142 [ERROR][56] cni-config-monitor/token_watch.go 130: Failed to update CNI token, retrying... error=serviceaccounts "calico-node" is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot create resource "serviceaccounts/token" in API group "" in the namespace "kube-system"
Is the role calico-cni-plugin supposed to be allowed to create serviceaccount tokens?
Nope, the calico-cni-plugin service account should not be able to make tokens. However, calico-node should be allowed to create tokens for calico-cni-plugin.
Cluster Version: v3.25.1
This is interesting - it sounds like you're running with the code from Calico v3.25, but the RBAC from Calico v3.26, which would result in the problems you're seeing. The v3.25 code expects to have this RBAC:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-node
rules:
  # Used for creating service account tokens to be used by the CNI plugin
  - apiGroups: [""]
    resources:
      - serviceaccounts/token
    resourceNames:
      - calico-node
    verbs:
      - create
Whereas v3.26 expects this:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-node
rules:
  # Used for creating service account tokens to be used by the CNI plugin
  - apiGroups: [""]
    resources:
      - serviceaccounts/token
    resourceNames:
      - calico-cni-plugin
    verbs:
      - create
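A quick way to see which of the two variants a cluster actually has (a sketch; assumes a manifest-based install where the role is named calico-node):

kubectl get clusterrole calico-node -o yaml | grep -B2 -A6 'serviceaccounts/token'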
We were able to fix this problem now. Our master nodes had an incorrect version, which ruined everything. An update of our master nodes was fortunately the solution, without any hacky workarounds. But thanks for the help - I appreciate it!
After some period, pods cannot be created or deleted, failing with this message.
It seems to be related to the service account policy change in Kubernetes v1.26.0:
https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#manual-secret-management-for-serviceaccounts
Here is the workaround: re-read the calico-node information by restarting or deleting it.
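Concretely, the workaround looks like this (a sketch; assumes the manifest-based install in kube-system, where the daemonset pods carry the k8s-app=calico-node label):

kubectl rollout restart ds -n kube-system calico-node
# or delete the pods and let the daemonset recreate them
kubectl delete po -n kube-system -l k8s-app=calico-node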
Expected Behavior
kubectl create or delete is working fine.
Current Behavior
It won't work properly.
Possible Solution
Workaround: restart the daemonset or delete the pod.
OR
Possible solution: create a long-lived secret token for the service account instead, and use this secret with the service account for calico-node (related to #5712 and #6421).
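A minimal sketch of that long-lived token idea, following the Kubernetes docs linked above (the secret name is illustrative, and this uses the legacy service-account-token mechanism rather than anything Calico ships):

kubectl apply -n kube-system -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: calico-node-long-lived-token  # illustrative name
  annotations:
    kubernetes.io/service-account.name: calico-node
type: kubernetes.io/service-account-token
EOF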
Steps to Reproduce (for bugs)
Context
The code from #6218 (node/pkg/cni/token_watch.go) is already applied.
So I decoded the JWT applied on the calico-node, and it confirmed the 1 year (365d) lifetime properly.
JWT
Decoded JWT's Payload
Thus this issue involves slightly different logic for verifying authorization from Kubernetes.
/var/log/message from all nodes, like below, when it happened.
[control-plane node]
[worker node]
Your Environment