kinvolk / lokomotive

🪦 DISCONTINUED: Further Lokomotive development has been discontinued. Lokomotive is a 100% open-source, easy-to-use and secure Kubernetes distribution from the volks at Kinvolk.
https://kinvolk.io/lokomotive-kubernetes/
Apache License 2.0

Controller manager does not have permission to execute `uds` FlexVolume plugin #813

Open · surajssd opened this issue 4 years ago

surajssd commented 4 years ago

I had an HA cluster on Packet with no components installed. It was installed with lokoctl at version 35814104. I then updated lokoctl to version 524a81df, upgraded the cluster, and now see an endless stream of the following logs from the controller-manager:

```console
E0819 07:47:53.341836       1 driver-call.go:266] Failed to unmarshal output for command: init, output: "", error: unexpected end of JSON input
W0819 07:47:53.341879       1 driver-call.go:149] FlexVolume: driver call failed: executable: /var/lib/kubelet/volumeplugins/nodeagent~uds/uds, args: [init], error: fork/exec /var/lib/kubelet/volumeplugins/nodeagent~uds/uds: no such file or directory, output: ""
E0819 07:47:53.341917       1 plugins.go:731] Error dynamically probing plugins: Error creating Flexvolume plugin from directory nodeagent~uds, skipping. Error: unexpected end of JSON input
```

The mitigation is to make the binary /var/lib/kubelet/volumeplugins/nodeagent~uds/uds executable for everyone. Right now it is executable only by the root user and root group, while the controller-manager runs as user nobody, AKA UID 65534.

Alternatively, elevate the permissions of the controller-manager pod.
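To verify the mismatch first, one can inspect the binary's mode and invoke the FlexVolume `init` call (the call the controller-manager makes, per the logs below) as UID 65534. Running it manually like this is just a suggested sanity check, not a documented procedure:

```console
# On a controller node; path taken from the controller-manager logs.
$ ls -l /var/lib/kubelet/volumeplugins/nodeagent~uds/uds
# Probe the driver the same way the plugin manager does; as UID 65534 this
# should fail instead of printing the expected JSON status.
$ sudo -u '#65534' /var/lib/kubelet/volumeplugins/nodeagent~uds/uds init
```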

Full logs

```console
$ kubectl logs kube-controller-manager-7d4796b4c8-r4mkp
I0819 07:46:35.790967 1 serving.go:313] Generated self-signed cert in-memory
W0819 07:46:35.791102 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0819 07:46:37.341475 1 controllermanager.go:161] Version: v1.18.6
I0819 07:46:37.343185 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0819 07:46:37.343245 1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0819 07:46:37.343351 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0819 07:46:37.343399 1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0819 07:46:37.344149 1 secure_serving.go:178] Serving securely on [::]:10257
I0819 07:46:37.344192 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0819 07:46:37.345643 1 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
I0819 07:46:37.345743 1 leaderelection.go:242] attempting to acquire leader lease kube-system/kube-controller-manager...
I0819 07:46:37.443557 1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0819 07:46:37.443582 1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0819 07:47:05.814968 1 leaderelection.go:252] successfully acquired lease kube-system/kube-controller-manager
I0819 07:47:05.815089 1 event.go:278] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"668755c7-1d6c-4583-b554-9eaabebb066c", APIVersion:"v1", ResourceVersion:"6670", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' kube-controller-manager-7d4796b4c8-r4mkp_def672fa-24f2-4e24-846b-58c60f20838f became leader
I0819 07:47:05.815170 1 event.go:278] Event(v1.ObjectReference{Kind:"Lease", Namespace:"kube-system", Name:"kube-controller-manager", UID:"183a9579-58c8-47c1-bd1d-b0ae6492c88e", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"6671", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' kube-controller-manager-7d4796b4c8-r4mkp_def672fa-24f2-4e24-846b-58c60f20838f became leader
I0819 07:47:06.132828 1 plugins.go:100] No cloud provider specified.
I0819 07:47:06.136489 1 shared_informer.go:223] Waiting for caches to sync for tokens
I0819 07:47:06.159987 1 controllermanager.go:533] Started "statefulset"
I0819 07:47:06.160057 1 stateful_set.go:146] Starting stateful set controller
I0819 07:47:06.160090 1 shared_informer.go:223] Waiting for caches to sync for stateful set
I0819 07:47:06.175246 1 controllermanager.go:533] Started "csrcleaner"
I0819 07:47:06.175311 1 cleaner.go:82] Starting CSR cleaner controller
I0819 07:47:06.190402 1 controllermanager.go:533] Started "endpoint"
I0819 07:47:06.190445 1 endpoints_controller.go:182] Starting endpoint controller
I0819 07:47:06.190477 1 shared_informer.go:223] Waiting for caches to sync for endpoint
I0819 07:47:06.236752 1 shared_informer.go:230] Caches are synced for tokens
I0819 07:47:06.507800 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for daemonsets.apps
I0819 07:47:06.507871 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for deployments.apps
I0819 07:47:06.507941 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for rolebindings.rbac.authorization.k8s.io
I0819 07:47:06.507983 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for statefulsets.apps
I0819 07:47:06.508036 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for endpointslices.discovery.k8s.io
I0819 07:47:06.508113 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for podtemplates
I0819 07:47:06.508199 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for cronjobs.batch
I0819 07:47:06.508251 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for ingresses.networking.k8s.io
I0819 07:47:06.508300 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for leases.coordination.k8s.io
I0819 07:47:06.508340 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for endpoints
I0819 07:47:06.508380 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for controllerrevisions.apps
I0819 07:47:06.508431 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for horizontalpodautoscalers.autoscaling
I0819 07:47:06.508504 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for ingresses.extensions
I0819 07:47:06.508555 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for roles.rbac.authorization.k8s.io
I0819 07:47:06.508631 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for networkpolicies.crd.projectcalico.org
I0819 07:47:06.508682 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for networksets.crd.projectcalico.org
I0819 07:47:06.508728 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for networkpolicies.networking.k8s.io
I0819 07:47:06.508781 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for poddisruptionbudgets.policy
I0819 07:47:06.508846 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for replicasets.apps
I0819 07:47:06.508970 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for serviceaccounts
I0819 07:47:06.509098 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for limitranges
I0819 07:47:06.509153 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for events.events.k8s.io
I0819 07:47:06.509200 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for jobs.batch
I0819 07:47:06.509240 1 controllermanager.go:533] Started "resourcequota"
W0819 07:47:06.509264 1 controllermanager.go:512] "tokencleaner" is disabled
I0819 07:47:06.509327 1 resource_quota_controller.go:272] Starting resource quota controller
I0819 07:47:06.509413 1 shared_informer.go:223] Waiting for caches to sync for resource quota
I0819 07:47:06.509481 1 resource_quota_monitor.go:303] QuotaMonitor running
I0819 07:47:06.525651 1 node_lifecycle_controller.go:384] Sending events to api server.
I0819 07:47:06.526953 1 taint_manager.go:163] Sending events to api server.
I0819 07:47:06.527331 1 node_lifecycle_controller.go:512] Controller will reconcile labels.
I0819 07:47:06.527765 1 controllermanager.go:533] Started "nodelifecycle"
I0819 07:47:06.528310 1 node_lifecycle_controller.go:546] Starting node controller
I0819 07:47:06.528344 1 shared_informer.go:223] Waiting for caches to sync for taint
I0819 07:47:06.545364 1 controllermanager.go:533] Started "attachdetach"
I0819 07:47:06.545482 1 attach_detach_controller.go:348] Starting attach detach controller
I0819 07:47:06.545521 1 shared_informer.go:223] Waiting for caches to sync for attach detach
I0819 07:47:06.567356 1 controllermanager.go:533] Started "endpointslice"
I0819 07:47:06.567545 1 endpointslice_controller.go:213] Starting endpoint slice controller
I0819 07:47:06.567580 1 shared_informer.go:223] Waiting for caches to sync for endpoint_slice
I0819 07:47:06.599433 1 controllermanager.go:533] Started "namespace"
I0819 07:47:06.599498 1 namespace_controller.go:200] Starting namespace controller
I0819 07:47:06.599533 1 shared_informer.go:223] Waiting for caches to sync for namespace
I0819 07:47:06.615822 1 controllermanager.go:533] Started "pv-protection"
I0819 07:47:06.615903 1 pv_protection_controller.go:83] Starting PV protection controller
I0819 07:47:06.615921 1 shared_informer.go:223] Waiting for caches to sync for PV protection
I0819 07:47:06.632577 1 controllermanager.go:533] Started "podgc"
I0819 07:47:06.633105 1 gc_controller.go:89] Starting GC controller
I0819 07:47:06.633136 1 shared_informer.go:223] Waiting for caches to sync for GC
I0819 07:47:06.648550 1 controllermanager.go:533] Started "serviceaccount"
I0819 07:47:06.648625 1 serviceaccounts_controller.go:117] Starting service account controller
I0819 07:47:06.648661 1 shared_informer.go:223] Waiting for caches to sync for service account
I0819 07:47:06.672115 1 controllermanager.go:533] Started "daemonset"
I0819 07:47:06.672250 1 daemon_controller.go:285] Starting daemon sets controller
I0819 07:47:06.672284 1 shared_informer.go:223] Waiting for caches to sync for daemon sets
I0819 07:47:06.689005 1 controllermanager.go:533] Started "deployment"
W0819 07:47:06.689059 1 controllermanager.go:512] "bootstrapsigner" is disabled
W0819 07:47:06.689083 1 controllermanager.go:525] Skipping "ttl-after-finished"
I0819 07:47:06.689098 1 deployment_controller.go:153] Starting deployment controller
W0819 07:47:06.689105 1 controllermanager.go:525] Skipping "root-ca-cert-publisher"
I0819 07:47:06.689117 1 shared_informer.go:223] Waiting for caches to sync for deployment
I0819 07:47:06.742814 1 controllermanager.go:533] Started "replicationcontroller"
I0819 07:47:06.742963 1 replica_set.go:181] Starting replicationcontroller controller
I0819 07:47:06.742987 1 shared_informer.go:223] Waiting for caches to sync for ReplicationController
I0819 07:47:07.648157 1 garbagecollector.go:133] Starting garbage collector controller
I0819 07:47:07.648194 1 shared_informer.go:223] Waiting for caches to sync for garbage collector
I0819 07:47:07.648238 1 graph_builder.go:282] GraphBuilder running
I0819 07:47:07.648275 1 controllermanager.go:533] Started "garbagecollector"
I0819 07:47:07.655267 1 request.go:621] Throttling request took 1.047466582s, request: GET:https://10.3.0.1:443/apis/rbac.authorization.k8s.io/v1?timeout=32s
I0819 07:47:07.706637 1 controllermanager.go:533] Started "horizontalpodautoscaling"
I0819 07:47:07.706778 1 horizontal.go:169] Starting HPA controller
I0819 07:47:07.706808 1 shared_informer.go:223] Waiting for caches to sync for HPA
I0819 07:47:07.793421 1 controllermanager.go:533] Started "csrapproving"
I0819 07:47:07.793514 1 certificate_controller.go:119] Starting certificate controller "csrapproving"
I0819 07:47:07.793535 1 shared_informer.go:223] Waiting for caches to sync for certificate-csrapproving
I0819 07:47:07.942864 1 controllermanager.go:533] Started "ttl"
I0819 07:47:07.942912 1 ttl_controller.go:118] Starting TTL controller
I0819 07:47:07.942943 1 shared_informer.go:223] Waiting for caches to sync for TTL
I0819 07:47:08.092406 1 node_ipam_controller.go:94] Sending events to api server.
I0819 07:47:18.099893 1 range_allocator.go:82] Sending events to api server.
I0819 07:47:18.100145 1 range_allocator.go:116] No Secondary Service CIDR provided. Skipping filtering out secondary service addresses.
I0819 07:47:18.100224 1 controllermanager.go:533] Started "nodeipam"
I0819 07:47:18.100373 1 node_ipam_controller.go:162] Starting ipam controller
I0819 07:47:18.100412 1 shared_informer.go:223] Waiting for caches to sync for node
I0819 07:47:18.118328 1 controllermanager.go:533] Started "clusterrole-aggregation"
I0819 07:47:18.118674 1 clusterroleaggregation_controller.go:149] Starting ClusterRoleAggregator
I0819 07:47:18.118712 1 shared_informer.go:223] Waiting for caches to sync for ClusterRoleAggregator
I0819 07:47:18.139874 1 controllermanager.go:533] Started "cronjob"
I0819 07:47:18.139940 1 core.go:239] Will not configure cloud provider routes for allocate-node-cidrs: true, configure-cloud-routes: false.
W0819 07:47:18.139959 1 controllermanager.go:525] Skipping "route"
I0819 07:47:18.139982 1 cronjob_controller.go:97] Starting CronJob Manager
I0819 07:47:18.156431 1 controllermanager.go:533] Started "job"
I0819 07:47:18.156571 1 job_controller.go:144] Starting job controller
I0819 07:47:18.156609 1 shared_informer.go:223] Waiting for caches to sync for job
I0819 07:47:18.171835 1 controllermanager.go:533] Started "replicaset"
I0819 07:47:18.171937 1 replica_set.go:181] Starting replicaset controller
I0819 07:47:18.171977 1 shared_informer.go:223] Waiting for caches to sync for ReplicaSet
I0819 07:47:18.189468 1 controllermanager.go:533] Started "csrsigning"
I0819 07:47:18.190102 1 certificate_controller.go:119] Starting certificate controller "csrsigning"
I0819 07:47:18.190138 1 shared_informer.go:223] Waiting for caches to sync for certificate-csrsigning
I0819 07:47:18.190240 1 dynamic_serving_content.go:130] Starting csr-controller::/etc/kubernetes/secrets/ca.crt::/etc/kubernetes/secrets/ca.key
I0819 07:47:18.207518 1 node_lifecycle_controller.go:78] Sending events to api server
E0819 07:47:18.207575 1 core.go:229] failed to start cloud node lifecycle controller: no cloud provider provided
W0819 07:47:18.207597 1 controllermanager.go:525] Skipping "cloud-node-lifecycle"
I0819 07:47:18.222868 1 controllermanager.go:533] Started "persistentvolume-binder"
I0819 07:47:18.222975 1 pv_controller_base.go:295] Starting persistent volume controller
I0819 07:47:18.223002 1 shared_informer.go:223] Waiting for caches to sync for persistent volume
I0819 07:47:18.239241 1 controllermanager.go:533] Started "persistentvolume-expander"
I0819 07:47:18.239392 1 expand_controller.go:319] Starting expand controller
I0819 07:47:18.239427 1 shared_informer.go:223] Waiting for caches to sync for expand
I0819 07:47:18.265822 1 controllermanager.go:533] Started "disruption"
I0819 07:47:18.265875 1 disruption.go:331] Starting disruption controller
I0819 07:47:18.265906 1 shared_informer.go:223] Waiting for caches to sync for disruption
E0819 07:47:18.282238 1 core.go:89] Failed to start service controller: WARNING: no cloud provider provided, services of type LoadBalancer will fail
W0819 07:47:18.282272 1 controllermanager.go:525] Skipping "service"
I0819 07:47:18.406720 1 controllermanager.go:533] Started "pvc-protection"
I0819 07:47:18.406997 1 pvc_protection_controller.go:101] Starting PVC protection controller
I0819 07:47:18.407057 1 shared_informer.go:223] Waiting for caches to sync for PVC protection
I0819 07:47:18.407448 1 shared_informer.go:223] Waiting for caches to sync for resource quota
I0819 07:47:18.415856 1 shared_informer.go:223] Waiting for caches to sync for garbage collector
I0819 07:47:18.460294 1 shared_informer.go:230] Caches are synced for stateful set
I0819 07:47:18.466061 1 shared_informer.go:230] Caches are synced for disruption
I0819 07:47:18.466101 1 disruption.go:339] Sending events to api server.
I0819 07:47:18.467903 1 shared_informer.go:230] Caches are synced for endpoint_slice
W0819 07:47:18.468360 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.468514 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
I0819 07:47:18.472226 1 shared_informer.go:230] Caches are synced for ReplicaSet
W0819 07:47:18.473882 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.473969 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
W0819 07:47:18.484522 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.484603 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
I0819 07:47:18.489369 1 shared_informer.go:230] Caches are synced for deployment
I0819 07:47:18.490334 1 shared_informer.go:230] Caches are synced for certificate-csrsigning
I0819 07:47:18.493788 1 shared_informer.go:230] Caches are synced for certificate-csrapproving
I0819 07:47:18.499793 1 shared_informer.go:230] Caches are synced for namespace
I0819 07:47:18.504279 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"kube-system", Name:"kube-scheduler", UID:"4f9aea19-4b95-46e1-903d-25db83c3d17d", APIVersion:"apps/v1", ResourceVersion:"6477", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down replica set kube-scheduler-6d9578bd6 to 1
W0819 07:47:18.505125 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.505204 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
I0819 07:47:18.507137 1 shared_informer.go:230] Caches are synced for HPA
I0819 07:47:18.508442 1 shared_informer.go:230] Caches are synced for PVC protection
I0819 07:47:18.516189 1 shared_informer.go:230] Caches are synced for PV protection
I0819 07:47:18.518974 1 shared_informer.go:230] Caches are synced for ClusterRoleAggregator
I0819 07:47:18.521102 1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"kube-scheduler-6d9578bd6", UID:"71743313-9430-40da-bb6e-9cff2dc7472e", APIVersion:"apps/v1", ResourceVersion:"6715", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: kube-scheduler-6d9578bd6-scsl9
I0819 07:47:18.539775 1 shared_informer.go:230] Caches are synced for expand
I0819 07:47:18.543225 1 shared_informer.go:230] Caches are synced for ReplicationController
W0819 07:47:18.545719 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.546144 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
I0819 07:47:18.546243 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"kube-system", Name:"kube-scheduler", UID:"4f9aea19-4b95-46e1-903d-25db83c3d17d", APIVersion:"apps/v1", ResourceVersion:"6477", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled up replica set kube-scheduler-59d7c76679 to 3
I0819 07:47:18.548951 1 shared_informer.go:230] Caches are synced for service account
I0819 07:47:18.556948 1 shared_informer.go:230] Caches are synced for job
I0819 07:47:18.588783 1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"kube-scheduler-59d7c76679", UID:"7d2c25e3-13bd-4d45-93c4-bbcf361830df", APIVersion:"apps/v1", ResourceVersion:"6725", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: kube-scheduler-59d7c76679-7br6h
W0819 07:47:18.626245 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.626318 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
W0819 07:47:18.710943 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="new-cluster-controller-0" does not exist
W0819 07:47:18.711360 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="new-cluster-controller-1" does not exist
W0819 07:47:18.711549 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="new-cluster-controller-2" does not exist
W0819 07:47:18.711619 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="new-cluster-openebs-worker-0" does not exist
I0819 07:47:18.723249 1 shared_informer.go:230] Caches are synced for persistent volume
I0819 07:47:18.733456 1 shared_informer.go:230] Caches are synced for GC
I0819 07:47:18.743198 1 shared_informer.go:230] Caches are synced for TTL
I0819 07:47:18.745802 1 shared_informer.go:230] Caches are synced for attach detach
I0819 07:47:18.772608 1 shared_informer.go:230] Caches are synced for daemon sets
I0819 07:47:18.790697 1 shared_informer.go:230] Caches are synced for endpoint
I0819 07:47:18.791747 1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"kube-proxy", UID:"71c660f6-69d7-4199-b641-c5bd87d07ba3", APIVersion:"apps/v1", ResourceVersion:"6549", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: kube-proxy-4xgzs
I0819 07:47:18.800650 1 shared_informer.go:230] Caches are synced for node
I0819 07:47:18.800819 1 range_allocator.go:172] Starting range CIDR allocator
I0819 07:47:18.800838 1 shared_informer.go:223] Waiting for caches to sync for cidrallocator
I0819 07:47:18.800856 1 shared_informer.go:230] Caches are synced for cidrallocator
I0819 07:47:19.016276 1 shared_informer.go:230] Caches are synced for garbage collector
I0819 07:47:19.028782 1 shared_informer.go:230] Caches are synced for taint
I0819 07:47:19.028918 1 taint_manager.go:187] Starting NoExecuteTaintManager
I0819 07:47:19.029087 1 node_lifecycle_controller.go:1433] Initializing eviction metric for zone:
W0819 07:47:19.029262 1 node_lifecycle_controller.go:1048] Missing timestamp for Node new-cluster-openebs-worker-0. Assuming now as a timestamp.
I0819 07:47:19.029317 1 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"new-cluster-controller-0", UID:"7dd33cdf-68b9-443a-9c80-7ba92a0a7778", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node new-cluster-controller-0 event: Registered Node new-cluster-controller-0 in Controller
I0819 07:47:19.029382 1 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"new-cluster-controller-1", UID:"b2305392-ed5c-4d52-8192-eea6dbeb684a", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node new-cluster-controller-1 event: Registered Node new-cluster-controller-1 in Controller
I0819 07:47:19.029426 1 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"new-cluster-openebs-worker-0", UID:"9769b4e4-a2b0-46ba-8b6e-21bb2e8d6263", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node new-cluster-openebs-worker-0 event: Registered Node new-cluster-openebs-worker-0 in Controller
I0819 07:47:19.029455 1 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"new-cluster-controller-2", UID:"cc4d9534-aab2-43e2-a240-d2d71179fa04", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node new-cluster-controller-2 event: Registered Node new-cluster-controller-2 in Controller
W0819 07:47:19.029345 1 node_lifecycle_controller.go:1048] Missing timestamp for Node new-cluster-controller-0. Assuming now as a timestamp.
W0819 07:47:19.029576 1 node_lifecycle_controller.go:1048] Missing timestamp for Node new-cluster-controller-1. Assuming now as a timestamp.
W0819 07:47:19.029656 1 node_lifecycle_controller.go:1048] Missing timestamp for Node new-cluster-controller-2. Assuming now as a timestamp.
I0819 07:47:19.029714 1 node_lifecycle_controller.go:1249] Controller detected that zone  is now in state Normal.
I0819 07:47:19.048624 1 shared_informer.go:230] Caches are synced for garbage collector
I0819 07:47:19.048666 1 garbagecollector.go:142] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I0819 07:47:19.107755 1 shared_informer.go:230] Caches are synced for resource quota
I0819 07:47:19.109711 1 shared_informer.go:230] Caches are synced for resource quota
I0819 07:47:20.384591 1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"kube-proxy", UID:"71c660f6-69d7-4199-b641-c5bd87d07ba3", APIVersion:"apps/v1", ResourceVersion:"6750", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: kube-proxy-r69ng
I0819 07:47:37.629926 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"kube-system", Name:"kube-scheduler", UID:"4f9aea19-4b95-46e1-903d-25db83c3d17d", APIVersion:"apps/v1", ResourceVersion:"6736", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down replica set kube-scheduler-6d9578bd6 to 0
I0819 07:47:37.647394 1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"kube-scheduler-6d9578bd6", UID:"71743313-9430-40da-bb6e-9cff2dc7472e", APIVersion:"apps/v1", ResourceVersion:"6823", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: kube-scheduler-6d9578bd6-mgrct
I0819 07:47:52.603633 1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"calico-node", UID:"a4212caf-5dfc-4e70-acae-b1fdab8ac39c", APIVersion:"apps/v1", ResourceVersion:"6903", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: calico-node-pmg7j
I0819 07:47:52.652965 1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"calico-node", UID:"a4212caf-5dfc-4e70-acae-b1fdab8ac39c", APIVersion:"apps/v1", ResourceVersion:"6910", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: calico-node-bhhsq
I0819 07:47:53.327205 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"kube-system", Name:"calico-kube-controllers", UID:"cca53957-f09f-4c5a-afa5-1b247f35d640", APIVersion:"apps/v1", ResourceVersion:"6917", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down replica set calico-kube-controllers-68c54c6b44 to 0
I0819 07:47:53.341477 1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"calico-kube-controllers-68c54c6b44", UID:"c141f89e-a964-416e-94d2-4bc6ab8bfa6e", APIVersion:"apps/v1", ResourceVersion:"6918", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: calico-kube-controllers-68c54c6b44-dbd76
E0819 07:47:53.341836 1 driver-call.go:266] Failed to unmarshal output for command: init, output: "", error: unexpected end of JSON input
W0819 07:47:53.341879 1 driver-call.go:149] FlexVolume: driver call failed: executable: /var/lib/kubelet/volumeplugins/nodeagent~uds/uds, args: [init], error: fork/exec /var/lib/kubelet/volumeplugins/nodeagent~uds/uds: no such file or directory, output: ""
E0819 07:47:53.341917 1 plugins.go:731] Error dynamically probing plugins: Error creating Flexvolume plugin from directory nodeagent~uds, skipping. Error: unexpected end of JSON input
E0819 07:47:54.894035 1 driver-call.go:266] Failed to unmarshal output for command: init, output: "", error: unexpected end of JSON input
W0819 07:47:54.894111 1 driver-call.go:149] FlexVolume: driver call failed: executable: /var/lib/kubelet/volumeplugins/nodeagent~uds/uds, args: [init], error: fork/exec /var/lib/kubelet/volumeplugins/nodeagent~uds/uds: no such file or directory, output: ""
E0819 07:47:54.894145 1 plugins.go:731] Error dynamically probing plugins: Error creating Flexvolume plugin from directory nodeagent~uds, skipping. Error: unexpected end of JSON input
[REDACTED]
```

rata commented 4 years ago

@surajssd not sure I follow. There was a configuration that was working with one version, then you changed lokoctl to a newer version, ran `lokoctl cluster apply`, and that directory is missing? Or what would the steps be to reproduce it? If you can paste the full steps to reproduce, that would be great :)

Do you have the ignition config that installed /var/lib/kubelet/volumeplugins/nodeagent~uds/uds? Do you know what that is and how it is used in that setup?

surajssd commented 4 years ago

lokocfg file:

cluster "packet" {
  auth_token     = var.packet_token

  project_id   = var.packet_project_id
  cluster_name = var.cluster_name
  facility     = var.facility

  controller_type = "t1.small.x86"

  asset_dir        = "./assets"
  controller_count = 3
  ssh_pubkeys = [
  ]

  management_cidrs  = ["0.0.0.0/0"]
  node_private_cidr = "10.0.0.0/8"

  disable_self_hosted_kubelet = true

  dns {
    zone = var.route53_zone
    provider = "route53"
  }

  worker_pool "foobar" {
    count = 1
    node_type = "c2.medium.x86"
  }
}
```
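The config above is the whole reproduction; it was applied with the standard apply command (the same one rata refers to above):

```console
$ lokoctl cluster apply
```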

invidian commented 4 years ago

@rata those plugins are usually installed by a DaemonSet from a storage solution. I've never seen kube-controller-manager actually execute them, though. Usually only the kubelet does that...
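For reference, the controller manager does probe its own FlexVolume directory for drivers that support attach/detach, just as the kubelet probes its volume plugin directory. The flag names below are from upstream Kubernetes; that Lokomotive points both at the same directory is an inference from the path in the logs above, not something I have verified in its manifests:

```console
# kubelet: directory searched for FlexVolume driver executables
kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins
# kube-controller-manager: directory probed for drivers supporting attach/detach
kube-controller-manager --flex-volume-plugin-dir=/var/lib/kubelet/volumeplugins
```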

rata commented 4 years ago

@surajssd that's it to reproduce? No component for flex volumes or anything? Cool, seems simpler to debug. Still very surprising, but let's see :)

surajssd commented 4 years ago

> @surajssd that's it to reproduce? No component for flex volumes or anything? Cool, seems simpler to debug. Still very surprising, but let's see :)

That's all that's needed, hence I was surprised as well at such a finding.

iaguis commented 4 years ago

Might help: https://github.com/projectcalico/pod2daemon/issues/20 and https://github.com/projectcalico/calico/issues/2771

surajssd commented 3 years ago

Right now the problem is that this plugin is executable only by its owning user and group, with no permissions for other users:

```console
# ls -al
total 5272
drwxr-xr-x. 2 root root    4096 Mar 26 09:21 .
drwxr-xr-x. 3 root root    4096 Mar 26 08:14 ..
-r-xr-x---. 1 root root 5374554 Mar 26 09:21 uds
```

But we run the controller-manager process as UID 65534, hence the endless stream of logs.
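One way to confirm the UID the controller-manager runs as (standard kubectl field paths; that Lokomotive sets runAsUser at the pod level of this Deployment is an assumption):

```console
$ kubectl -n kube-system get deployment kube-controller-manager \
    -o jsonpath='{.spec.template.spec.securityContext.runAsUser}'
```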


~Resolution~ Workaround

Run this command on all the controller nodes:

```console
sudo chmod +x /var/lib/kubelet/volumeplugins/nodeagent~uds/uds
```

Then restart the kube-controller-manager pods one at a time (kill one and it will come back up).
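For example, deleting one pod by name (pod name taken from the logs above; the Deployment recreates it):

```console
$ kubectl -n kube-system delete pod kube-controller-manager-7d4796b4c8-r4mkp
```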

invidian commented 3 years ago

> Resolution

This sounds a lot like a workaround, not like a resolution, to me :smile:

surajssd commented 3 years ago

Edited :laughing: