kinvolk / lokomotive

🪦 DISCONTINUED: Further Lokomotive development has been discontinued. Lokomotive is a 100% open-source, easy-to-use and secure Kubernetes distribution from the volks at Kinvolk.
https://kinvolk.io/lokomotive-kubernetes/
Apache License 2.0

Controller manager does not have permission to execute `uds` FlexVolume plugin #813

Open · surajssd opened this issue 4 years ago

surajssd commented 4 years ago

I had an HA cluster on Packet with no components installed. It was installed with lokoctl at version 35814104. I then updated lokoctl to version 524a81df, upgraded the cluster, and now see an endless stream of the following logs from the controller-manager:

```console
E0819 07:47:53.341836       1 driver-call.go:266] Failed to unmarshal output for command: init, output: "", error: unexpected end of JSON input
W0819 07:47:53.341879       1 driver-call.go:149] FlexVolume: driver call failed: executable: /var/lib/kubelet/volumeplugins/nodeagent~uds/uds, args: [init], error: fork/exec /var/lib/kubelet/volumeplugins/nodeagent~uds/uds: no such file or directory, output: ""
E0819 07:47:53.341917       1 plugins.go:731] Error dynamically probing plugins: Error creating Flexvolume plugin from directory nodeagent~uds, skipping. Error: unexpected end of JSON input
```

The mitigation is to make the binary /var/lib/kubelet/volumeplugins/nodeagent~uds/uds executable for everyone. Right now it is executable only by the root user and root group, while the controller-manager runs as user nobody, AKA UID 65534.

Alternatively, elevate the permissions of the controller-manager pod.
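To verify the mismatch first, one can inspect the binary's mode and invoke the FlexVolume `init` call (the call the controller-manager makes, per the logs below) as UID 65534. Running it manually like this is just a suggested sanity check, not a documented procedure:

```console
# On a controller node; path taken from the controller-manager logs.
$ ls -l /var/lib/kubelet/volumeplugins/nodeagent~uds/uds
# Probe the driver the same way the plugin manager does; as UID 65534 this
# should fail instead of printing the expected JSON status.
$ sudo -u '#65534' /var/lib/kubelet/volumeplugins/nodeagent~uds/uds init
```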

Full logs

```console
$ kubectl logs kube-controller-manager-7d4796b4c8-r4mkp
I0819 07:46:35.790967 1 serving.go:313] Generated self-signed cert in-memory
W0819 07:46:35.791102 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0819 07:46:37.341475 1 controllermanager.go:161] Version: v1.18.6
I0819 07:46:37.343185 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0819 07:46:37.343245 1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0819 07:46:37.343351 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0819 07:46:37.343399 1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0819 07:46:37.344149 1 secure_serving.go:178] Serving securely on [::]:10257
I0819 07:46:37.344192 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0819 07:46:37.345643 1 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
I0819 07:46:37.345743 1 leaderelection.go:242] attempting to acquire leader lease kube-system/kube-controller-manager...
I0819 07:46:37.443557 1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0819 07:46:37.443582 1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0819 07:47:05.814968 1 leaderelection.go:252] successfully acquired lease kube-system/kube-controller-manager
I0819 07:47:05.815089 1 event.go:278] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"668755c7-1d6c-4583-b554-9eaabebb066c", APIVersion:"v1", ResourceVersion:"6670", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' kube-controller-manager-7d4796b4c8-r4mkp_def672fa-24f2-4e24-846b-58c60f20838f became leader
I0819 07:47:05.815170 1 event.go:278] Event(v1.ObjectReference{Kind:"Lease", Namespace:"kube-system", Name:"kube-controller-manager", UID:"183a9579-58c8-47c1-bd1d-b0ae6492c88e", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"6671", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' kube-controller-manager-7d4796b4c8-r4mkp_def672fa-24f2-4e24-846b-58c60f20838f became leader
I0819 07:47:06.132828 1 plugins.go:100] No cloud provider specified.
I0819 07:47:06.136489 1 shared_informer.go:223] Waiting for caches to sync for tokens
I0819 07:47:06.159987 1 controllermanager.go:533] Started "statefulset"
I0819 07:47:06.160057 1 stateful_set.go:146] Starting stateful set controller
I0819 07:47:06.160090 1 shared_informer.go:223] Waiting for caches to sync for stateful set
I0819 07:47:06.175246 1 controllermanager.go:533] Started "csrcleaner"
I0819 07:47:06.175311 1 cleaner.go:82] Starting CSR cleaner controller
I0819 07:47:06.190402 1 controllermanager.go:533] Started "endpoint"
I0819 07:47:06.190445 1 endpoints_controller.go:182] Starting endpoint controller
I0819 07:47:06.190477 1 shared_informer.go:223] Waiting for caches to sync for endpoint
I0819 07:47:06.236752 1 shared_informer.go:230] Caches are synced for tokens
I0819 07:47:06.507800 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for daemonsets.apps
I0819 07:47:06.507871 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for deployments.apps
I0819 07:47:06.507941 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for rolebindings.rbac.authorization.k8s.io
I0819 07:47:06.507983 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for statefulsets.apps
I0819 07:47:06.508036 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for endpointslices.discovery.k8s.io
I0819 07:47:06.508113 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for podtemplates
I0819 07:47:06.508199 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for cronjobs.batch
I0819 07:47:06.508251 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for ingresses.networking.k8s.io
I0819 07:47:06.508300 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for leases.coordination.k8s.io
I0819 07:47:06.508340 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for endpoints
I0819 07:47:06.508380 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for controllerrevisions.apps
I0819 07:47:06.508431 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for horizontalpodautoscalers.autoscaling
I0819 07:47:06.508504 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for ingresses.extensions
I0819 07:47:06.508555 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for roles.rbac.authorization.k8s.io
I0819 07:47:06.508631 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for networkpolicies.crd.projectcalico.org
I0819 07:47:06.508682 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for networksets.crd.projectcalico.org
I0819 07:47:06.508728 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for networkpolicies.networking.k8s.io
I0819 07:47:06.508781 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for poddisruptionbudgets.policy
I0819 07:47:06.508846 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for replicasets.apps
I0819 07:47:06.508970 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for serviceaccounts
I0819 07:47:06.509098 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for limitranges
I0819 07:47:06.509153 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for events.events.k8s.io
I0819 07:47:06.509200 1 resource_quota_monitor.go:228] QuotaMonitor created object count evaluator for jobs.batch
I0819 07:47:06.509240 1 controllermanager.go:533] Started "resourcequota"
W0819 07:47:06.509264 1 controllermanager.go:512] "tokencleaner" is disabled
I0819 07:47:06.509327 1 resource_quota_controller.go:272] Starting resource quota controller
I0819 07:47:06.509413 1 shared_informer.go:223] Waiting for caches to sync for resource quota
I0819 07:47:06.509481 1 resource_quota_monitor.go:303] QuotaMonitor running
I0819 07:47:06.525651 1 node_lifecycle_controller.go:384] Sending events to api server.
I0819 07:47:06.526953 1 taint_manager.go:163] Sending events to api server.
I0819 07:47:06.527331 1 node_lifecycle_controller.go:512] Controller will reconcile labels.
I0819 07:47:06.527765 1 controllermanager.go:533] Started "nodelifecycle"
I0819 07:47:06.528310 1 node_lifecycle_controller.go:546] Starting node controller
I0819 07:47:06.528344 1 shared_informer.go:223] Waiting for caches to sync for taint
I0819 07:47:06.545364 1 controllermanager.go:533] Started "attachdetach"
I0819 07:47:06.545482 1 attach_detach_controller.go:348] Starting attach detach controller
I0819 07:47:06.545521 1 shared_informer.go:223] Waiting for caches to sync for attach detach
I0819 07:47:06.567356 1 controllermanager.go:533] Started "endpointslice"
I0819 07:47:06.567545 1 endpointslice_controller.go:213] Starting endpoint slice controller
I0819 07:47:06.567580 1 shared_informer.go:223] Waiting for caches to sync for endpoint_slice
I0819 07:47:06.599433 1 controllermanager.go:533] Started "namespace"
I0819 07:47:06.599498 1 namespace_controller.go:200] Starting namespace controller
I0819 07:47:06.599533 1 shared_informer.go:223] Waiting for caches to sync for namespace
I0819 07:47:06.615822 1 controllermanager.go:533] Started "pv-protection"
I0819 07:47:06.615903 1 pv_protection_controller.go:83] Starting PV protection controller
I0819 07:47:06.615921 1 shared_informer.go:223] Waiting for caches to sync for PV protection
I0819 07:47:06.632577 1 controllermanager.go:533] Started "podgc"
I0819 07:47:06.633105 1 gc_controller.go:89] Starting GC controller
I0819 07:47:06.633136 1 shared_informer.go:223] Waiting for caches to sync for GC
I0819 07:47:06.648550 1 controllermanager.go:533] Started "serviceaccount"
I0819 07:47:06.648625 1 serviceaccounts_controller.go:117] Starting service account controller
I0819 07:47:06.648661 1 shared_informer.go:223] Waiting for caches to sync for service account
I0819 07:47:06.672115 1 controllermanager.go:533] Started "daemonset"
I0819 07:47:06.672250 1 daemon_controller.go:285] Starting daemon sets controller
I0819 07:47:06.672284 1 shared_informer.go:223] Waiting for caches to sync for daemon sets
I0819 07:47:06.689005 1 controllermanager.go:533] Started "deployment"
W0819 07:47:06.689059 1 controllermanager.go:512] "bootstrapsigner" is disabled
W0819 07:47:06.689083 1 controllermanager.go:525] Skipping "ttl-after-finished"
I0819 07:47:06.689098 1 deployment_controller.go:153] Starting deployment controller
W0819 07:47:06.689105 1 controllermanager.go:525] Skipping "root-ca-cert-publisher"
I0819 07:47:06.689117 1 shared_informer.go:223] Waiting for caches to sync for deployment
I0819 07:47:06.742814 1 controllermanager.go:533] Started "replicationcontroller"
I0819 07:47:06.742963 1 replica_set.go:181] Starting replicationcontroller controller
I0819 07:47:06.742987 1 shared_informer.go:223] Waiting for caches to sync for ReplicationController
I0819 07:47:07.648157 1 garbagecollector.go:133] Starting garbage collector controller
I0819 07:47:07.648194 1 shared_informer.go:223] Waiting for caches to sync for garbage collector
I0819 07:47:07.648238 1 graph_builder.go:282] GraphBuilder running
I0819 07:47:07.648275 1 controllermanager.go:533] Started "garbagecollector"
I0819 07:47:07.655267 1 request.go:621] Throttling request took 1.047466582s, request: GET:https://10.3.0.1:443/apis/rbac.authorization.k8s.io/v1?timeout=32s
I0819 07:47:07.706637 1 controllermanager.go:533] Started "horizontalpodautoscaling"
I0819 07:47:07.706778 1 horizontal.go:169] Starting HPA controller
I0819 07:47:07.706808 1 shared_informer.go:223] Waiting for caches to sync for HPA
I0819 07:47:07.793421 1 controllermanager.go:533] Started "csrapproving"
I0819 07:47:07.793514 1 certificate_controller.go:119] Starting certificate controller "csrapproving"
I0819 07:47:07.793535 1 shared_informer.go:223] Waiting for caches to sync for certificate-csrapproving
I0819 07:47:07.942864 1 controllermanager.go:533] Started "ttl"
I0819 07:47:07.942912 1 ttl_controller.go:118] Starting TTL controller
I0819 07:47:07.942943 1 shared_informer.go:223] Waiting for caches to sync for TTL
I0819 07:47:08.092406 1 node_ipam_controller.go:94] Sending events to api server.
I0819 07:47:18.099893 1 range_allocator.go:82] Sending events to api server.
I0819 07:47:18.100145 1 range_allocator.go:116] No Secondary Service CIDR provided. Skipping filtering out secondary service addresses.
I0819 07:47:18.100224 1 controllermanager.go:533] Started "nodeipam"
I0819 07:47:18.100373 1 node_ipam_controller.go:162] Starting ipam controller
I0819 07:47:18.100412 1 shared_informer.go:223] Waiting for caches to sync for node
I0819 07:47:18.118328 1 controllermanager.go:533] Started "clusterrole-aggregation"
I0819 07:47:18.118674 1 clusterroleaggregation_controller.go:149] Starting ClusterRoleAggregator
I0819 07:47:18.118712 1 shared_informer.go:223] Waiting for caches to sync for ClusterRoleAggregator
I0819 07:47:18.139874 1 controllermanager.go:533] Started "cronjob"
I0819 07:47:18.139940 1 core.go:239] Will not configure cloud provider routes for allocate-node-cidrs: true, configure-cloud-routes: false.
W0819 07:47:18.139959 1 controllermanager.go:525] Skipping "route"
I0819 07:47:18.139982 1 cronjob_controller.go:97] Starting CronJob Manager
I0819 07:47:18.156431 1 controllermanager.go:533] Started "job"
I0819 07:47:18.156571 1 job_controller.go:144] Starting job controller
I0819 07:47:18.156609 1 shared_informer.go:223] Waiting for caches to sync for job
I0819 07:47:18.171835 1 controllermanager.go:533] Started "replicaset"
I0819 07:47:18.171937 1 replica_set.go:181] Starting replicaset controller
I0819 07:47:18.171977 1 shared_informer.go:223] Waiting for caches to sync for ReplicaSet
I0819 07:47:18.189468 1 controllermanager.go:533] Started "csrsigning"
I0819 07:47:18.190102 1 certificate_controller.go:119] Starting certificate controller "csrsigning"
I0819 07:47:18.190138 1 shared_informer.go:223] Waiting for caches to sync for certificate-csrsigning
I0819 07:47:18.190240 1 dynamic_serving_content.go:130] Starting csr-controller::/etc/kubernetes/secrets/ca.crt::/etc/kubernetes/secrets/ca.key
I0819 07:47:18.207518 1 node_lifecycle_controller.go:78] Sending events to api server
E0819 07:47:18.207575 1 core.go:229] failed to start cloud node lifecycle controller: no cloud provider provided
W0819 07:47:18.207597 1 controllermanager.go:525] Skipping "cloud-node-lifecycle"
I0819 07:47:18.222868 1 controllermanager.go:533] Started "persistentvolume-binder"
I0819 07:47:18.222975 1 pv_controller_base.go:295] Starting persistent volume controller
I0819 07:47:18.223002 1 shared_informer.go:223] Waiting for caches to sync for persistent volume
I0819 07:47:18.239241 1 controllermanager.go:533] Started "persistentvolume-expander"
I0819 07:47:18.239392 1 expand_controller.go:319] Starting expand controller
I0819 07:47:18.239427 1 shared_informer.go:223] Waiting for caches to sync for expand
I0819 07:47:18.265822 1 controllermanager.go:533] Started "disruption"
I0819 07:47:18.265875 1 disruption.go:331] Starting disruption controller
I0819 07:47:18.265906 1 shared_informer.go:223] Waiting for caches to sync for disruption
E0819 07:47:18.282238 1 core.go:89] Failed to start service controller: WARNING: no cloud provider provided, services of type LoadBalancer will fail
W0819 07:47:18.282272 1 controllermanager.go:525] Skipping "service"
I0819 07:47:18.406720 1 controllermanager.go:533] Started "pvc-protection"
I0819 07:47:18.406997 1 pvc_protection_controller.go:101] Starting PVC protection controller
I0819 07:47:18.407057 1 shared_informer.go:223] Waiting for caches to sync for PVC protection
I0819 07:47:18.407448 1 shared_informer.go:223] Waiting for caches to sync for resource quota
I0819 07:47:18.415856 1 shared_informer.go:223] Waiting for caches to sync for garbage collector
I0819 07:47:18.460294 1 shared_informer.go:230] Caches are synced for stateful set
I0819 07:47:18.466061 1 shared_informer.go:230] Caches are synced for disruption
I0819 07:47:18.466101 1 disruption.go:339] Sending events to api server.
I0819 07:47:18.467903 1 shared_informer.go:230] Caches are synced for endpoint_slice
W0819 07:47:18.468360 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.468514 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
I0819 07:47:18.472226 1 shared_informer.go:230] Caches are synced for ReplicaSet
W0819 07:47:18.473882 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.473969 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
W0819 07:47:18.484522 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.484603 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
I0819 07:47:18.489369 1 shared_informer.go:230] Caches are synced for deployment
I0819 07:47:18.490334 1 shared_informer.go:230] Caches are synced for certificate-csrsigning
I0819 07:47:18.493788 1 shared_informer.go:230] Caches are synced for certificate-csrapproving
I0819 07:47:18.499793 1 shared_informer.go:230] Caches are synced for namespace
I0819 07:47:18.504279 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"kube-system", Name:"kube-scheduler", UID:"4f9aea19-4b95-46e1-903d-25db83c3d17d", APIVersion:"apps/v1", ResourceVersion:"6477", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down replica set kube-scheduler-6d9578bd6 to 1
W0819 07:47:18.505125 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.505204 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
I0819 07:47:18.507137 1 shared_informer.go:230] Caches are synced for HPA
I0819 07:47:18.508442 1 shared_informer.go:230] Caches are synced for PVC protection
I0819 07:47:18.516189 1 shared_informer.go:230] Caches are synced for PV protection
I0819 07:47:18.518974 1 shared_informer.go:230] Caches are synced for ClusterRoleAggregator
I0819 07:47:18.521102 1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"kube-scheduler-6d9578bd6", UID:"71743313-9430-40da-bb6e-9cff2dc7472e", APIVersion:"apps/v1", ResourceVersion:"6715", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: kube-scheduler-6d9578bd6-scsl9
I0819 07:47:18.539775 1 shared_informer.go:230] Caches are synced for expand
I0819 07:47:18.543225 1 shared_informer.go:230] Caches are synced for ReplicationController
W0819 07:47:18.545719 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.546144 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
I0819 07:47:18.546243 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"kube-system", Name:"kube-scheduler", UID:"4f9aea19-4b95-46e1-903d-25db83c3d17d", APIVersion:"apps/v1", ResourceVersion:"6477", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled up replica set kube-scheduler-59d7c76679 to 3
I0819 07:47:18.548951 1 shared_informer.go:230] Caches are synced for service account
I0819 07:47:18.556948 1 shared_informer.go:230] Caches are synced for job
I0819 07:47:18.588783 1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"kube-scheduler-59d7c76679", UID:"7d2c25e3-13bd-4d45-93c4-bbcf361830df", APIVersion:"apps/v1", ResourceVersion:"6725", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: kube-scheduler-59d7c76679-7br6h
W0819 07:47:18.626245 1 endpointslice_controller.go:260] Error syncing endpoint slices for service "kube-system/coredns", retrying. Error: node "new-cluster-controller-0" not found
I0819 07:47:18.626318 1 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"kube-system", Name:"coredns", UID:"d7f4d771-abc5-425c-82f1-06219c2f9f64", APIVersion:"v1", ResourceVersion:"6276", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpointSlices' Error updating Endpoint Slices for Service kube-system/coredns: node "new-cluster-controller-0" not found
W0819 07:47:18.710943 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="new-cluster-controller-0" does not exist
W0819 07:47:18.711360 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="new-cluster-controller-1" does not exist
W0819 07:47:18.711549 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="new-cluster-controller-2" does not exist
W0819 07:47:18.711619 1 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="new-cluster-openebs-worker-0" does not exist
I0819 07:47:18.723249 1 shared_informer.go:230] Caches are synced for persistent volume
I0819 07:47:18.733456 1 shared_informer.go:230] Caches are synced for GC
I0819 07:47:18.743198 1 shared_informer.go:230] Caches are synced for TTL
I0819 07:47:18.745802 1 shared_informer.go:230] Caches are synced for attach detach
I0819 07:47:18.772608 1 shared_informer.go:230] Caches are synced for daemon sets
I0819 07:47:18.790697 1 shared_informer.go:230] Caches are synced for endpoint
I0819 07:47:18.791747 1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"kube-proxy", UID:"71c660f6-69d7-4199-b641-c5bd87d07ba3", APIVersion:"apps/v1", ResourceVersion:"6549", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: kube-proxy-4xgzs
I0819 07:47:18.800650 1 shared_informer.go:230] Caches are synced for node
I0819 07:47:18.800819 1 range_allocator.go:172] Starting range CIDR allocator
I0819 07:47:18.800838 1 shared_informer.go:223] Waiting for caches to sync for cidrallocator
I0819 07:47:18.800856 1 shared_informer.go:230] Caches are synced for cidrallocator
I0819 07:47:19.016276 1 shared_informer.go:230] Caches are synced for garbage collector
I0819 07:47:19.028782 1 shared_informer.go:230] Caches are synced for taint
I0819 07:47:19.028918 1 taint_manager.go:187] Starting NoExecuteTaintManager
I0819 07:47:19.029087 1 node_lifecycle_controller.go:1433] Initializing eviction metric for zone:
W0819 07:47:19.029262 1 node_lifecycle_controller.go:1048] Missing timestamp for Node new-cluster-openebs-worker-0. Assuming now as a timestamp.
I0819 07:47:19.029317 1 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"new-cluster-controller-0", UID:"7dd33cdf-68b9-443a-9c80-7ba92a0a7778", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node new-cluster-controller-0 event: Registered Node new-cluster-controller-0 in Controller
I0819 07:47:19.029382 1 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"new-cluster-controller-1", UID:"b2305392-ed5c-4d52-8192-eea6dbeb684a", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node new-cluster-controller-1 event: Registered Node new-cluster-controller-1 in Controller
I0819 07:47:19.029426 1 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"new-cluster-openebs-worker-0", UID:"9769b4e4-a2b0-46ba-8b6e-21bb2e8d6263", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node new-cluster-openebs-worker-0 event: Registered Node new-cluster-openebs-worker-0 in Controller
I0819 07:47:19.029455 1 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"new-cluster-controller-2", UID:"cc4d9534-aab2-43e2-a240-d2d71179fa04", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node new-cluster-controller-2 event: Registered Node new-cluster-controller-2 in Controller
W0819 07:47:19.029345 1 node_lifecycle_controller.go:1048] Missing timestamp for Node new-cluster-controller-0. Assuming now as a timestamp.
W0819 07:47:19.029576 1 node_lifecycle_controller.go:1048] Missing timestamp for Node new-cluster-controller-1. Assuming now as a timestamp.
W0819 07:47:19.029656 1 node_lifecycle_controller.go:1048] Missing timestamp for Node new-cluster-controller-2. Assuming now as a timestamp.
I0819 07:47:19.029714 1 node_lifecycle_controller.go:1249] Controller detected that zone  is now in state Normal.
I0819 07:47:19.048624 1 shared_informer.go:230] Caches are synced for garbage collector
I0819 07:47:19.048666 1 garbagecollector.go:142] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I0819 07:47:19.107755 1 shared_informer.go:230] Caches are synced for resource quota
I0819 07:47:19.109711 1 shared_informer.go:230] Caches are synced for resource quota
I0819 07:47:20.384591 1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"kube-proxy", UID:"71c660f6-69d7-4199-b641-c5bd87d07ba3", APIVersion:"apps/v1", ResourceVersion:"6750", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: kube-proxy-r69ng
I0819 07:47:37.629926 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"kube-system", Name:"kube-scheduler", UID:"4f9aea19-4b95-46e1-903d-25db83c3d17d", APIVersion:"apps/v1", ResourceVersion:"6736", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down replica set kube-scheduler-6d9578bd6 to 0
I0819 07:47:37.647394 1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"kube-scheduler-6d9578bd6", UID:"71743313-9430-40da-bb6e-9cff2dc7472e", APIVersion:"apps/v1", ResourceVersion:"6823", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: kube-scheduler-6d9578bd6-mgrct
I0819 07:47:52.603633 1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"calico-node", UID:"a4212caf-5dfc-4e70-acae-b1fdab8ac39c", APIVersion:"apps/v1", ResourceVersion:"6903", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: calico-node-pmg7j
I0819 07:47:52.652965 1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"calico-node", UID:"a4212caf-5dfc-4e70-acae-b1fdab8ac39c", APIVersion:"apps/v1", ResourceVersion:"6910", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: calico-node-bhhsq
I0819 07:47:53.327205 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"kube-system", Name:"calico-kube-controllers", UID:"cca53957-f09f-4c5a-afa5-1b247f35d640", APIVersion:"apps/v1", ResourceVersion:"6917", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down replica set calico-kube-controllers-68c54c6b44 to 0
I0819 07:47:53.341477 1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"calico-kube-controllers-68c54c6b44", UID:"c141f89e-a964-416e-94d2-4bc6ab8bfa6e", APIVersion:"apps/v1", ResourceVersion:"6918", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: calico-kube-controllers-68c54c6b44-dbd76
E0819 07:47:53.341836 1 driver-call.go:266] Failed to unmarshal output for command: init, output: "", error: unexpected end of JSON input
W0819 07:47:53.341879 1 driver-call.go:149] FlexVolume: driver call failed: executable: /var/lib/kubelet/volumeplugins/nodeagent~uds/uds, args: [init], error: fork/exec /var/lib/kubelet/volumeplugins/nodeagent~uds/uds: no such file or directory, output: ""
E0819 07:47:53.341917 1 plugins.go:731] Error dynamically probing plugins: Error creating Flexvolume plugin from directory nodeagent~uds, skipping. Error: unexpected end of JSON input
E0819 07:47:54.894035 1 driver-call.go:266] Failed to unmarshal output for command: init, output: "", error: unexpected end of JSON input
W0819 07:47:54.894111 1 driver-call.go:149] FlexVolume: driver call failed: executable: /var/lib/kubelet/volumeplugins/nodeagent~uds/uds, args: [init], error: fork/exec /var/lib/kubelet/volumeplugins/nodeagent~uds/uds: no such file or directory, output: ""
E0819 07:47:54.894145 1 plugins.go:731] Error dynamically probing plugins: Error creating Flexvolume plugin from directory nodeagent~uds, skipping. Error: unexpected end of JSON input
[REDACTED]
```

rata commented 4 years ago

@surajssd not sure I follow. There was a configuration that was working with one version, then you changed lokoctl to a newer version, ran `lokoctl cluster apply`, and that directory is missing? Or what would the steps be to reproduce it? If you can paste the full steps to reproduce, that would be great :)

Do you have the ignition config that installed /var/lib/kubelet/volumeplugins/nodeagent~uds/uds? Do you know what that is and how it is used in that setup?

surajssd commented 4 years ago

lokocfg file:

cluster "packet" {
  auth_token     = var.packet_token

  project_id   = var.packet_project_id
  cluster_name = var.cluster_name
  facility     = var.facility

  controller_type = "t1.small.x86"

  asset_dir        = "./assets"
  controller_count = 3
  ssh_pubkeys = [
  ]

  management_cidrs  = ["0.0.0.0/0"]
  node_private_cidr = "10.0.0.0/8"

  disable_self_hosted_kubelet = true

  dns {
    zone = var.route53_zone
    provider = "route53"
  }

  worker_pool "foobar" {
    count = 1
    node_type = "c2.medium.x86"
  }
}
```
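The config above is the whole reproduction; it was applied with the standard apply command (the same one rata refers to above):

```console
$ lokoctl cluster apply
```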

invidian commented 4 years ago

@rata those plugins are usually installed by a DaemonSet from a storage solution. I've never seen kube-controller-manager actually execute them, though. Usually only the kubelet does that...
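For reference, the controller manager does probe its own FlexVolume directory for drivers that support attach/detach, just as the kubelet probes its volume plugin directory. The flag names below are from upstream Kubernetes; that Lokomotive points both at the same directory is an inference from the path in the logs above, not something I have verified in its manifests:

```console
# kubelet: directory searched for FlexVolume driver executables
kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins
# kube-controller-manager: directory probed for drivers supporting attach/detach
kube-controller-manager --flex-volume-plugin-dir=/var/lib/kubelet/volumeplugins
```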

rata commented 4 years ago

@surajssd that's it to reproduce? No component for flex volumes or anything? Cool, seems simpler to debug. Still very surprising, but let's see :)

surajssd commented 4 years ago

> @surajssd that's it to reproduce? No component for flex volumes or anything? Cool, seems simpler to debug. Still very surprising, but let's see :)

That's all that's needed, hence I was surprised as well at such a finding.

iaguis commented 4 years ago

Might help: https://github.com/projectcalico/pod2daemon/issues/20 and https://github.com/projectcalico/calico/issues/2771

surajssd commented 3 years ago

Right now the problem is that this plugin is executable only by its owning user and group, with no permissions for other users:

```console
# ls -al
total 5272
drwxr-xr-x. 2 root root    4096 Mar 26 09:21 .
drwxr-xr-x. 3 root root    4096 Mar 26 08:14 ..
-r-xr-x---. 1 root root 5374554 Mar 26 09:21 uds
```

But we run the controller-manager process as UID 65534, hence the endless stream of logs.
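One way to confirm the UID the controller-manager runs as (standard kubectl field paths; that Lokomotive sets runAsUser at the pod level of this Deployment is an assumption):

```console
$ kubectl -n kube-system get deployment kube-controller-manager \
    -o jsonpath='{.spec.template.spec.securityContext.runAsUser}'
```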


~Resolution~ Workaround

Run this command on all the controller nodes:

```console
sudo chmod +x /var/lib/kubelet/volumeplugins/nodeagent~uds/uds
```

Then restart the kube-controller-manager pods one at a time (kill one and it will come back up).
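For example, deleting one pod by name (pod name taken from the logs above; the Deployment recreates it):

```console
$ kubectl -n kube-system delete pod kube-controller-manager-7d4796b4c8-r4mkp
```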

invidian commented 3 years ago

> Resolution

This sounds a lot like a workaround, not like a resolution, to me :smile:

surajssd commented 3 years ago

Edited :laughing: