hetznercloud / hcloud-cloud-controller-manager

Kubernetes cloud-controller-manager for Hetzner Cloud
Apache License 2.0

Error: missing prefix hcloud://: #179

Closed: ErwinSteffens closed this issue 3 years ago

ErwinSteffens commented 3 years ago

I have read #80 and think I have initialised the kubelet correctly, but I still get this error.

Here is the full error:

'Warning' reason: 'SyncLoadBalancerFailed' Error syncing load balancer: failed to ensure load balancer: hcloud/loadBalancers.EnsureLoadBalancer: hcops/LoadBalancerOps.ReconcileHCLBTargets: hcops/providerIDToServerID: missing prefix hcloud://:

Looking at the kubelet systemd command on the master node, I see --cloud-provider=external:

root@master-1:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Mon 2021-04-12 20:22:39 CEST; 21min ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 4531 (kubelet)
      Tasks: 16 (limit: 2286)
     Memory: 103.7M
     CGroup: /system.slice/kubelet.service
             └─4531 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.2 --cloud-provider=external

Can you maybe guide me a little on what to check? Do I also have to add --cloud-provider=external on the worker nodes?
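
For reference, a quick way to confirm the flag in the running kubelet on any node (a sketch):

# sketch: print the cloud-provider flag from the running kubelet's command line
pgrep -af kubelet | grep -o -- '--cloud-provider=[^ ]*'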

ErwinSteffens commented 3 years ago

So it is trying to read the ProviderID from the spec here: https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/99d4622bde406f3d8c612e1c011009210b907789/internal/hcops/load_balancer.go#L512

But looking at the node, I do not see this field: kubectl describe node worker-1 | grep ProviderID gives nothing. I will look into why this is going wrong.
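
In case it is useful, the providerID of every node can be listed in one go with jsonpath (a sketch; the second column is empty when the field is not set):

# sketch: print each node's name and spec.providerID
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'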

LKaemmerling commented 3 years ago

Hey @ErwinSteffens,

Please make sure that the cluster was initialized with the flag on the master nodes and that the flag is also set on the worker nodes. When the flag is added correctly, you should see a ProviderID (as you already mentioned) and a node.cloudprovider.kubernetes.io/uninitialized taint. You can find more details at: https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/#running-cloud-controller-manager
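
With a kubeadm-based setup, one way to pass the flag is the environment file that kubeadm's systemd drop-in sources (a sketch; /etc/default/kubelet is the Debian/Ubuntu default path, adjust for your distro). Note that the flag must be in place before the node first registers; a node that already joined without it will not get the uninitialized taint:

# sketch: let kubeadm's drop-in (10-kubeadm.conf) pass the flag to the kubelet
echo 'KUBELET_EXTRA_ARGS=--cloud-provider=external' >> /etc/default/kubelet
systemctl daemon-reload
systemctl restart kubelet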

ErwinSteffens commented 3 years ago

Hey @LKaemmerling, thanks for answering.

I think I have added this flag correctly. I have just rebuilt my cluster with --cloud-provider=external.

This is on one of the worker nodes:

root@worker-1:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Mon 2021-04-12 21:01:03 CEST; 12h ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 1406 (kubelet)
      Tasks: 18 (limit: 9284)
     Memory: 64.2M
     CGroup: /system.slice/kubelet.service
             └─1406 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.2 --node-ip=10.0.0.2 --cloud-provider=external

I have now removed the worker node from the cluster and also removed the cloud-controller-manager deployment. After re-adding it, the worker node has the node.cloudprovider.kubernetes.io/uninitialized=true taint:

~ k describe node eu-prod-k8s-worker-1
Name:               eu-prod-k8s-worker-1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=eu-prod-k8s-worker-1
                    kubernetes.io/os=linux
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.1.0.1
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.1.0.1/32
                    projectcalico.org/IPv4VXLANTunnelAddr: 10.244.22.64
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 13 Apr 2021 09:16:25 +0200
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule

I have now re-deployed the cloud-controller-manager. The taint has been removed, and I see that the ProviderID has somehow been added.

Did I maybe get the order of deployment wrong?
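
For reference, the order I understood from the README (a sketch; the secret keys and manifest URL are assumptions based on the README at the time):

# sketch: create the token secret first, then deploy the CCM, then join the workers
kubectl -n kube-system create secret generic hcloud \
  --from-literal=token=<HCLOUD_API_TOKEN> \
  --from-literal=network=<NETWORK_NAME_OR_ID>
kubectl apply -f https://raw.githubusercontent.com/hetznercloud/hcloud-cloud-controller-manager/master/deploy/ccm-networks.yaml
# only afterwards: kubeadm join on the workers, with --cloud-provider=external already set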

ErwinSteffens commented 3 years ago

I am seeing another error in the logs now, but I am not sure whether it is a real problem:

E0413 07:30:55.026955       1 route_controller.go:118] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/hcloudRouteToRoute: hcops/AllServersCache.ByPrivateIP: hcops/AllServersCache.getCache: not found

See the comment on issue #165: https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/165

ErwinSteffens commented 3 years ago

@LKaemmerling

I have just rebuilt my cluster and I am seeing the issue again.

I now see that the node.cloudprovider.kubernetes.io/uninitialized=true taint has been removed from all nodes.

The ProviderID field is only set on the master node.
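
If needed, I understand the providerID can also be set by hand while the field is still empty (a sketch; <server-id> is the numeric Hetzner Cloud server ID, and re-joining the node with the flag set is the proper fix):

# sketch: last-resort workaround, only works while spec.providerID is still empty
kubectl patch node worker-1 -p '{"spec":{"providerID":"hcloud://<server-id>"}}'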

Here are the logs from the CCM:

Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0413 11:47:25.441873       1 serving.go:313] Generated self-signed cert in-memory
W0413 11:47:38.335049       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0413 11:47:38.433685       1 controllermanager.go:120] Version: v0.0.0-master+$Format:%h$
Hetzner Cloud k8s cloud controller v1.8.1 started
W0413 11:47:39.557952       1 controllermanager.go:132] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
I0413 11:47:39.561298       1 secure_serving.go:178] Serving securely on [::]:10258
I0413 11:47:39.562556       1 controllermanager.go:247] Started "service"
I0413 11:47:39.562700       1 controller.go:208] Starting service controller
I0413 11:47:39.562721       1 shared_informer.go:223] Waiting for caches to sync for service
I0413 11:47:39.562814       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 11:47:39.562834       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 11:47:39.562868       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0413 11:47:39.562992       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 11:47:39.563002       1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 11:47:39.662987       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 11:47:39.663535       1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 11:47:39.667485       1 controllermanager.go:247] Started "route"
I0413 11:47:39.670824       1 route_controller.go:100] Starting route controller
I0413 11:47:39.670841       1 shared_informer.go:223] Waiting for caches to sync for route
I0413 11:47:39.732968       1 node_controller.go:110] Sending events to api server.
I0413 11:47:39.733114       1 controllermanager.go:247] Started "cloud-node"
I0413 11:47:39.734683       1 node_lifecycle_controller.go:78] Sending events to api server
I0413 11:47:39.734734       1 controllermanager.go:247] Started "cloud-node-lifecycle"
I0413 11:47:39.833607       1 node_controller.go:325] Initializing node master-1 with cloud provider
I0413 11:47:39.862951       1 shared_informer.go:230] Caches are synced for service
I0413 11:47:39.870997       1 shared_informer.go:230] Caches are synced for route
I0413 11:47:40.621475       1 route_controller.go:193] Creating route for node master-1 10.244.0.0/24 with hint 23dbc99b-2ae4-40e2-b0e1-79ff319360d8, throttled 461ns
I0413 11:47:41.817005       1 route_controller.go:213] Created route for node master-1 10.244.0.0/24 with hint 23dbc99b-2ae4-40e2-b0e1-79ff319360d8 after 1.19551538s
I0413 11:47:42.930994       1 node_controller.go:397] Successfully initialized node master-1 with cloud provider
E0413 11:49:14.737789       1 node_lifecycle_controller.go:155] error checking if node worker-1 is shutdown: hcloud/instances.InstanceShutdownByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://:
E0413 11:49:14.888616       1 node_lifecycle_controller.go:172] error checking if node worker-1 exists: hcloud/instances.InstanceExistsByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://: 11273100
E0413 11:49:19.889034       1 node_lifecycle_controller.go:155] error checking if node worker-1 is shutdown: hcloud/instances.InstanceShutdownByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://:
I0413 11:49:19.990213       1 route_controller.go:193] Creating route for node worker-1 10.244.1.0/24 with hint 3d99d398-704b-4f7e-b293-80b42b307ee8, throttled 662ns
E0413 11:49:20.100512       1 node_lifecycle_controller.go:172] error checking if node worker-1 exists: hcloud/instances.InstanceExistsByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://: 11273100
I0413 11:49:21.307966       1 route_controller.go:213] Created route for node worker-1 10.244.1.0/24 with hint 3d99d398-704b-4f7e-b293-80b42b307ee8 after 1.317754875s
I0413 11:49:21.308162       1 route_controller.go:303] Patching node status worker-1 with true previous condition was:nil

ErwinSteffens commented 3 years ago

Hmm, I think I had misconfigured the worker nodes.

I am very sorry for wasting your time on this issue... :disappointed:
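
For anyone who hits this later: judging from the bare 11273100 in the errors above, the worker's providerID was probably set without the hcloud:// scheme. If you set the provider ID yourself via the kubelet, the scheme must be included (a hypothetical example using the server ID from the logs):

# hypothetical: when setting the provider ID manually, include the hcloud:// scheme
/usr/bin/kubelet ... --provider-id=hcloud://11273100 --cloud-provider=external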