Closed: ErwinSteffens closed this issue 3 years ago
So it is trying to read the ProviderID
from the spec here: https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/99d4622bde406f3d8c612e1c011009210b907789/internal/hcops/load_balancer.go#L512
But looking at the node, I do not see this field. The command kubectl describe node worker-1 | grep ProviderID
returns nothing. I will look into why this is going wrong.
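For reference, a quick way to read exactly the field the CCM is looking for (worker-1 is just the node name from above):

# print the providerID that the cloud controller manager reads from the node spec;
# an empty result means the field was never set on the node object
kubectl get node worker-1 -o jsonpath='{.spec.providerID}'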
Hey @ErwinSteffens,
please make sure that the cluster was initialized with the --cloud-provider=external flag on the master nodes and that the flag is also set on the worker nodes. When the flag is correctly added you should see a ProviderID
(as you already mentioned) and a node.cloudprovider.kubernetes.io/uninitialized
taint. You can find more details under: https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/#running-cloud-controller-manager
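One common way to set this with kubeadm (a sketch; /etc/default/kubelet is the Debian/Ubuntu location read by the kubeadm systemd drop-in, other distributions may differ):

# on every node, ideally before it registers with the cluster for the first time
echo 'KUBELET_EXTRA_ARGS=--cloud-provider=external' | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet

# verify that the running kubelet actually carries the flag
pgrep -a kubelet | grep -o 'cloud-provider=external'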
Hey @LKaemmerling, thanks for answering.
I think I have added this flag correctly. I have just rebuilt my cluster with --cloud-provider=external.
This is on one of the worker nodes:
root@worker-1:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Mon 2021-04-12 21:01:03 CEST; 12h ago
Docs: https://kubernetes.io/docs/home/
Main PID: 1406 (kubelet)
Tasks: 18 (limit: 9284)
Memory: 64.2M
CGroup: /system.slice/kubelet.service
└─1406 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.2 --node-ip=10.0.0.2 --cloud-provider=external
I have now removed the worker node from the cluster and removed the cloud controller manager deployment. The worker node has been re-added and now has the node.cloudprovider.kubernetes.io/uninitialized=true
taint:
~ k describe node eu-prod-k8s-worker-1
Name: eu-prod-k8s-worker-1
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=eu-prod-k8s-worker-1
kubernetes.io/os=linux
Annotations: alpha.kubernetes.io/provided-node-ip: 10.1.0.1
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 10.1.0.1/32
projectcalico.org/IPv4VXLANTunnelAddr: 10.244.22.64
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 13 Apr 2021 09:16:25 +0200
Taints: **node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule**
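To see at a glance which nodes still carry the taint, something like this works (the column names are only for readability):

# list every node together with the keys of its taints
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'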
I have now re-deployed the cloud controller manager. The taint has been removed and I see that the ProviderID has now somehow been added.
Is there maybe something I did wrong in the order of deployment?
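In case the order does matter: as far as I understand, the taint and the cloud-provider initialization only happen when a node registers, so one way to let an already running CCM pick a node up again (a sketch, worker-1 is just an example name) is:

# remove the stale node object from the API server
kubectl delete node worker-1

# then restart the kubelet on that machine so it re-registers with the
# uninitialized taint, which the CCM will process
ssh root@worker-1 systemctl restart kubelet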
Seeing another error now in the logs, but not sure if it is a real problem:
E0413 07:30:55.026955 1 route_controller.go:118] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/hcloudRouteToRoute: hcops/AllServersCache.ByPrivateIP: hcops/AllServersCache.getCache: not found
See comment at issue: https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/165
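The hcops/AllServersCache.ByPrivateIP: ... not found part suggests the CCM could not match the node's private IP to a server in the configured Hetzner Cloud network. Two quick ways to compare both sides (the hcloud CLI and the node name are only examples):

# the InternalIP addresses as Kubernetes sees them
kubectl get nodes -o wide

# the private network attachment as Hetzner Cloud sees it
hcloud server describe worker-1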
@LKaemmerling
I have just re-built my cluster and am seeing the issue again. Here is the order in which things are created: the master node is initialized (with --cloud-provider=external), then the worker nodes are joined (with --cloud-provider=external).
I now see that the node.cloudprovider.kubernetes.io/uninitialized=true taint is removed from all nodes, but the ProviderID field is only added to the master node.
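A compact way to confirm that (the column names are only for readability):

# show the providerID of every node in one listing
kubectl get nodes -o custom-columns='NAME:.metadata.name,PROVIDERID:.spec.providerID'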
Here are the logs from the CCM:
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0413 11:47:25.441873 1 serving.go:313] Generated self-signed cert in-memory
W0413 11:47:38.335049 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0413 11:47:38.433685 1 controllermanager.go:120] Version: v0.0.0-master+$Format:%h$
Hetzner Cloud k8s cloud controller v1.8.1 started
W0413 11:47:39.557952 1 controllermanager.go:132] detected a cluster without a ClusterID. A ClusterID will be required in the future. Please tag your cluster to avoid any future issues
I0413 11:47:39.561298 1 secure_serving.go:178] Serving securely on [::]:10258
I0413 11:47:39.562556 1 controllermanager.go:247] Started "service"
I0413 11:47:39.562700 1 controller.go:208] Starting service controller
I0413 11:47:39.562721 1 shared_informer.go:223] Waiting for caches to sync for service
I0413 11:47:39.562814 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 11:47:39.562834 1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 11:47:39.562868 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0413 11:47:39.562992 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 11:47:39.563002 1 shared_informer.go:223] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 11:47:39.662987 1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0413 11:47:39.663535 1 shared_informer.go:230] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0413 11:47:39.667485 1 controllermanager.go:247] Started "route"
I0413 11:47:39.670824 1 route_controller.go:100] Starting route controller
I0413 11:47:39.670841 1 shared_informer.go:223] Waiting for caches to sync for route
I0413 11:47:39.732968 1 node_controller.go:110] Sending events to api server.
I0413 11:47:39.733114 1 controllermanager.go:247] Started "cloud-node"
I0413 11:47:39.734683 1 node_lifecycle_controller.go:78] Sending events to api server
I0413 11:47:39.734734 1 controllermanager.go:247] Started "cloud-node-lifecycle"
I0413 11:47:39.833607 1 node_controller.go:325] Initializing node master-1 with cloud provider
I0413 11:47:39.862951 1 shared_informer.go:230] Caches are synced for service
I0413 11:47:39.870997 1 shared_informer.go:230] Caches are synced for route
I0413 11:47:40.621475 1 route_controller.go:193] Creating route for node master-1 10.244.0.0/24 with hint 23dbc99b-2ae4-40e2-b0e1-79ff319360d8, throttled 461ns
I0413 11:47:41.817005 1 route_controller.go:213] Created route for node master-1 10.244.0.0/24 with hint 23dbc99b-2ae4-40e2-b0e1-79ff319360d8 after 1.19551538s
I0413 11:47:42.930994 1 node_controller.go:397] Successfully initialized node master-1 with cloud provider
E0413 11:49:14.737789 1 node_lifecycle_controller.go:155] error checking if node worker-1 is shutdown: hcloud/instances.InstanceShutdownByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://:
E0413 11:49:14.888616 1 node_lifecycle_controller.go:172] error checking if node worker-1 exists: hcloud/instances.InstanceExistsByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://: 11273100
E0413 11:49:19.889034 1 node_lifecycle_controller.go:155] error checking if node worker-1 is shutdown: hcloud/instances.InstanceShutdownByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://:
I0413 11:49:19.990213 1 route_controller.go:193] Creating route for node worker-1 10.244.1.0/24 with hint 3d99d398-704b-4f7e-b293-80b42b307ee8, throttled 662ns
E0413 11:49:20.100512 1 node_lifecycle_controller.go:172] error checking if node worker-1 exists: hcloud/instances.InstanceExistsByProviderID: hcloud/providerIDToServerID: missing prefix hcloud://: 11273100
I0413 11:49:21.307966 1 route_controller.go:213] Created route for node worker-1 10.244.1.0/24 with hint 3d99d398-704b-4f7e-b293-80b42b307ee8 after 1.317754875s
I0413 11:49:21.308162 1 route_controller.go:303] Patching node status worker-1 with true previous condition was:nil
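The missing prefix hcloud:// errors suggest that worker-1 does have a providerID set, just not in the format the CCM expects: apparently the bare server ID 11273100 instead of hcloud://11273100. A quick check (node name taken from the logs above):

# the CCM expects the form hcloud://<server-id>
kubectl get node worker-1 -o jsonpath='{.spec.providerID}'

# spec.providerID generally cannot be changed once set; if the prefix is
# missing, re-registering the node (as sketched further above) lets the
# CCM set it itself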
Hmm, I think I had misconfigured the worker nodes.
I am very sorry for wasting your time on this issue... :disappointed:
I have read #80 and think I have initialised the kubelet correctly, but I still get this error.
Here is the full error:
Looking at the kubelet systemd command on the master node I see --cloud-provider=external:
Can you maybe guide me a little bit on what to check? Do I have to add --cloud-provider=external on the worker nodes as well?
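One way to check whether the worker kubelets are already running with that flag (worker-1 is just an example name; /var/lib/kubelet/kubeadm-flags.env is where kubeadm usually writes its generated kubelet flags):

# show the full command line of the running kubelet on a worker
ssh root@worker-1 pgrep -a kubelet

# show the flags kubeadm generated for the kubelet, if present
ssh root@worker-1 cat /var/lib/kubelet/kubeadm-flags.env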