Hey @abishekas,
First off, there was an incident in our API yesterday around 17:00-17:40 CEST, which might have caused this. Can you try again today?
If it still does not work, please share the HCCM logs and the output of:
kubectl get node <your-broken-node> -o yaml
Hey @apricote,
@abishekas and I are on the same team. Here are the logs you requested: hccm-logs.txt
Screenshot:
P.S.:
Below are the errors I got yesterday when I tried to install the Kubernetes packages on a new server in the old project, which is where I faced the actual issue:
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
curl: (22) The requested URL returned error: 403
gpg: no valid OpenPGP data found.
So I created a new project and tried this by adding a new server there, because on the older machine I tried yesterday I was getting this error while installing the Kubernetes packages.
The node does not have the uninitialized taint that HCCM expects. Are you sure you started the kubelet on that node with --cloud-provider=external? HCCM will only "adopt" the node if that taint is set.
You can try to re-add the taint with kubectl taint node master node.cloudprovider.kubernetes.io/uninitialized:NoSchedule
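Before re-adding the taint, it can help to confirm whether it is actually missing. The following is a minimal sketch, not part of HCCM: `check_uninitialized_taint` is a hypothetical helper name, and it assumes the taint keys are passed in as a space-separated list, which is what the kubectl jsonpath query in the usage comment prints.

```shell
# Taint key that marks a node as not yet initialized by a cloud provider.
TAINT_KEY="node.cloudprovider.kubernetes.io/uninitialized"

# Hypothetical helper: reads a space-separated list of taint keys on stdin
# and prints "present" or "missing" for the uninitialized taint.
check_uninitialized_taint() {
  if tr ' ' '\n' | grep -qxF "$TAINT_KEY"; then
    echo "present"
  else
    echo "missing"
  fi
}

# Usage against a live cluster ("master" is the node name used in this thread):
#   kubectl get node master -o jsonpath='{.spec.taints[*].key}' | check_uninitialized_taint
```

If this prints "missing" on a freshly joined node, the kubelet was most likely started without --cloud-provider=external.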
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
curl: (22) The requested URL returned error: 403
gpg: no valid OpenPGP data found.
This sounds like your IP is blocked by pkgs.k8s.io. This unfortunately happens from time to time, and you will need to try with another IP. We recommend mirroring all assets you need for your production infrastructure to local services. You cannot rely on pkgs.k8s.io being available at all times. See this thread for previous discussions of this topic: https://github.com/kubernetes/registry.k8s.io/issues/138
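As a rough sketch of how a blocked or failed download could be caught before gpg runs, the key can be fetched to a temporary file and checked first. The helper names `looks_like_pgp_key` and `fetch_apt_key` are made up for illustration; the same function would work with a local mirror URL in place of pkgs.k8s.io.

```shell
# Hypothetical helper: succeeds if the file contains an ASCII-armored
# PGP public key header.
looks_like_pgp_key() {
  grep -qF -- '-----BEGIN PGP PUBLIC KEY BLOCK-----' "$1"
}

# Hypothetical helper: downloads the key at URL ($1) to a temp file and only
# dearmors it into DEST ($2) if the download succeeded and looks like a key.
fetch_apt_key() {
  url="$1"
  dest="$2"
  tmp="$(mktemp)" || return 1
  if curl -fsSL "$url" -o "$tmp" && looks_like_pgp_key "$tmp"; then
    gpg --dearmor -o "$dest" < "$tmp"
    rc=$?
  else
    echo "download failed or response is not a PGP key: $url" >&2
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}

# Usage (run as root so the keyring path is writable):
#   fetch_apt_key https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key \
#     /etc/apt/keyrings/kubernetes-apt-keyring.gpg
```

With this, a 403 response produces a single clear error instead of the combined curl and gpg failures quoted above.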
Hi @apricote,
Thanks for your valuable response. We always bring up all nodes with the --cloud-provider=external flag in the kubelet configuration, and the taint is already present on my master machines. I am attaching a screenshot for your reference.
Today around 13:30 UTC we saw maintenance work on the Cloud API and Cloud Console on the Hetzner side, and after that our clusters were able to connect to Hetzner Cloud again. I am attaching a screenshot of that maintenance window for your reference.
The issue was resolved right after the maintenance window. Not sure if anything changed on your end. We want this to be future-proof. As a precaution, do you have any suggestions for handling this if such an issue happens again?
Good to hear that everything works now.
I am not really sure what the issue was, so I do not have any suggestions on what you can improve for the future.
If you ever encounter issues again, you can try to run HCCM with env variable HCLOUD_DEBUG=true and the flag -v=5 to get way more logs.
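With that much output, it may be easier to filter for warnings and errors first. Below is a small sketch that assumes the log lines use the standard klog prefix (a severity letter followed by the month and day, e.g. `E0516`). The helper name `klog_problems` is hypothetical, and the deployment name and namespace in the usage comments are assumptions based on a default install; adjust them to your setup.

```shell
# Hypothetical helper: keeps only klog lines with severity Warning (W),
# Error (E) or Fatal (F), identified by the klog prefix "<letter><MMDD> ".
klog_problems() {
  grep -E '^[WEF][0-9]{4} '
}

# Usage against a live cluster (deployment name and namespace are assumptions):
#   kubectl -n kube-system set env deployment/hcloud-cloud-controller-manager HCLOUD_DEBUG=true
#   kubectl -n kube-system logs deployment/hcloud-cloud-controller-manager | klog_problems
```

Note that output which does not follow the klog line format is dropped by this filter, so check the unfiltered logs as well if nothing shows up.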
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.
TL;DR
Hi Everyone,
We are planning to move our production environment to Hetzner Cloud. We provisioned a self-managed Kubernetes cluster on Hetzner servers for our project and used the Hetzner cloud controller manager to connect the cluster to Hetzner Cloud. We provisioned it by following the document below.
https://community.hetzner.com/tutorials/install-kubernetes-cluster#:~:text=Now%20deploy%20the%20Hetzner%20Cloud%20controller%20manager%20into%20the%20cluster
Expected behavior
We deployed this in March 2024 and everything worked as expected until yesterday. Today, when we create a new server in the Hetzner Cloud Console and add it to the same cluster, the hcloud provider ID and the region topology labels are not added for that server. We also use ingress-nginx with a Load Balancer in this setup: when we apply ingress-nginx, it normally connects to the Load Balancer in the cloud automatically, but as of today that connection is not working either.
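A quick way to see which nodes are affected is to list each node's provider ID and pick out the ones without an `hcloud://` value. This is only a sketch: `nodes_missing_provider_id` is a hypothetical helper, and it assumes a two-column table with a header row, as produced by the kubectl command in the usage comment.

```shell
# Hypothetical helper: reads a NAME / PROVIDER_ID table (with header row)
# on stdin and prints the names of nodes whose provider ID does not start
# with "hcloud://". kubectl prints "<none>" when the field is unset.
nodes_missing_provider_id() {
  awk 'NR > 1 && $2 !~ /^hcloud:\/\// { print $1 }'
}

# Usage against a live cluster:
#   kubectl get nodes -o custom-columns='NAME:.metadata.name,PROVIDER_ID:.spec.providerID' \
#     | nodes_missing_provider_id
#
# The region label can be checked the same way by eye:
#   kubectl get nodes -L topology.kubernetes.io/region
```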
Observed behavior
We tried to diagnose this using the logs from the Hetzner cloud controller manager, but we couldn't see any errors in them. I'm sharing the log data below for reference. We also provisioned a new setup to see if that works, but hit the same issue. We verified network connectivity from our server to Hetzner Cloud through API calls and ping requests, and it works fine. We even created a new setup in another region, but the issue persists.
We have planned our production migration for this weekend, so any quick help would be greatly appreciated. Thanks.
Minimal working example
No response
Log output
Additional information
No response