Closed by stubbi 1 year ago
I've been using this project to learn tf and k8s, though not deploying kube-hetzner itself.
Starting last night, I could not get any nodes to join the cluster because they could not ping the load balancer, even though the subnet and IPs are set up very similarly to kube-hetzner. It was working fine until around yesterday evening.
I was able to get a successful provisioning about 30 minutes ago, but then it failed again afterwards.
Update: I have been triple-checking my config and everything seems ok. Everything is also connected properly according to the hcloud dashboard, and manually attaching/detaching a load balancer from the private network doesn't change the behaviour. For the time being, shelling into each server node shows that they can ping each other just fine, but the load balancer is unreachable on its private IP, so for whatever reason, the load balancer simply isn't connecting to the private network properly.
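For anyone who wants to repeat the check described above from a node, here is a minimal sketch in bash. It assumes a Linux node with iputils `ping`; the function names and the IPs in the usage comments are hypothetical placeholders, not values from this thread.

```shell
#!/usr/bin/env bash
# Sketch only: check whether private IPs answer from the current node.

# Default probe: one ICMP echo with a 2 second timeout (Linux iputils flags).
probe() {
  ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# check_reachable LABEL IP
# Prints a one-line result and returns non-zero if the probe fails.
# Set PROBE to another command name to swap the probe (e.g. for a dry run).
check_reachable() {
  local label="$1" ip="$2"
  if "${PROBE:-probe}" "$ip"; then
    echo "ok: $label ($ip)"
  else
    echo "UNREACHABLE: $label ($ip)"
    return 1
  fi
}

# Usage (hypothetical addresses, take the real ones from the hcloud dashboard):
#   check_reachable "other node"    10.0.0.3
#   check_reachable "load balancer" 10.0.0.2
```

If the other nodes report ok while the load balancer is unreachable, that matches the symptom described in this issue.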
Another update: starting from scratch in an empty project using the hcloud console. Not sure how useful this info is and I may be missing something conceptually.
Thanks for confirming I am not the only one affected, @thebearingedge!
Together with the issue https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/570 raised by @RudlTier this seems to (have) be(en) an issue on Hetzner's side.
@stubbi @thebearingedge It seems Hetzner is indeed having issues. You can monitor the status here https://status.hetzner.com/.
@mysticaltech right. I had checked the status page yesterday as well but couldn't find anything that was giving a hint towards this behaviour.
The issue still persists for me. I'll reach out to Hetzner support and will let you know if there are any insights from this for this project
@stubbi Thank you. But in your case, you might want to run terraform destroy (see readme), then comment out cni_plugin="cilium" and try again.
Normally the default cilium config works, but just in case, try it this way.
If that does not work either, change locations to test the Hetzner hypothesis.
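The suggestion above could be scripted roughly as follows. This is only a sketch: `comment_out_setting` is a hypothetical helper, the file name `kube.tf` is an assumption about where the setting lives, and editing the file by hand works just as well.

```shell
#!/usr/bin/env bash
# Sketch only: comment out a "key = value" line in a Terraform file.

# comment_out_setting FILE KEY
# Prefixes any uncommented line that starts with KEY followed by "=" with "# ".
# A backup of the original file is kept as FILE.bak.
comment_out_setting() {
  local file="$1" key="$2"
  sed -i.bak -E "s/^([[:space:]]*)(${key}[[:space:]]*=)/\1# \2/" "$file"
}

# Usage (assumed file name, run from the directory holding the config):
#   terraform destroy
#   comment_out_setting kube.tf cni_plugin
#   terraform apply
```

Lines that are already commented out are left untouched, so running the helper twice is harmless.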
I was able to get a load balancer on a private network this morning 🎉 . By the way, the network was in the Hillsboro, OR network zone. Everything seems operational at this point, with some minor quirks in VM provisioning speed.
I have to wonder what the root cause of the issue was. A couple of weeks ago I was attempting to replicate some of the MicroOS work done here in kube-hetzner, but only 66% of my virtual machines would receive an eth1 interface. This sort of reminds me of that.
Anyway, cheers, y'all!
Good to hear @thebearingedge! Yes, we've seen that from time to time, the Hetzner infra has issues. And the load balancer gets deployed on request of the CCM, so if the Hetzner APIs are not working properly, it will not work.
By the way, if you add any cool features, PRs are always welcome! 🙏
@mysticaltech Yessir, and thank you for the awesome project. If I come up with anything that could be useful I will be sure to contribute it. For now I'm in mad scientist mode 🤣
Folks, I am considering this fixed.
Hi there,
Thank you very much for this project! ❤️
I created one K8s cluster earlier today just fine. After deleting the project and creating a new one, it does not seem to work anymore... I haven't reached any of my quotas yet. I have tried several times now, always ending in
(and I am sure it will stay this way until hitting the timeout). I have also noticed that the control plane load balancer shows all 3 targets as unhealthy, which surprises me. I think they are fine.
Here is my .tf, which, compared to the default one, differs in: nginx, lb.mydomain.com, cert_manager, and cilium as CNI plugin.
And my terraform version: