Closed mateuszlewko closed 8 months ago
Now when running again I see the following logs for systemctl status k3s-agent:
level=info msg="Waiting to retrieve agent configuration; server is not ready: CA cert validation failed: Get \"https://127.0.0.1:6444/cacerts\": EOF"
level=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
level=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
level=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
level=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
level=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
level=info msg="Waiting to retrieve agent configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
level=info msg="Waiting to retrieve agent configuration; server is not ready: https://127.0.0.1:6444/v1-k3s/serving-kubelet.crt: 503 Service Unavailable"
level=info msg="Waiting to retrieve agent configuration; server is not ready: https://127.0.0.1:6444/v1-k3s/serving-kubelet.crt: 503 Service Unavailable"
level=info msg="Waiting to retrieve agent configuration; server is not ready: https://127.0.0.1:6444/v1-k3s/serving-kubelet.crt: 503 Service Unavailable"
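For anyone debugging the same symptoms: the agent talks to the k3s supervisor through a local proxy on 127.0.0.1:6444 (visible in the URLs above), so it can help to probe that endpoint and tail the full agent log from the node itself. A minimal sketch; the commands are printed rather than executed here, since they only make sense on a live k3s node:

```shell
# Commands one might run on the affected node to dig into the errors above.
# Stored in variables and printed, not executed, since they need a live k3s node.
probe_cmd="curl -vk https://127.0.0.1:6444/cacerts"           # probe the local supervisor proxy the agent is failing on
logs_cmd="journalctl -u k3s-agent --since '10 min ago'"       # full agent log stream, not just the status snippet
echo "$probe_cmd"
echo "$logs_cmd"
```

An EOF or connection refused from the probe would point at the local proxy / control-plane side rather than the agent configuration itself.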
This terraform thing is too flaky, I'm afraid. It worked for a few days, but now I cannot create nodepools anymore. The nodepool is stuck in the creating state even though the servers are up and running in the Hetzner console. On the servers I get: "Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
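The "Node password rejected" message usually means the server side still holds a node-password entry for that hostname from an earlier registration. A hedged sketch of the usual cleanup, assuming kubectl access to the control plane; "agent-1" is a placeholder hostname (not from this thread), and the commands are printed rather than run, since they need a live cluster:

```shell
# k3s keeps one secret per node in kube-system, named <hostname>.node-password.k3s;
# deleting it lets an agent with a fresh /etc/rancher/node/password re-register.
node="agent-1"                           # placeholder: substitute the rejected node's hostname
secret="${node}.node-password.k3s"       # k3s node-password secret naming convention
echo "kubectl -n kube-system delete secret ${secret}"
echo "ssh root@${node} 'rm -f /etc/rancher/node/password && systemctl restart k3s-agent'"
```

Verify the secret actually exists (kubectl -n kube-system get secrets | grep node-password) before deleting, and prefer --with-node-id if duplicate hostnames are expected.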
Might be connected to the (unresolved) discussion I started recently #1287
I'm having weird behaviour after updating the nodes with a recent MicroOS update: strange network connectivity issues that I couldn't figure out yet (I just rolled back and disabled updates for now).
Edit: I also saw some "503 Service Unavailable" and "connection refused" in my logs. I know those are very generic errors, but still.
@kube-hetzner/core Any ideas?
@mateuszlewko Try with cni_plugin="cilium", I would guess it works better with wireguard.
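For reference, that switch is a single input on the module. A minimal sketch of the relevant kube.tf fragment, with the module source and the many other required inputs omitted (and assuming flannel is the CNI used when the variable is unset):

```hcl
module "kube-hetzner" {
  # ... source, provider, network and nodepool inputs omitted ...
  cni_plugin = "cilium" # replaces the CNI used by default when unset (flannel)
}
```

Note that changing the CNI on an existing cluster generally means recreating it rather than applying in place.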
Hi, I got the same error, "timed out waiting for the condition on deployments/system-upgrade-controller", with a very similar configuration and cilium enabled.
Hmm, I might have had the same: nodes unable to come back to life after a reboot following a k3s upgrade. I replaced the nodes (long live Longhorn) and turned off upgrades. I haven't verified that this was the real problem, but I haven't seen it happen again either. No need to roll back anything though: the fresh nodes are on the latest k3s and have MicroOS updating weekly without issues, just not automatically upgrading k3s.
Considering this an occasional hiccup, but will monitor the situation.
@kimdre Could you share your kube.tf, please?
@mateuszlewko Did you manage to make it work? What about you @andi0b ?
I disabled wireguard and recreated the cluster some time later. I haven't checked whether wireguard works better with cilium or whether that was the actual problem.
@mysticaltech
Did you manage to make it work? What about you @andi0b ?
No, I'm currently on Easter holiday and didn't investigate it more. I just disabled kured (I think with something like kubectl -n kube-system annotate ds kured weave.works/kured-node-lock='{"nodeID":"manual"}') and rolled back the nodes to the last working snapshot (I think with transactional-update rollback [number]).
Folks, this was probably due to a bug in the system upgrade controller, now fixed. Make sure to upgrade with terraform init -upgrade. If such an issue comes up again, please don't hesitate to open another one with your kube.tf. Closing this one for now.
Description
I'm launching a new cluster on the latest version of this package, using ed25519 SSH keys without a password. Creation of a completely new cluster seems to be stuck (for > 40 min) on the configuration of a single agent node (the server is present in the Hetzner UI).
This is the last excerpt from logs:
Kube.tf file
Screenshots
No response
Platform
Mac