kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!
MIT License

Initial deployment of new cluster is stuck #438

Closed: cmantsch closed this issue 1 year ago

cmantsch commented 1 year ago

I wanted to deploy a new cluster, but the process gets stuck after the following log output:

module.kube-hetzner.module.agents["1-0-agent-large"].hcloud_server.server: Provisioning with 'local-exec'...
module.kube-hetzner.module.agents["1-0-agent-large"].hcloud_server.server (local-exec): Executing: ["/bin/sh" "-c" "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o 'IdentitiesOnly yes' -i /tmp/6pgvl4ffzrxfsr5t8trz root@x.x.x.x '(sleep 2; reboot)&'; sleep 3\r\n"]    
module.kube-hetzner.module.agents["1-0-agent-large"].hcloud_server.server (local-exec): Warning: Identity file /tmp/6pgvl4ffzrxfsr5t8trz not accessible: No such file or directory.
module.kube-hetzner.module.agents["1-0-agent-large"].hcloud_server.server (local-exec): Warning: Permanently added 'x.x.x.x' (ECDSA) to the list of known hosts.
root@x.x.x.x's password: module.kube-hetzner.module.control_planes["1-0-control-plane-nbg1"].hcloud_server.server: Still creating... [5m20s elapsed]

From what I can tell, there is an issue with the private key, so the script falls back to password authentication and waits at the password prompt forever.

I generated the SSH keys with ssh-keygen -t ed25519 -f id_ed25519_hcloud.pub (in Ubuntu 20.04 on WSL) and set them as follows in the configuration:

  # * Your ssh public key
  ssh_public_key = file("/home/ctm/terraform/id_ed25519_hcloud.pub")
  # * Your private key must be "ssh_private_key = null" when you want to use ssh-agent for a Yubikey-like device authentication or an SSH key-pair with a passphrase.
  # For more details on SSH see https://github.com/kube-hetzner/kube-hetzner/blob/master/docs/ssh.md
  ssh_private_key = file("/home/ctm/terraform/id_ed25519_hcloud")
  # You can add additional SSH public Keys to grant other team members root access to your cluster nodes.
  # ssh_additional_public_keys = []

I already tried chmod-ing the key file to both 600 (with my user as the owner) and 777; neither changed anything.

Am I missing something here?

mysticaltech commented 1 year ago

@cmantsch Please make sure you follow the SSH doc. Then destroy and apply again: after a few failed attempts your IP gets blacklisted on the running node itself, so destroying before re-applying is essential.

Make sure you do not block the SSH port in the additional ssh rules in your kube.tf.

Also, there are tons of closed issues on this, so if that does not work, search through those.

cmantsch commented 1 year ago

I followed the SSH docs and tried both referring directly to the public/private key files and using ssh-agent with an inline public key. I also always started over from scratch: I deleted everything manually in the Hetzner console (since destroy didn't work due to inconsistent state) and deleted the Terraform state.

Then I noticed that the Terraform logs say Warning: Identity file /tmp/rz8b7wb2ncgvpg3n00j1 not accessible: No such file or directory., but listing the /tmp directory showed the file 'rz8b7wb2ncgvpg3n00j1'$'\r', i.e. the file name had a trailing carriage return. This was probably because I had run terraform init in Windows and then copied the project folder over to the WSL file system.
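A stray carriage return at the end of a file name is easy to miss in a normal directory listing. The following is a minimal sketch of one way to surface such names; the helper find_cr_names and the demo file names are hypothetical and only illustrate the check, they are not part of the module.

```shell
#!/bin/sh
# Sketch: list entries in a directory whose names end in a carriage return (CR).
# find_cr_names and the demo file names are made up for illustration.

cr=$(printf '\r')   # a literal CR character, obtained portably

# find_cr_names DIR: print each name in DIR that ends with CR,
# with the CR rendered visibly as the two characters \r.
find_cr_names() {
    for entry in "$1"/*"$cr"; do
        [ -e "$entry" ] || continue      # the glob matched nothing
        name=${entry##*/}                # strip the directory part
        printf '%s\\r\n' "${name%"$cr"}" # drop the real CR, print a visible \r
    done
}

# Demo in a scratch directory instead of the real /tmp.
demo_dir=$(mktemp -d)
: > "$demo_dir/goodkey"
: > "$demo_dir/badkey$cr"
found=$(find_cr_names "$demo_dir")
printf '%s\n' "$found"
rm -rf "$demo_dir"
```

Pointing find_cr_names at /tmp while the apply is running should show whether the temporary identity file was written with a trailing CR.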

Removed the terraform modules/providers and ran init again in WSL - works like a charm now.

Thanks for your reply anyway! :-)

mysticaltech commented 1 year ago

Well done!! That was great debugging @cmantsch!! 🚀🚀

KaiGrassnick commented 1 year ago

I had the same issue. WSL seems to be the cause here: I never ran the command in Windows, only in WSL.

I had to run find .terraform -type f -print0 | xargs -0 dos2unix to make it work. The problem is that the files have CRLF line endings, which Terraform in WSL has trouble with. After dos2unix the files have LF line endings and it works.
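Before converting, it can help to see which files actually contain carriage returns. Below is a rough sketch, run against a scratch directory rather than a real .terraform folder. The helpers list_crlf_files and strip_cr are hypothetical names, and strip_cr is only an assumed fallback for systems without dos2unix. Unlike dos2unix it removes every CR, not just those before a line feed, so it should only be used on text files.

```shell
#!/bin/sh
# Sketch: find files containing carriage returns and strip them.
# list_crlf_files and strip_cr are hypothetical helpers for illustration.

cr=$(printf '\r')   # a literal CR character

# list_crlf_files DIR: print every file under DIR that contains a CR.
# Note: binary files (e.g. provider executables) may also be listed.
list_crlf_files() {
    grep -rl "$cr" "$1"
}

# strip_cr FILE: remove all CR characters from FILE, keeping its permissions.
strip_cr() {
    tmp=$(mktemp) &&
        tr -d '\r' < "$1" > "$tmp" &&
        cat "$tmp" > "$1" &&
        rm -f "$tmp"
}

# Demo with one CRLF file and one LF file.
demo_dir=$(mktemp -d)
printf 'line one\r\nline two\r\n' > "$demo_dir/crlf.tf"
printf 'line one\nline two\n' > "$demo_dir/lf.tf"

before=$(list_crlf_files "$demo_dir")   # expected to name only crlf.tf
strip_cr "$demo_dir/crlf.tf"
after=$(list_crlf_files "$demo_dir")    # expected to be empty
converted=$(cat "$demo_dir/crlf.tf")
crlf_path="$demo_dir/crlf.tf"
rm -rf "$demo_dir"
```

Running list_crlf_files .terraform before and after the dos2unix command above gives a quick confirmation that the conversion touched the files it needed to.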

mysticaltech commented 1 year ago

@KaiGrassnick Super useful info, thanks for sharing!