vitobotta / hetzner-k3s

The easiest and fastest way to create and manage Kubernetes clusters in Hetzner Cloud using the lightweight distribution k3s by Rancher.

k3s cluster and nat gateway #454

Open mertcangokgoz opened 3 hours ago

mertcangokgoz commented 3 hours ago

I am currently using a NAT gateway in my project. I need k3s, and I want my cluster to communicate only over private IPs, without any public IP address. I am using the debian-12 image for the cluster.

With this configuration, I expect the machines to reach the internet and, at the same time, the pods to come up. However, during the installation it produces output like the following, so I don't think the installation is completing in a healthy way.

[screenshot: installation output]
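For context, the kind of per-node routing a private-only setup like this relies on behind a Hetzner NAT gateway is roughly the following (a sketch, not my exact commands; the gateway IP 10.13.0.1 is assumed from the subnet discussed below):

```sh
# Rough sketch of the per-node routing behind a Hetzner NAT gateway.
# 10.13.0.1 (the network's gateway IP) is an assumption based on the
# subnet discussed later in this thread; adjust to your network.
ip route add default via 10.13.0.1
# No public interface means no DHCP-provided DNS; Hetzner's public
# recursive resolvers are one option:
printf 'nameserver 185.12.64.1\nnameserver 185.12.64.2\n' > /etc/resolv.conf
```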
vitobotta commented 2 hours ago

Hi, do you see the server(s) attached to the main-vpc-network network in the Hetzner Console? If so, do they get an IP in that network?
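If it's easier than the Console, something like this should show the same thing from the hcloud CLI (assuming it's installed and authenticated; substitute one of your own server names):

```sh
# Describe one of the servers; the "Private Net" section should list
# main-vpc-network together with an IP from its subnet.
hcloud server describe blackhole-k3s-cluster-pool-small-static-pool-worker1
```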

vitobotta commented 2 hours ago

Please SSH into one of the servers attached to the network and run:

```sh
SUBNET="10.13.0.0/16"
SUBNET_PREFIX=$(echo $SUBNET | cut -d'/' -f1 | sed 's/\./\\./g' | sed 's/0$//')

echo $SUBNET_PREFIX
```

Does it return the correct prefix?

Then run:

```sh
ip -4 addr show | grep -q "inet $SUBNET_PREFIX"
```

What does it return?
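For reference, this is roughly what the two checks should produce on a healthy node (the node IP 10.13.0.3 and the interface name are assumptions):

```sh
SUBNET="10.13.0.0/16"
SUBNET_PREFIX=$(echo $SUBNET | cut -d'/' -f1 | sed 's/\./\\./g' | sed 's/0$//')
echo $SUBNET_PREFIX
# prints: 10\.13\.0\.

# With -q, grep prints nothing and only sets the exit code
# (0 = a private IP in the subnet was found, 1 = it was not):
ip -4 addr show | grep -q "inet $SUBNET_PREFIX"; echo $?
# Without -q you would see the matching line itself, e.g. (assumed IP):
#   inet 10.13.0.3/32 brd 10.13.0.3 scope global dynamic enp7s0
```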

vitobotta commented 2 hours ago

My gut feeling is that there is something wrong with your post_create_commands.

Attach a temporary server to the same network, then SSH into it and, using /bin/sh rather than bash (the cloud-init script must work in a plain sh shell), try running your post-create commands to see whether all of them work.
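One way to run that test (a sketch; the file name is arbitrary and the body is a placeholder for your own commands):

```sh
# Paste your post_create_commands into a file, then execute it with sh
# explicitly, so any bashism fails the same way it would under cloud-init.
cat > /tmp/post_create_test.sh <<'EOF'
# ...your post_create_commands, one per line...
EOF
sh -e /tmp/post_create_test.sh   # -e aborts at the first failing command
echo "exit code: $?"
```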

vitobotta commented 2 hours ago

What do you get with `ip -4 addr show`?

vitobotta commented 1 hour ago

Can you try `ip -4 addr show | grep "inet $SUBNET_PREFIX"` without `-q`? I'm trying to replicate what happens during the installation.

mertcangokgoz commented 1 hour ago

@vitobotta

I changed the subnet and the problem disappeared (I don't know if it has something to do with how I split the subnets). Of course, I haven't included post_create_commands yet, but I get the situation below; is this coming from SSH?

```
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker3] Waiting for successful ssh connectivity with instance blackhole-k3s-cluster-pool-small-static-pool-worker3...
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker2] Waiting for successful ssh connectivity with instance blackhole-k3s-cluster-pool-small-static-pool-worker2...
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker1] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker1 is now up.
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker1] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker1 created
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker3] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker3 is now up.
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker3] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker3 created
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker2] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker2 is now up.
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker2] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker2 created
Unhandled exception in spawn: timeout after 00:00:30 (Tasker::Timeout)
  from /opt/homebrew/Cellar/hetzner_k3s/2.0.8/bin/hetzner-k3s in 'raise<Tasker::Timeout>:NoReturn'
  from /opt/homebrew/Cellar/hetzner_k3s/2.0.8/bin/hetzner-k3s in 'Tasker@Tasker::Methods::timeout<Time::Span, &Proc(Nil)>:Nil'
  from /opt/homebrew/Cellar/hetzner_k3s/2.0.8/bin/hetzner-k3s in '~procProc(Nil)@src/cluster/create.cr:75'
  from /opt/homebrew/Cellar/hetzner_k3s/2.0.8/bin/hetzner-k3s in 'Fiber#run:(IO::FileDescriptor | Nil)'
```
vitobotta commented 1 hour ago

Yeah that may be a problem with SSH, perhaps with the key. Can you try enabling the agent?
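Roughly like this, assuming your key lives at the default path:

```sh
# Start an agent and load the private key before re-running the installer.
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519   # assumed path; use the key from your config
```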

mertcangokgoz commented 1 hour ago

> Yeah that may be a problem with SSH, perhaps with the key. Can you try enabling the agent?

Are you talking about the `use_agent` setting? But the key I created has no passphrase.

vitobotta commented 1 hour ago

Another possibility is an issue with Debian caused by some recent changes made to handle the new way custom SSH ports work in newer versions of Ubuntu. Can you try Ubuntu with the same configuration to see if that's the problem?
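Something like this would do it, assuming your config selects the OS with a top-level `image:` key and is named cluster_config.yaml (both assumptions; adjust to your setup):

```sh
# Switch the node image from Debian 12 to Ubuntu 24.04 and re-run the
# installer; sed keeps a .bak copy of the original config.
sed -i.bak 's/^image: debian-12/image: ubuntu-24.04/' cluster_config.yaml
hetzner-k3s create --config cluster_config.yaml
```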

mertcangokgoz commented 58 minutes ago
[screenshot: connectivity check output]

Thank you very much for your help. I have one last question: all of the machines have internet access and my configuration is correct, but the following warning appears. Is this normal for autoscaling?

[screenshot: autoscaler message]

Apart from this, I get the warning above. Yes, the cluster is set up so that pods cannot be scheduled on the master nodes, but Hetzner did not create the pods for csi-controller etc. on the other nodes; instead it spun up 3 new machines.

Is this a normal process?

vitobotta commented 47 minutes ago

It's not a warning :) It's just telling you that some pods were probably pending due to lack of resources, so the cluster had to scale up. Did it add a new node?
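You can confirm that from the cluster events (a sketch; TriggeredScaleUp is the standard cluster-autoscaler event reason):

```sh
# Pods that triggered a scale-up get a TriggeredScaleUp event:
kubectl get events -A --field-selector reason=TriggeredScaleUp
# Anything still waiting for capacity shows up as Pending:
kubectl get pods -A --field-selector status.phase=Pending
```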

mertcangokgoz commented 45 minutes ago
[screenshot: node list]

Yes, it added 3 nodes, but I cannot see the added ones with the `kubectl get nodes` command; it looks like I have 3 masters and 3 workers now.

vitobotta commented 24 minutes ago

What do you see in the autoscaler's logs?
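Something like this should fetch them (the deployment name and namespace are assumptions based on the usual autoscaler installation; adjust if yours differs):

```sh
# Tail the cluster-autoscaler logs to see why nodes were added and
# whether the new ones registered with the cluster.
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=100
```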