mertcangokgoz opened 2 months ago
Hi, do you see the server(s) attached to the `main-vpc-network` network in the Hetzner Console? If yes, do they get an IP in that network?
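You can also check this from the command line, assuming the hcloud CLI is installed and authenticated (server name is just an example):

```sh
hcloud network describe main-vpc-network
hcloud server describe blackhole-k3s-cluster-pool-small-static-pool-worker1
```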
Please SSH into one of the servers attached to the network and run:

```sh
SUBNET="10.13.0.0/16"
SUBNET_PREFIX=$(echo $SUBNET | cut -d'/' -f1 | sed 's/\./\\./g' | sed 's/0$//')
echo $SUBNET_PREFIX
```

Does it return the correct prefix?
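With `SUBNET="10.13.0.0/16"` it should print the prefix with the dots escaped for grep, i.e.:

```sh
$ echo $SUBNET_PREFIX
10\.13\.0\.
```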
Then run:

```sh
ip -4 addr show | grep -q "inet $SUBNET_PREFIX"
```

What does it return?
My gut feeling is that there is something wrong with your `post_create_commands`.
Attach a temp server to the same network, then SSH into it and, with /bin/sh rather than bash (the cloud-init script must work in a plain sh shell), try running your `post_create_commands` to see if they all work fine.
What do you get with `ip -4 addr show`?
Can you try `ip -4 addr show | grep "inet $SUBNET_PREFIX"` without `-q`? I'm trying to replicate what happens during the installation.
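If the private IP is assigned, you should see the matching `inet` line(s); for example something like this (address and interface name will differ in your setup):

```sh
$ ip -4 addr show | grep "inet $SUBNET_PREFIX"
    inet 10.13.0.5/32 brd 10.13.0.5 scope global dynamic enp7s0
```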
@vitobotta
I changed the subnet and the problem disappeared (I don't know if it has something to do with the subnets I've split). Of course, I haven't included `post_create_commands` yet, but I get the situation below. Is this coming from SSH?
```
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker3] Waiting for successful ssh connectivity with instance blackhole-k3s-cluster-pool-small-static-pool-worker3...
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker2] Waiting for successful ssh connectivity with instance blackhole-k3s-cluster-pool-small-static-pool-worker2...
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker1] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker1 is now up.
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker1] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker1 created
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker3] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker3 is now up.
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker3] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker3 created
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker2] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker2 is now up.
[Instance blackhole-k3s-cluster-pool-small-static-pool-worker2] ...instance blackhole-k3s-cluster-pool-small-static-pool-worker2 created
Unhandled exception in spawn: timeout after 00:00:30 (Tasker::Timeout)
  from /opt/homebrew/Cellar/hetzner_k3s/2.0.8/bin/hetzner-k3s in 'raise<Tasker::Timeout>:NoReturn'
  from /opt/homebrew/Cellar/hetzner_k3s/2.0.8/bin/hetzner-k3s in 'Tasker@Tasker::Methods::timeout<Time::Span, &Proc(Nil)>:Nil'
  from /opt/homebrew/Cellar/hetzner_k3s/2.0.8/bin/hetzner-k3s in '~procProc(Nil)@src/cluster/create.cr:75'
  from /opt/homebrew/Cellar/hetzner_k3s/2.0.8/bin/hetzner-k3s in 'Fiber#run:(IO::FileDescriptor | Nil)'
```
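To narrow it down, you could test the SSH connection to one of the workers manually with verbose output (key path and IP are placeholders for your setup):

```sh
ssh -vvv -i ~/.ssh/id_ed25519 root@<worker-public-ip>
```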
Yeah that may be a problem with SSH, perhaps with the key. Can you try enabling the agent?
Are you talking about the `use_agent` setting? But there is no passphrase on the key I created.
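For reference, a sketch of where that setting lives in the cluster config, assuming the usual `networking.ssh` block (key paths are placeholders):

```yaml
networking:
  ssh:
    port: 22
    use_agent: true
    public_key_path: "~/.ssh/id_ed25519.pub"
    private_key_path: "~/.ssh/id_ed25519"
```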
Another possibility is some issue with Debian, due to recent changes made to address the new way of handling custom SSH ports in newer versions of Ubuntu. Can you try Ubuntu with the same configuration to see if that's the problem?
Thank you very much for your help; I have one last question. All of the machines have internet access and my configuration is correct, but the following warning appears. Is this normal for the autoscaler?

Apart from this, I get the following warning. Yes, the cluster is set up so that pods cannot be scheduled on the master nodes, but instead of placing the pods for csi-controller etc. on the other nodes, Hetzner spun up 3 new machines. Is this a normal process?
It's not a warning :) It's just telling you that some pods were probably pending due to lack of resources, so the cluster had to scale up. Did it add a new node?
Yes, it added 3 nodes, but I cannot see the added ones with the `kubectl get nodes` command; it seems that I have 3 masters and 3 workers now.
What do you see in the autoscaler's logs?
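For example, something along these lines (the label selector may differ in your cluster):

```sh
kubectl -n kube-system logs -l app=cluster-autoscaler --tail=100
```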
I managed to get it running properly. I think I will write a small article on the subject for my blog.
Thank you very much for your help.
I just want to ask a very small question:

```yaml
private_network:
  enabled: true
  subnet: 10.14.3.0/24
  existing_network_name: "main-vpc-network"
```

Even if I configure it like this, why would it be receiving an IP from 10.14.1.0/24?
Can you share the solution for posterity?
Can you also clarify the question? :p
The autoscaler stopped working even though I made no changes.

1. I see the machine turning on from the Hetzner Cloud panel.
2. I see it getting a private IP address via DHCP.
3. It seems to start the installation.

There's nothing after that, and I'm stuck because I don't have SSH access, so I can't see the logs. It's as if the installation never completes; the node doesn't even join the cluster.

The machine has only a private IP address behind a NAT gateway. Routing is fully configured, so there is no problem there either; I set it up according to the documentation.

How can I debug this situation?
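One way to debug, assuming you can attach a temporary server with a public IP to the same private network and use it as a jump host:

```sh
# Reach the stuck node over the private network via the jump host
ssh -J root@<jump-host-public-ip> root@<node-private-ip>

# On the node, inspect what cloud-init actually did
cat /var/log/cloud-init-output.log
cat /var/log/hetzner-k3s.log
ip route             # is the default route via the NAT gateway?
cat /etc/resolv.conf # is DNS resolution configured?
```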
I finally managed to solve the problem: because of the missing public IP, the installations were left incomplete due to both routing and DNS problems.

I don't know how this happened, but I fixed it by manually intervening in the cloud-init config.

On machines behind a NAT gateway, the route and DNS configuration needs to run before everything else. Even if we put `post_create_commands` at the top, they run at the bottom (see https://github.com/vitobotta/hetzner-k3s/blob/60b862b3105d6a7362f5754ee83b5f91a2014984/templates/cloud_init.yaml#L35). I noticed that the configuration we add there is not placed at the top.
I am sorry, but I am not following. Can you clarify what exactly fixed your problem and what changes you needed to make to hetzner-k3s to solve it? I could make a new release with your fixes, or you could make a PR if you are up to it. :)
In a k8s setup where there is no public network, the following should be implemented.

1- Network settings should be made and the NAT gateway should be configured:

```yaml
# Add a route via the NAT gateway to the private network interface
- |
  cat <<'EOF' >> /etc/systemd/network/10-enp7s0.network
  [Match]
  Name=enp7s0
  [Network]
  DHCP=yes
  Gateway=10.144.0.1
  EOF
# reload networkd
- systemctl restart systemd-networkd
# Configure systemd-resolved
- systemctl enable systemd-resolved
- systemctl start systemd-resolved
# Set DNS
- |
  cat <<'EOF' >> /etc/systemd/resolved.conf
  [Resolve]
  Cache=yes
  DNS=185.12.64.1 185.12.64.2
  FallbackDNS=1.1.1.1
  EOF
- systemctl daemon-reload
- systemctl restart systemd-resolved
```
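Before moving on to step 2, the result of step 1 can be sanity-checked on the node (expected values assume the 10.144.0.1 gateway from above):

```sh
ip route | grep default    # should show: default via 10.144.0.1 dev enp7s0 ...
resolvectl status enp7s0   # should list the Hetzner DNS servers 185.12.64.1 / 185.12.64.2
```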
2- Packages should not be installed with the `packages:` directive (they should be installed immediately after the cloud-init network settings).

So the cloud-init file has to look like this, if IPv4 and IPv6 are completely off:
```yaml
#cloud-config
preserve_hostname: true

write_files:
  - path: /etc/systemd/system/ssh.socket.d/listen.conf
    content: |
      [Socket]
      ListenStream=
      ListenStream=22
  - path: /etc/configure-ssh.sh
    permissions: '0755'
    content: |
      if systemctl is-active ssh.socket > /dev/null 2>&1
      then
        # OpenSSH is using socket activation
        systemctl disable ssh
        systemctl daemon-reload
        systemctl restart ssh.socket
        systemctl stop ssh
      else
        # OpenSSH is not using socket activation
        sed -i 's/^#*Port .*/Port 22/' /etc/ssh/sshd_config
      fi
      systemctl restart ssh

runcmd:
  - hostnamectl set-hostname $(curl http://169.254.169.254/hetzner/v1/metadata/hostname)
  - update-crypto-policies --set DEFAULT:SHA1 || true
  - /etc/configure-ssh.sh
  # Route private traffic via the NAT gateway
  - |
    cat <<'EOF' >> /etc/systemd/network/10-enp7s0.network
    [Match]
    Name=enp7s0
    [Network]
    DHCP=yes
    Gateway=10.144.0.1
    EOF
  # reload networkd
  - systemctl restart systemd-networkd
  # Configure systemd-resolved
  - systemctl enable systemd-resolved
  - systemctl start systemd-resolved
  # Set DNS
  - |
    cat <<'EOF' >> /etc/systemd/resolved.conf
    [Resolve]
    Cache=yes
    DNS=185.12.64.1 185.12.64.2
    FallbackDNS=1.1.1.1
    EOF
  - systemctl daemon-reload
  - systemctl restart systemd-resolved
  # Install packages only after networking and DNS are up
  - apt-get update && apt-get install -y ifupdown net-tools
  - echo "nameserver 8.8.8.8" > /etc/k8s-resolv.conf
  - |
    touch /etc/initialized
    HOSTNAME=$(hostname -f)
    PUBLIC_IP=$(hostname -I | awk '{print $1}')
    if [ "true" = "true" ]; then
      echo "Using private network " > /var/log/hetzner-k3s.log
      SUBNET="10.144.1.0/24"
      SUBNET_PREFIX=$(echo $SUBNET | cut -d'/' -f1 | sed 's/\./\\./g' | sed 's/0$//')
      MAX_ATTEMPTS=30
      DELAY=10
      UP="false"
      for i in $(seq 1 $MAX_ATTEMPTS); do
        if ip -4 addr show | grep -q "inet $SUBNET_PREFIX"; then
          echo "Private network IP in subnet $SUBNET is up" 2>&1 | tee -a /var/log/hetzner-k3s.log
          UP="true"
          break
        fi
        echo "Waiting for private network IP in subnet $SUBNET to be available... (Attempt $i/$MAX_ATTEMPTS)" 2>&1 | tee -a /var/log/hetzner-k3s.log
        sleep $DELAY
      done
      if [ "$UP" = "false" ]; then
        echo "Timeout waiting for private network IP in subnet $SUBNET" 2>&1 | tee -a /var/log/hetzner-k3s.log
      fi
      PRIVATE_IP=$(ip route get 10.144.1.0 | awk -F"src " 'NR==1{split($2,a," ");print a[1]}')
      NETWORK_INTERFACE=" --flannel-iface=$(ip route get 10.144.1.0 | awk -F"dev " 'NR==1{split($2,a," ");print a[1]}') "
    else
      echo "Using public network " > /var/log/hetzner-k3s.log
      PRIVATE_IP="${PUBLIC_IP}"
      NETWORK_INTERFACE=" "
    fi
    mkdir -p /etc/rancher/k3s
    cat > /etc/rancher/k3s/registries.yaml <<EOF
    mirrors:
      "*":
    EOF
    curl -sfL https://get.k3s.io | K3S_TOKEN="REDACTED" INSTALL_K3S_VERSION="v1.31.1+k3s1" K3S_URL=https://10.144.1.16:6443 INSTALL_K3S_EXEC="agent \
      --node-name=$HOSTNAME --kubelet-arg "cloud-provider=external" --kubelet-arg "resolv-conf=/etc/k8s-resolv.conf" \
      --node-ip=$PRIVATE_IP \
      --node-external-ip=$PUBLIC_IP \
      $NETWORK_INTERFACE " sh -
    echo true > /etc/initialized
```
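Once cloud-init finishes, whether the agent came up and joined can be verified with something like this (a sketch; run kubectl from wherever your kubeconfig lives):

```sh
systemctl status k3s-agent   # on the worker node
journalctl -u k3s-agent -f   # follow the agent logs if it is stuck
kubectl get nodes -o wide    # the new node should appear as Ready
```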
Unfortunately, I cannot contribute to the project because I do not know the language it is developed in :)
Thanks for clarifying! I see what you mean now. I will do some testing and see if I can release some changes that might help with this kind of setup in the next release.
I can confirm that the solution functions correctly when public IP addresses are completely disabled. While the process is slow, taking around 6-7 minutes to create a small cluster, it still works as expected. I tested this without modifying the cloud-init configuration, using only the post-commands. Thank you @mertcangokgoz
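For anyone landing here later: the same commands from the earlier comment can go into the cluster config's `post_create_commands` section, something like this sketch (gateway IP and interface name are the examples from this thread):

```yaml
post_create_commands:
  - |
    cat <<'EOF' >> /etc/systemd/network/10-enp7s0.network
    [Match]
    Name=enp7s0
    [Network]
    DHCP=yes
    Gateway=10.144.0.1
    EOF
  - systemctl restart systemd-networkd
```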
I am currently using a NAT gateway in my project. I need k3s, and I want my cluster to communicate only over private IPs, without any public IP addresses. I am using the debian-12 image for the cluster.

With this configuration, I expect the machines to reach the internet and, at the same time, the pods to come up. However, during the installation it produces output like the following, so I think the installation is not completing in a healthy way.