Closed axgkl closed 3 months ago
On a non-working autoscaled node:
root@medium-autoscaled-44fb554df4d13d7a:~# cat /etc/systemd/system/k3s-agent.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target
[Install]
WantedBy=multi-user.target
[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s-agent.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
agent \
'--node-name=medium-autoscaled-44fb554df4d13d7a' \
'--kubelet-arg' \
'cloud-provider=external' \
'--kubelet-arg' \
'resolv-conf=/etc/k8s-resolv.conf' \
'--node-ip=100.66.224.163' \
'--node-external-ip=100.66.224.163' \
cat /var/lib/cloud/instances/51141072/user-data.txt
(...)
- echo "Done" > /.status
- |
touch /etc/initialized
if [[ $(</etc/initialized) != "true" ]]; then
systemctl restart NetworkManager || true
dhclient eth1 -v || true
fi
HOSTNAME=$(hostname -f)
PUBLIC_IP=$(hostname -I | awk '{print $1}')
if [[ "true" = "true" ]]; then
PRIVATE_IP=$(ip route get 10.1.0.0 | awk -F"src " 'NR==1{split($2,a," ");print a[1]}')
NETWORK_INTERFACE=" --flannel-iface=$(ip route get 10.1.0.0 | awk -F"dev " 'NR==1{split($2,a," ");print a[1]}') "
else
PRIVATE_IP="${PUBLIC_IP}"
NETWORK_INTERFACE=" "
fi
mkdir -p /etc/rancher/k3s
cat >/etc/rancher/k3s/registries.yaml <<EOF
mirrors:
"*":
EOF
curl -sfL https://get.k3s.io | K3S_TOKEN="8d43d9f9268fb60cecefd04684485df0" INSTALL_K3S_VERSION="v1.30.2+k3s2" K3S_URL=https://10.1.0.4:6443 INSTALL_K3S_EXEC="agent \
--node-name=$HOSTNAME --kubelet-arg "cloud-provider=external" --kubelet-arg "resolv-conf=/etc/k8s-resolv.conf" \
--node-ip=$PRIVATE_IP \
--node-external-ip=$PUBLIC_IP \
$NETWORK_INTERFACE " sh -
echo true >/etc/initialized
When I run these commands manually in the shell, starting from the PUBLIC_IP=... line:
# PUBLIC_IP=$(hostname -I | awk '{print $1}')
if [[ "true" = "true" ]]; then
PRIVATE_IP=$(ip route get 10.1.0.0 | awk -F"src " 'NR==1{split($2,a," ");print a[1]}')
NETWORK_INTERFACE=" --flannel-iface=$(ip route get 10.1.0.0 | awk -F"dev " 'NR==1{split($2,a," ");print a[1]}') "
else
PRIVATE_IP="${PUBLIC_IP}"
NETWORK_INTERFACE=" "
fi
root@medium-autoscaled-44fb554df4d13d7a:~# echo $PRIVATE_IP
10.1.0.7
root@medium-autoscaled-44fb554df4d13d7a:~# curl -sfL https://get.k3s.io | K3S_TOKEN="8d43d9f9268fb60cecefd04684485df0" INSTALL_K3S_VERSION="v1.30.2+k3s2" K3S_URL=https://10.1.0.4:6443 INSTALL_K3S_EXEC="agent \
--node-name=$HOSTNAME --kubelet-arg "cloud-provider=external" --kubelet-arg "resolv-conf=/etc/k8s-resolv.conf" \
--node-ip=$PRIVATE_IP \
--node-external-ip=$PUBLIC_IP \
$NETWORK_INTERFACE " sh -
[INFO] Using v1.30.2+k3s2 as release
[INFO] Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.30.2+k3s2/sha256sum-amd64.txt
[INFO] Skipping binary downloaded, installed k3s matches hash
[INFO] Skipping installation of SELinux RPM
[INFO] Skipping /usr/local/bin/kubectl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/crictl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/ctr symlink to k3s, already exists
[INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[INFO] Creating uninstall script /usr/local/bin/k3s-agent-uninstall.sh
[INFO] env: Creating environment file /etc/systemd/system/k3s-agent.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s-agent.service
[INFO] systemd: Enabling k3s-agent unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s-agent.service → /etc/systemd/system/k3s-agent.service.
[INFO] systemd: Starting k3s-agent
root@medium-autoscaled-44fb554df4d13d7a:~# tail /etc/systemd/system/k3s-agent.service
agent \
'--node-name=medium-autoscaled-44fb554df4d13d7a' \
'--kubelet-arg' \
'cloud-provider=external' \
'--kubelet-arg' \
'resolv-conf=/etc/k8s-resolv.conf' \
'--node-ip=10.1.0.7' \
'--node-external-ip=100.66.224.163' \
'--flannel-iface=enp7s0' \
root@medium-autoscaled-44fb554df4d13d7a:~#
and, btw:
root@medium-autoscaled-44fb554df4d13d7a:~# cloud-init status
status: done
=> All fine.
So: why did cloud-init produce a different result, as if the condition if [[ "true" = "true" ]]; had failed?
(There is also no flannel setting, so it really looks like that condition failed.)
aaaah, there we go, "[[" is a bashism, not POSIX standard:
root@medium-autoscaled-44fb554df4d13d7a:~# /bin/sh
# if [[ "true" = "true" ]]; then echo hi; else echo ho; fi
/bin/sh: 1: [[: not found
ho
Cloud-init scripts are executed in a sh shell. This is a POSIX-compliant shell, similar to bash, but with fewer features.
Will compile and test with /bin/sh compliant version.
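For reference, a minimal sketch of what a /bin/sh-compliant version of that conditional could look like. This is not necessarily the fix that went into the PR. ROUTE_LINE is an illustrative stand-in for the first line of `ip route get 10.1.0.0` so the snippet runs anywhere, and USE_PRIVATE_NETWORK is a hypothetical name for the value the template renders as a literal "true".

```shell
#!/bin/sh
# Illustrative stand-in for the first line of `ip route get 10.1.0.0`;
# on a real node, pipe the ip command into awk instead.
ROUTE_LINE="10.1.0.0 dev enp7s0 src 10.1.0.7 uid 0"
PUBLIC_IP="100.66.224.163"
USE_PRIVATE_NETWORK="true"   # hypothetical name; the template renders a literal here

# POSIX test: single brackets, quoted operands, "=" for string equality.
if [ "$USE_PRIVATE_NETWORK" = "true" ]; then
  PRIVATE_IP=$(printf '%s\n' "$ROUTE_LINE" | awk -F"src " 'NR==1{split($2,a," ");print a[1]}')
  NETWORK_INTERFACE=" --flannel-iface=$(printf '%s\n' "$ROUTE_LINE" | awk -F"dev " 'NR==1{split($2,a," ");print a[1]}') "
else
  PRIVATE_IP="${PUBLIC_IP}"
  NETWORK_INTERFACE=" "
fi

echo "node-ip=$PRIVATE_IP"
echo "extra args:$NETWORK_INTERFACE"
```

The earlier guard `[[ $(</etc/initialized) != "true" ]]` has the same problem, and `$(<file)` is itself not POSIX; `[ "$(cat /etc/initialized)" != "true" ]` would be a portable equivalent.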
cool, that fixed it, finally, autoscaling works end to end, on priv ip clusters:
❯ kc get pods -o wide !?
NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE                                 NOMINATED NODE   READINESS GATES
xhello-world-6bf6c965df-56whk   1/1     Running   0          2m38s   10.50.3.179   medium-autoscaled-68f6454a9986e5bd   <none>           <none>
xhello-world-6bf6c965df-dm2p5   1/1     Running   0          2m38s   10.50.2.241   ax-master2                           <none>           <none>
xhello-world-6bf6c965df-dsbkb   1/1     Running   0          2m38s   10.50.4.236   medium-autoscaled-649a0eab32ddd7f5   <none>           <none>
xhello-world-6bf6c965df-v92kb   1/1     Running   0          2m38s   10.50.1.17    ax-master3                           <none>           <none>
xhello-world-6bf6c965df-wscf4   1/1     Running   0          2m38s   10.50.5.31    medium-autoscaled-3079dca71025eda    <none>           <none>
xhello-world-6bf6c965df-zf599   1/1     Running   0          2m38s   10.50.0.173   ax-master1                           <none>           <none>
Note: It did work on clusters without private IPs, because the else branch of the if condition, i.e. the one for public IPs, was the one that got executed:
if [[ "true" = "true" ]]; then
PRIVATE_IP=$(ip route get 10.1.0.0 | awk -F"src " 'NR==1{split($2,a," ");print a[1]}')
NETWORK_INTERFACE=" --flannel-iface=$(ip route get 10.1.0.0 | awk -F"dev " 'NR==1{split($2,a," ");print a[1]}') "
else
PRIVATE_IP="${PUBLIC_IP}"
NETWORK_INTERFACE=" "
fi
With this in /bin/sh:
# if [[ "true" = "true" ]]; then echo hi; else echo ho; fi
/bin/sh: 1: [[: not found
ho
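Why the script carried on silently instead of aborting: when the shell cannot find a command it returns exit status 127, and `if` treats any non-zero status as false. A small sketch of that behaviour, using a made-up command name (no_such_test_command, hypothetical) as a stand-in for `[[` so it behaves the same under bash and dash:

```shell
#!/bin/sh
# Stand-in for "[[" under a shell that lacks it: a command that does not exist.
# (no_such_test_command is a hypothetical name chosen for this demo.)
if no_such_test_command "true" = "true" 2>/dev/null; then
  RESULT="hi"
else
  RESULT="ho"
fi
echo "$RESULT"

# A command that cannot be found yields exit status 127.
no_such_test_command 2>/dev/null
STATUS=$?
echo "exit status: $STATUS"
```

Note that `set -e` would not have caught this either, since a failing command in an `if` condition does not trigger it.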
Do you still have problems with this?
Closing since I just released v2.0.0 and this stuff seems to work fine from my testing today. Please open another issue if still needed with the new version. Thanks for the PRs!
Hi, I'm testing autoscaling on priv ip clusters.
After this https://github.com/vitobotta/hetzner-k3s/pull/394#issuecomment-2261144466 the autoscaler creates the nodes and they join the cluster. All kube-system pods for the new nodes are running.
Only remaining problem: when scheduling 6 pods with an affinity rule that they should be on 6 different hosts, only the 3 on my 3 masters get into Running; the others remain Pending.
ccm log says:
And those nodes kept having the unschedulable taint.
Then I went on an autoscaled node via ssh and reconfigured the node-ip in /etc/systemd/system/k3s-agent.service away from the 100.66.1.123 to the ip within the private network (the 10.1.0.10 below) - and restarted the service.
=> Instantly working pod, i.e. that was the culprit.
Will dig deeper into why you take the eth0 IP as --node-ip and not the private network one.
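The manual workaround described above could be scripted roughly like this. It is only a sketch: set_node_ip is a hypothetical helper, and it assumes the unit file carries the argument in the quoted form '--node-ip=...' as in the k3s-agent.service dumps in this thread.

```shell
#!/bin/sh
# set_node_ip FILE NEW_IP (hypothetical helper)
# Rewrites the quoted '--node-ip=...' argument in a k3s-agent unit file.
# '--node-external-ip=...' is left alone because the pattern requires the
# opening quote to be followed directly by --node-ip=.
set_node_ip() {
  unit_file=$1
  new_ip=$2
  sed "s|'--node-ip=[^']*'|'--node-ip=${new_ip}'|" "$unit_file" > "${unit_file}.tmp" &&
    mv "${unit_file}.tmp" "$unit_file"
}

# On the node itself this would be used as follows (not executed here):
#   set_node_ip /etc/systemd/system/k3s-agent.service 10.1.0.10
#   systemctl daemon-reload
#   systemctl restart k3s-agent
```

This only patches a node that is already up; nodes created later by the autoscaler get their unit file from cloud-init again, so the script there still has to be fixed.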