Open ValentinVoigt opened 2 months ago
Hello, I appreciate the recommendation to check the MTU, as it should always be effective. However, when I executed the command you suggested on my servers running Ubuntu 24.04, it didn't produce any output. The alternative command ip -4 addr show | awk '/mtu 1450/ && /state UP/ {getline; print $2}' does work, but in my situation it returns two values. This is because I am using Cilium, whose interface also has an MTU of 1450. Therefore, we need a more reliable method to obtain a single result for the actual private network interface.
It doesn't show any output because I used the -q switch (because master_install_script.sh does that as well). Remove the -q or execute echo $? at the end. Is it really a problem that two lines match when using that regex? When Cilium is running, we can assume that k3s is already installed, so we don't need to wait for the interface to come up. Just guessing here, honestly.
I can offer some inline Python code, as I personally think this would be too complex for bash alone. But I don't know if this is a good solution. What do you think?
#!/bin/bash
SUBNET="172.16.0.0/12"
python3 - <<EOF "$SUBNET"
import netifaces, ipaddress, sys
network = ipaddress.ip_network(sys.argv[1])
for iface in netifaces.interfaces():
    ips = netifaces.ifaddresses(iface)
    if netifaces.AF_INET not in ips:
        continue
    for obj in ips[netifaces.AF_INET]:
        ip = ipaddress.ip_address(obj['addr'])
        if ip in network:
            sys.exit(0)
sys.exit(1)
EOF
if [ $? -eq 0 ]; then
    echo "ip in $SUBNET exists"
else
    echo "ip in $SUBNET does not exist"
fi
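For what it's worth, the same check could probably be done without the third-party netifaces package by parsing the one-line-per-address output of ip -o -4 addr show. The following is only a sketch: the function name interfaces_in_subnet and the SAMPLE text (including the interface names and addresses) are made up for illustration, and it assumes the usual "index: name inet addr/prefix ..." field layout.

```python
import ipaddress

# Hypothetical sample of `ip -o -4 addr show` output (trimmed, made-up addresses).
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 203.0.113.10/32 scope global dynamic eth0
3: enp7s0    inet 172.18.0.5/32 brd 172.18.0.5 scope global dynamic enp7s0
4: cilium_host    inet 10.244.0.1/32 scope global cilium_host
"""

def interfaces_in_subnet(ip_output, subnet):
    """Return names of interfaces whose IPv4 address lies inside `subnet`."""
    network = ipaddress.ip_network(subnet)
    found = []
    for line in ip_output.splitlines():
        fields = line.split()
        # Expected layout: "<index>:", "<name>", "inet", "<addr>/<prefix>", ...
        if len(fields) < 4 or fields[2] != "inet":
            continue
        addr = ipaddress.ip_interface(fields[3]).ip
        if addr in network:
            found.append(fields[1])
    return found

print(interfaces_in_subnet(SAMPLE, "172.16.0.0/12"))  # ['enp7s0']
```

On a real node the text would come from running ip -o -4 addr show (for example via subprocess.run) instead of SAMPLE. This still needs python3, so it is no help where only sh is available.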
I still wasn't getting any results from your command, but I found a solution that correctly identifies the interface (just one):
ip -o link show | awk -F': ' '$2 !~ /cilium|br|flannel|docker|veth/ {print $2}' | xargs -I {} bash -c "ethtool {} &>/dev/null && echo {}" | while read -r iface; do mtu=$(ip link show "$iface" | awk '/mtu/ {print $5}'); if [ "$mtu" -eq 1450 ]; then echo "$iface"; fi; done
This method is sufficient for now since we're only using flannel and cilium.
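To make the filtering logic of that one-liner easier to follow, here is a rough Python sketch of the same idea applied to captured ip -o link show output. It is illustration only, not something the install script could use: the function name private_candidates and the SAMPLE text are hypothetical, and the ethtool check from the shell version is left out.

```python
import re

# Same name filter as the awk expression above: skip CNI and virtual interfaces.
EXCLUDE = re.compile(r"cilium|br|flannel|docker|veth")

# Matches "<index>: <name>[@peer]: <flags> mtu <value> ..." and captures name and MTU.
LINK_LINE = re.compile(r"^\d+:\s+([^:@]+)(?:@[^:]+)?:\s.*?\bmtu (\d+)")

# Hypothetical sample of `ip -o link show` output (trimmed).
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP mode DEFAULT
4: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT
"""

def private_candidates(link_output, mtu=1450):
    """Return interface names with the given MTU that are not excluded by name."""
    names = []
    for line in link_output.splitlines():
        match = LINK_LINE.match(line)
        if not match:
            continue
        name, link_mtu = match.group(1), int(match.group(2))
        if link_mtu == mtu and not EXCLUDE.search(name):
            names.append(name)
    return names

print(private_candidates(SAMPLE))  # ['enp7s0']
```

Note that, as in the awk version, the pattern br matches anywhere in the name, so an interface whose name merely contains "br" would also be skipped.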
We can't depend on Python or other tools because the worker installation script runs during the Cloud Init process with just sh when initializing autoscaled nodes (for static nodes, the script runs in regular bash), which imposes some limitations.
The command above works perfectly with the sh shell.
Could you possibly make a PR considering this? If not, I'll handle it, but I'm currently swamped with work.
is it related to my problem?
networking:
  ssh:
    port: 22
    use_agent: false # set to true if your key has a passphrase
    public_key_path: "~/.ssh/id_rsa.pub"
    private_key_path: "~/.ssh/id_rsa"
  allowed_networks:
    ssh:
      - 0.0.0.0/0
    api: # this will firewall port 6443 on the nodes; it will NOT firewall the API load balancer
      - 0.0.0.0/0
  public_network:
    ipv4: true
    ipv6: true
  private_network:
    enabled: true
    subnet: 10.0.0.0/16
    existing_network_name: "infra"
  cni:
    enabled: true
    encryption: false
    mode: flannel
Your subnet size seems to be exactly 16, so... no?
Are you sure 10.0.0.0/16 is the correct subnet for the network "infra"?
sorry it was my fault subnet != network )))
@ValentinVoigt Hi, I guess we can close this issue since you opened a PR for the same problem? I haven't had a chance to test it, unfortunately.
I personally only close issues, once they're fixed in a new release, but that's your decision.
It was just to do a cleanup since we also have the PR, but it's ok. We can keep this open for now.
I have an existing network at Hetzner, to which I need to add a new cluster. I use the existing_network_name feature for that, as per the docs:
Unfortunately, since what I think was this commit, I am unable to add a new cluster. The relevant excerpt from the logs is:
After some digging, it looks like the following is happening: the master_install_script.sh script (here) takes my 172.16.0.0/12 subnet, removes the /12 and removes the last .0, making it "172.16.0.". As my subnet is actually a /12 and the server's new IP address is 172.18.X.Y (which is indeed part of the original /12), a simple grep will not match. This would be true for every subnet that is not a /16.
The same probably applies to worker_install_script.sh as well, although I haven't tried.
I think there should be a hint in the docs, if this is not going to be supported anymore.
Adapting the code to just support different subnet sizes might be very difficult. There is some awk magic around, or one could use some inline Python. But maybe there's a better way than using grep on ip.
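To illustrate the mismatch described above, here is a small sketch that compares a string-prefix test with a real CIDR membership test. The prefix_match function only approximates what I believe the script does (it is not the script's actual code), and both function names and the example addresses are made up.

```python
import ipaddress

def prefix_match(ip, subnet):
    """Approximate the string-prefix approach: drop the /len and the last
    octet of the subnet, then check whether the address starts with the rest."""
    base = subnet.split("/")[0]              # "172.16.0.0"
    prefix = base.rsplit(".", 1)[0] + "."    # "172.16.0."
    return ip.startswith(prefix)

def cidr_match(ip, subnet):
    """Proper membership test using the standard library."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(subnet)

# 172.18.3.4 belongs to 172.16.0.0/12 (172.16.0.0 to 172.31.255.255),
# but it does not start with "172.16.0.".
print(prefix_match("172.18.3.4", "172.16.0.0/12"))  # False
print(cidr_match("172.18.3.4", "172.16.0.0/12"))    # True
```

Doing the equivalent in plain sh would mean converting the dotted addresses to integers and comparing the masked values, which is where the awk magic would come in.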
I don't know the code's intention well enough to reasonably argue about that, but I may have a suggestion. Hetzner writes here that their private network MTU is always 1450, while the public interface has an MTU of 1500. Instead of looking for a matching IP address, maybe the following could be an alternative?