Closed rr4444 closed 1 year ago
EDITED ABOVE: MetalLB was disabled in previous attempts, but I reset back to the actual repo defaults; the same problem occurs even with MetalLB.
On the master node, `ifconfig eth0` shows:

```
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.3.1  netmask 255.255.240.0  broadcast 172.16.15.255
        inet6 fe80::299:d625:319a:d0fa  prefixlen 64  scopeid 0x20<link>
        ether e4:5f:01:d1:1d:11  txqueuelen 1000  (Ethernet)
        RX packets 647242  bytes 112233893 (107.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 697315  bytes 374004325 (356.6 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
```
I now see that ifconfig is more out of date than I realised (https://unix.stackexchange.com/a/672090), so `ip addr` should give the answer:
```
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:5f:01:d1:1d:11 brd ff:ff:ff:ff:ff:ff
    inet 172.16.3.1/20 brd 172.16.15.255 scope global dynamic noprefixroute eth0
       valid_lft 81649sec preferred_lft 70849sec
    inet 172.16.30.222/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::299:d625:319a:d0fa/64 scope link
       valid_lft forever preferred_lft forever
```
Which looks OK, I think...?
Yet 172.16.30.222 is not responding outside of that master node
Just to triple check the ARP behaviour from a worker node:
```
$ sudo arp-scan 172.16.30.222
Interface: eth0, type: EN10MB, MAC: e4:5f:01:d1:1c:a5, IPv4: 172.16.3.2
Starting arp-scan 1.9.7 with 1 hosts (https://github.com/royhills/arp-scan)
172.16.30.222   e4:5f:01:d1:1d:11   (Unknown)
2 packets received by filter, 0 packets dropped by kernel
Ending arp-scan 1.9.7: 1 hosts scanned in 0.490 seconds (2.04 hosts/sec). 1 responded
```
But then a ping to the same .222 address fails.
Makes zero sense to me.
The master node has never had anything on it, and I have rerun the reset scripts several times
OK, I just noticed that a traceroute to the .222 VIP is being sent out the default gateway onto the internet!
Makes no sense. My home DHCP is correctly handing out IPs with a 255.255.240.0 netmask, which is correct for a /20 (the whole network is on 172.16.0.0/20).
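That routing choice can be confirmed directly with `ip route get 172.16.30.222`, which prints the route the kernel would pick for that destination. As a sketch, the kernel's longest-prefix-match decision can be reproduced with python3's stdlib `ipaddress` module, using an assumed route table (the LAN /20 plus a default route, mirroring these nodes):

```shell
# Sketch only: emulate longest-prefix-match route selection for the VIP.
# The route table below is an assumption matching a typical node here.
python3 - <<'EOF'
import ipaddress
routes = {
    "172.16.0.0/20": "eth0 (direct, on the LAN)",
    "0.0.0.0/0": "via the default gateway",
}
dst = ipaddress.ip_address("172.16.30.222")
# Of all routes whose prefix contains the destination, pick the longest one
best = max((ipaddress.ip_network(n) for n in routes if dst in ipaddress.ip_network(n)),
           key=lambda n: n.prefixlen)
print(routes[str(best)])  # only 0.0.0.0/0 matches, so: via the default gateway
EOF
```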
You can see that correct subnet here in the comment above: https://github.com/techno-tim/k3s-ansible/issues/298#issuecomment-1538774195
This now seems like a pure LAN networking issue, but I doubt I'm the first to see it.
....yup: I only changed the first two octets of the VIP in all.yml. That placed it outside the /20 range, so packets to it get sent out to the internet.
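The subnet arithmetic is quick to double-check with python3's stdlib `ipaddress` module (python3 ships with Raspberry Pi OS):

```shell
python3 - <<'EOF'
import ipaddress
net = ipaddress.ip_network("172.16.0.0/20")
print("netmask :", net.netmask)                                      # 255.255.240.0
print("range   :", net.network_address, "-", net.broadcast_address)  # 172.16.0.0 - 172.16.15.255
print("VIP in range?", ipaddress.ip_address("172.16.30.222") in net) # False
EOF
```

So 172.16.30.222 lies outside 172.16.0.0/20, and the nodes dutifully route it to the default gateway.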
Days wasted on something obvious!!!
Well, hopefully someone else benefits from this....
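For anyone hitting the same thing: the VIP has to sit inside the LAN's subnet (and outside the DHCP pool, so nothing else ever claims it). A hypothetical corrected value for my 172.16.0.0/20 network, assuming the repo's stock `apiserver_endpoint` variable in all.yml, would look like:

```yaml
# all.yml — hypothetical example: any unused address inside 172.16.0.0/20
# (i.e. 172.16.0.1 - 172.16.15.254) that the DHCP server will never lease
apiserver_endpoint: "172.16.15.222"
```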
The virtual IP works from the master node
But neither ping nor netcat works from the worker nodes when using the VIP. The master's own IP replies fine to ICMP and to HTTP on port 6443 from the worker nodes.
### Current Behavior
The VIP is listed in `ip addr` on the eth0 interface, alongside the master IP, but the VIP does not show up in ifconfig (normal, perhaps, or the cause?).
tcpdump on both the master and worker nodes shows ARP requests and replies with the correct MAC address for the master's eth0 NIC, so gratuitous ARP is not being blocked. I'd have thought it would be OK given that, but it isn't...
### Context
Operating system: Debian GNU/Linux 11 (bullseye) according to `cat /etc/os-release` (headless Raspberry Pi OS)
Hardware: Raspberry Pi CM4s that all connect to the same switch
### Variables Used
all.yml
MetalLB was disabled in previous attempts, but reset back to the actual repo defaults; the same problem occurs even with MetalLB.
k3s master tasks main.yml
### Hosts
host.ini
I've checked the General Troubleshooting Guide.
I have used the reset and deployment scripts several times.
Going mad!
Many thanks for any pointers!