Raspberry Pi loses IPv6 connectivity after DHCP lease elapses

TobiasChen commented 1 week ago

Bug Report

Description

A Raspberry Pi loses IPv6 connectivity once the valid_lft time expires:

After rebooting the node, a new IPv6 connection is established. Talos shows both IPv4 and IPv6 addresses under talosctl get nodeip, and ip addr, run from a pod on the host network, outputs the following:

`
4: enxe45f01a83c47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:5f:01:a8:3c:47 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.3/22 brd 10.0.3.255 scope global enxe45f01a83c47
       valid_lft forever preferred_lft forever
    inet6 fd20:2:2:2:e65f:1ff:fea8:3c47/64 scope global dynamic mngtmpaddr
       valid_lft 4009sec preferred_lft 409sec
    inet6 2a02:8071:8281:fa0:e65f:1ff:fea8:3c47/64 scope global dynamic mngtmpaddr
       valid_lft 4009sec preferred_lft 409sec
    inet6 fe80::e65f:1ff:fea8:3c47/64 scope link
       valid_lft forever preferred_lft forever
`

After the valid_lft time expires, kubelet is restarted and ip addr displays the following:

`
4: enxe45f01a83c47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:5f:01:a8:3c:47 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.3/22 brd 10.0.3.255 scope global enxe45f01a83c47
       valid_lft forever preferred_lft forever
    inet6 fe80::e65f:1ff:fea8:3c47/64 scope link
       valid_lft forever preferred_lft forever
`

Afterwards, the node is not reachable via IPv6: pings work neither from nor to the node.
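To catch the moment the global addresses disappear, the ip addr output above could be polled and parsed rather than inspected by hand. The following is only a minimal sketch, assuming the output format shown in this report; the names `parse_inet6` and `soonest_expiry` are hypothetical helpers, not part of Talos or iproute2.

```python
import re

# Matches address lines such as:
#   inet6 fd20:2:2:2:e65f:1ff:fea8:3c47/64 scope global dynamic mngtmpaddr
ADDR_RE = re.compile(r"^\s*inet6\s+(\S+)/(\d+)\s+scope\s+(\S+)(.*)$")
# Matches the lifetime line that follows each address line.
LFT_RE = re.compile(r"valid_lft\s+(\S+)\s+preferred_lft\s+(\S+)")
# Matches IPv4 address lines, whose lifetime line must be skipped.
INET4_RE = re.compile(r"^\s*inet\s")

SAMPLE = """\
    inet 10.0.0.3/22 brd 10.0.3.255 scope global enxe45f01a83c47
       valid_lft forever preferred_lft forever
    inet6 fd20:2:2:2:e65f:1ff:fea8:3c47/64 scope global dynamic mngtmpaddr
       valid_lft 4009sec preferred_lft 409sec
    inet6 fe80::e65f:1ff:fea8:3c47/64 scope link
       valid_lft forever preferred_lft forever
"""


def _seconds(token):
    """Convert '4009sec' to 4009; 'forever' becomes None."""
    if token == "forever":
        return None
    return int(token[:-3]) if token.endswith("sec") else int(token)


def parse_inet6(ip_addr_output):
    """Return one dict per inet6 address found in `ip addr` output."""
    entries = []
    current = None  # inet6 entry still waiting for its lifetime line
    for line in ip_addr_output.splitlines():
        addr = ADDR_RE.match(line)
        if addr:
            current = {
                "address": addr.group(1),
                "prefixlen": int(addr.group(2)),
                "scope": addr.group(3),
                "flags": addr.group(4).split(),
                "valid_lft": None,
                "preferred_lft": None,
            }
            entries.append(current)
            continue
        if INET4_RE.match(line):
            current = None  # the next lifetime line belongs to IPv4
            continue
        lft = LFT_RE.search(line)
        if lft and current is not None:
            current["valid_lft"] = _seconds(lft.group(1))
            current["preferred_lft"] = _seconds(lft.group(2))
            current = None
    return entries


def soonest_expiry(entries):
    """Smallest finite valid_lft in seconds, or None if all are 'forever'."""
    finite = [e["valid_lft"] for e in entries if e["valid_lft"] is not None]
    return min(finite) if finite else None


if __name__ == "__main__":
    for entry in parse_inet6(SAMPLE):
        print(entry["address"], entry["scope"], entry["valid_lft"])
    print("soonest expiry (s):", soonest_expiry(parse_inet6(SAMPLE)))
```

Logging the parsed lifetimes at a regular interval would show whether valid_lft is ever refreshed before it reaches zero, or only counts down.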

Other devices on the same network do not have this issue with IPv6 connectivity, and I am a bit lost as to how to debug this further. I found some posts suggesting that DHCP services can crash on Raspberry Pis when a large number of interfaces are present, but I would expect that to show up in the logs somewhere.

Logs

hestia.log

Environment

Talos version: 1.8.2
Platform: RB4b
Image: factory.talos.dev/installer/acd41da8ebb38e89794309e662d151b1bb5f3ddce6153b54cd00fd5b46c99582:v1.8.2

TobiasChen commented 1 week ago

Network Config:

network:
    hostname: HESTIA
    interfaces:
        - deviceSelector:
              busPath: "fd580000.ethernet"
          dhcp: true
          # addresses:
          #   - fd20:2:2:2:e65f:1ff:fea8:3c48
          dhcpOptions:
              ipv6: true

support.zip: support.zip

smira commented 1 week ago

In the log you provided the address is still assigned, so it's not possible to understand why it gets lost.

There seems to be an issue with Talos trying to assign the address many times; I'm not sure what removes it in your setup (?).

TobiasChen commented 1 week ago

Ah, I'm very sorry. I'll create a new support.zip once the node has timed out again. I honestly have no idea what would lead Talos to assign the address over and over again.

My setup is a home router (Fritzbox 6660) and a shared network for all devices. All devices get a unique local address (from fd20:2:2:2::/64), and the Fritzbox assigns IPv6 addresses in the home network (2a02:8071:8281:fa0::/64) via DHCPv6. I also tried assigning only the DNS server via DHCPv6 and letting devices choose their own IPv6 addresses; however, the effect was the same (the IPv6 address is assigned once and then not renewed after the lease elapses).
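As a side check on how these addresses are formed: the interface identifier e65f:1ff:fea8:3c47 in the ip addr output matches the modified EUI-64 derivation from the MAC address e4:5f:01:a8:3c:47 (flip the universal/local bit of the first octet and insert ff:fe in the middle). The sketch below reproduces that derivation; `eui64_address` is a hypothetical helper name, and the match is only an observation about how the address was formed, not a diagnosis of why it is removed.

```python
import ipaddress


def eui64_address(prefix, mac):
    """Combine a /64 prefix with the modified EUI-64 identifier of a MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC address.
    iid = bytes(octets[:3] + [0xFF, 0xFE] + octets[3:])
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("EUI-64 identifiers require a /64 prefix")
    base = int(network.network_address)
    return ipaddress.IPv6Address(base | int.from_bytes(iid, "big"))


if __name__ == "__main__":
    mac = "e4:5f:01:a8:3c:47"
    for prefix in ("fd20:2:2:2::/64", "2a02:8071:8281:fa0::/64", "fe80::/64"):
        print(eui64_address(prefix, mac))
```

All three addresses on the interface, including the link-local one, share this identifier, so comparing the computed values against ip addr is a quick way to tell a self-generated address apart from one handed out from a DHCPv6 pool.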

The only peculiar thing about my setup is a Pi-hole DNS server, which I run inside the cluster and announce from the Fritzbox via DHCP and DHCPv6. As a result, DNS resolution does not work until the cluster is available after a restart, which might also be responsible for some of the "error serving DNS request" messages early in the log. I don't think this should impact the IP address assignment, though.

TobiasChen commented 1 week ago

Here is the correct ZIP, sorry again for wasting your time: support.zip

smira commented 1 week ago

Thank you, I can now see the address being removed, even though it wasn't directly a Talos Linux action that removed it. (It still might be a bug in Talos in the way it assigns the address.)

I would like to look into Talos IPv6 support a bit more to cover all cases, but that might come only next year, for the Talos 1.10 release.

TobiasChen commented 1 week ago

Hmm, interesting. Thank you for all the effort with this project; I'll eagerly await Talos 1.10. If I can provide any more details or tests, feel free to contact me anytime.
