TobiasChen opened 1 week ago
Network Config:

```yaml
network:
  hostname: HESTIA
  interfaces:
    - deviceSelector:
        busPath: "fd580000.ethernet"
      dhcp: true
      # addresses:
      #   - fd20:2:2:2:e65f:1ff:fea8:3c48
      dhcpOptions:
        ipv6: true
```
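The commented-out `addresses` block above hints at a possible workaround: pinning the unique local address statically so it survives DHCPv6 lease expiry. A minimal sketch, assuming the /64 prefix length (which is not stated in the report):

```yaml
network:
  hostname: HESTIA
  interfaces:
    - deviceSelector:
        busPath: "fd580000.ethernet"
      dhcp: true
      # Static assignment sketch; the /64 prefix length is an assumption.
      addresses:
        - fd20:2:2:2:e65f:1ff:fea8:3c48/64
      dhcpOptions:
        ipv6: true
```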
Attachment: support.zip
In the log you provided the address is still assigned, so it's not possible to tell why it gets lost.
There seems to be an issue with Talos trying to assign the address many times; I'm not sure what removes it in your setup.
Ah, I'm very sorry, I'll create a new support.zip once the node has timed out again. I honestly have no idea what would lead Talos to assign the address over and over again.
My setup is a home router (Fritzbox 6660) with a shared network for all devices. Every device gets a unique local address (from fd20:2:2:2::/64), and the Fritzbox assigns IPv6 addresses in the home network (2a02:8071:8281:fa0::/64) via DHCPv6. I also tried only assigning the DNS server via DHCPv6 and letting devices choose their own IPv6 addresses; the effect was the same (the IPv6 address gets assigned once and is not renewed after the lease elapses).
The only peculiar thing about my setup is a Pi-hole DNS server, which I run inside the cluster and announce from the Fritzbox via DHCP and DHCPv6. This means DNS resolution does not work until the cluster is available again after a restart, which might also be responsible for some of the "error serving DNS request" messages early in the log. I don't think this should impact the IP address assignment, though.
Here is the correct ZIP, sorry again for wasting your time: support.zip
Thank you, I can see the address being removed now, even though it wasn't directly a Talos Linux action that removed it (it might still be a bug in the way Talos assigns the address, though).
I would like to look into Talos IPv6 support a bit more to cover all cases, but that might only come next year for the Talos 1.10 release.
Hmm, interesting. Thank you for all the effort with this project; I'll eagerly await Talos 1.10. If I can provide any more details/tests, feel free to contact me anytime.
Bug Report
Description
A Raspberry Pi loses IPv6 connectivity after the valid_lft time expires.
After rebooting the node, a new IPv6 connection is established; Talos shows both IPv4 and IPv6 addresses under `talosctl get nodeip` and `ip addr`.
Running `ip addr` from a pod on the host network outputs the following:

```
4: enxe45f01a83c47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:5f:01:a8:3c:47 brd ff:ff:ff:ff:ff:ff
```
After the valid_lft time expires, kubelet is restarted and `ip addr` displays the following:

```
4: enxe45f01a83c47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:5f:01:a8:3c:47 brd ff:ff:ff:ff:ff:ff
```

Afterwards, the node is not reachable via IPv6, and pings work neither from nor to the node.
Other devices on the same network do not have this IPv6 connectivity issue, and I am a bit lost as to how to debug this further. I found some posts suggesting issues with crashing DHCP services on Raspberry Pis when a large number of interfaces were present, but I would expect that to show up somewhere in the logs.
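Since the failure tracks valid_lft expiry, it can help to snapshot the remaining lifetimes periodically and see when they stop being refreshed. A small sketch (not from the report; the sample output, address, and lifetimes below are illustrative) that parses `ip -6 addr show` output:

```python
import re

# Illustrative `ip -6 addr show` output; in practice, capture it via
# `subprocess.run(["ip", "-6", "addr", "show"], ...)` on the node.
SAMPLE = """\
4: enxe45f01a83c47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet6 2a02:8071:8281:fa0::1234/64 scope global dynamic
       valid_lft 7200sec preferred_lft 3600sec
    inet6 fd20:2:2:2:e65f:1ff:fea8:3c48/64 scope global
       valid_lft forever preferred_lft forever
"""

# Each inet6 line is followed by its lifetime line; match across newlines.
ADDR_RE = re.compile(
    r"inet6 (?P<addr>[0-9a-f:]+)/\d+.*?"
    r"valid_lft (?P<valid>\S+) preferred_lft (?P<pref>\S+)",
    re.S,
)

def lifetimes(text):
    """Map each IPv6 address to its (valid_lft, preferred_lft) strings."""
    return {m["addr"]: (m["valid"], m["pref"]) for m in ADDR_RE.finditer(text)}

print(lifetimes(SAMPLE))
```

A dynamic address whose valid_lft keeps counting down without ever jumping back up would indicate the DHCPv6 renewal is not happening.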
Logs
hestia.log
Environment