siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Talos `v1.6.0` Creates A Bunch Of Extra Interfaces #8079

Closed: dcplaya closed this issue 1 month ago

dcplaya commented 10 months ago

Bug Report

Description

I tried upgrading some worker nodes to v1.6.0 and got a bunch of "warnings" about DHCP requests: `unable to receive an offer: got an error while the discovery request: error writing packet to connection: write packet $MAC_ADDRESS: sendto: no buffer space available`. The error occurred on a number of links for devices that don't actually exist. I do have VLANs configured in my Talos config that correspond to the erroring links, but only the VLAN part of the link name matches. For example, my node shows links named `eth103747e.100` and `eth103747e.2`. I have no device corresponding to `eth103747e`, but I do have `eth1.2` and `eth1.100`. Downgrading to Talos v1.5.5 makes the problem go away.

Logs

Here is a screenshot of the worker node (it's a VM in Proxmox), a partial output of `talosctl get addresses --node work1`, and a support bundle from the problem node: support.zip

Environment

dcplaya commented 10 months ago

To add a bit more info, and thanks to @bjw-s for finding a temporary fix: if I add `driver: virtio_net` to my network device selectors, this bug does not happen.

Also, my nodes that do not have VLANs configured do not seem to have this bug when upgrading.

smira commented 10 months ago

I wonder if that happens because of the network device selector changes in v1.6 (https://www.talos.dev/v1.6/introduction/what-is-new/#network-device-selectors)?

What was your device selector before and after?

dcplaya commented 10 months ago

This is in talhelper format, but it should convey the difference. I've only posted the network section; if you want to see the entire talhelper config, it's located here.

Before

      networkInterfaces:
        - deviceSelector:
            hardwareAddr: "72:f2:0d:00:ac:b4"
          dhcp: true
          mtu: 9000
        - deviceSelector:
            hardwareAddr: "da:29:dc:d9:75:f7"
          dhcp: false
          mtu: 9000
          vlans:
            - vlanId: 2
              dhcp: true
              mtu: 1500
              dhcpOptions:
                routeMetric: 2048
            - vlanId: 100
              dhcp: true
              mtu: 1500
              dhcpOptions:
                routeMetric: 4096

After

      networkInterfaces:
        - deviceSelector:
            driver: virtio_net
            hardwareAddr: "72:f2:0d:00:ac:b4"
          dhcp: true
          mtu: 9000
        - deviceSelector:
            driver: virtio_net
            hardwareAddr: "da:29:dc:d9:75:f7"
          dhcp: false
          mtu: 9000
          vlans:
            - vlanId: 2
              dhcp: true
              mtu: 1500
              dhcpOptions:
                routeMetric: 2048
            - vlanId: 100
              dhcp: true
              mtu: 1500
              dhcpOptions:
                routeMetric: 4096

smira commented 10 months ago

Oh yeah, VLANs share the MAC address with their parent, so I believe the selector will match the VLANs as well.
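To illustrate why a selector that only sets `hardwareAddr` over-matches, here is a minimal Go sketch. The `Link` and `Selector` types and the `MatchingLinks` helper are hypothetical simplifications, not Talos's actual types or matching code. It assumes, as described in the comment above, that VLAN links carry their parent's MAC address, and additionally assumes that VLAN links do not report `virtio_net` as their driver.

```go
package main

import (
	"fmt"
	"strings"
)

// Link is a hypothetical, simplified view of a network link.
// Field names are illustrative, not Talos's own types.
type Link struct {
	Name         string
	HardwareAddr string
	Driver       string // assumption: VLAN links do not report virtio_net
}

// Selector is a hypothetical stand-in for a deviceSelector with the two
// fields discussed in this thread. Empty fields match any link.
type Selector struct {
	HardwareAddr string
	Driver       string
}

// Matches reports whether every non-empty selector field equals the
// corresponding link field (MAC addresses compared case-insensitively).
func (s Selector) Matches(l Link) bool {
	if s.HardwareAddr != "" && !strings.EqualFold(s.HardwareAddr, l.HardwareAddr) {
		return false
	}
	if s.Driver != "" && s.Driver != l.Driver {
		return false
	}
	return true
}

// MatchingLinks returns the names of all links the selector matches.
func MatchingLinks(s Selector, links []Link) []string {
	var names []string
	for _, l := range links {
		if s.Matches(l) {
			names = append(names, l.Name)
		}
	}
	return names
}

func main() {
	mac := "da:29:dc:d9:75:f7"
	links := []Link{
		{Name: "eth1", HardwareAddr: mac, Driver: "virtio_net"},
		// VLAN links inherit the parent's MAC address.
		{Name: "eth1.2", HardwareAddr: mac},
		{Name: "eth1.100", HardwareAddr: mac},
	}

	macOnly := Selector{HardwareAddr: mac}
	withDriver := Selector{HardwareAddr: mac, Driver: "virtio_net"}

	fmt.Println(MatchingLinks(macOnly, links))    // [eth1 eth1.2 eth1.100]
	fmt.Println(MatchingLinks(withDriver, links)) // [eth1]
}
```

Under these assumptions, adding the driver criterion narrows the match back to the parent link, which is consistent with the `driver: virtio_net` workaround in the "After" config.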

dcplaya commented 10 months ago

Maybe a hard limit on the number of interfaces?

Right now, if I don't add the `driver` key, the upgrade goes through and then either eventually crashes the new Talos image, causing it to reboot (luckily back into a working Talos version), or stays on 1.6.0 but is basically unresponsive and behaves very strangely.

smira commented 10 months ago

Yes, you should use a selector that doesn't match the VLANs. The one you had previously only worked by chance.

rothgar commented 1 month ago

I'm assuming this issue was fixed, so I'm going to close it. If not, please let us know.