intel / afxdp-plugins-for-kubernetes

Apache License 2.0

eno1 a globally prohibited device, removing from list of host devices #25

Open Alt-Shivam opened 1 year ago

Alt-Shivam commented 1 year ago

Hey, I'm facing this issue after the AF_XDP plugin deployed successfully. Logs:

INFO[2023-01-27 09:49:48] Reading config file: /afxdp/config/config.json 
INFO[2023-01-27 09:49:48] Unmarshalling config data                    
INFO[2023-01-27 09:49:48] Config Data:
{
  "Pools": [
    {
      "Name": "eastPool",
      "Mode": "primary",
      "Drivers": [
        {
          "Name": "i40e",
          "Primary": 0,
          "Secondary": 0,
          "ExcludeDevices": [
            {
              "Name": "eno2",
              "Pci": "",
              "Mac": "",
              "Secondary": 0
            }
          ],
          "ExcludeAddressed": false
        }
      ],
      "Devices": null,
      "Nodes": null,
      "UdsServerDisable": false,
      "UdsTimeout": 0,
      "UdsFuzz": false,
      "RequiresUnprivilegedBpf": false,
      "uid": 0,
      "ethtoolCmds": null
    },
    {
      "Name": "westPool",
      "Mode": "primary",
      "Drivers": [
        {
          "Name": "i40e",
          "Primary": 0,
          "Secondary": 0,
          "ExcludeDevices": [
            {
              "Name": "eno1",
              "Pci": "",
              "Mac": "",
              "Secondary": 0
            }
          ],
          "ExcludeAddressed": false
        }
      ],
      "Devices": null,
      "Nodes": null,
      "UdsServerDisable": false,
      "UdsTimeout": 0,
      "UdsFuzz": false,
      "RequiresUnprivilegedBpf": false,
      "uid": 0,
      "ethtoolCmds": null
    }
  ],
  "LogFile": "afxdp-dp.log",
  "LogLevel": "debug"
} 
INFO[2023-01-27 09:49:48] Validating config data                       
INFO[2023-01-27 09:49:48] Setting log directory: /var/log/afxdp-k8s-plugins/ 
INFO[2023-01-27 09:49:48] Setting log file: afxdp-dp.log               
INFO[2023-01-27 09:49:48] Setting log level: debug                     
INFO[2023-01-27 09:49:48] Switching to debug log format                
INFO[2023-01-27 09:49:48] [main.go:75] [main] Starting AF_XDP Device Plugin                
INFO[2023-01-27 09:49:48] [main.go:78] [main] Checking if host meets requriements          
DEBU[2023-01-27 09:49:48] [main.go:171] [checkHost] Checking kernel version                      
DEBU[2023-01-27 09:49:48] [main.go:197] [checkHost] Kernel version: 5.13.0-1009-oem meets minimum requirements 
DEBU[2023-01-27 09:49:48] [main.go:200] [checkHost] Checking host for Libbpf                     
DEBU[2023-01-27 09:49:48] [host.go:85] [HasLibbpf] Directory /usr/lib64/ does not exist         
DEBU[2023-01-27 09:49:48] [main.go:207] [checkHost] Libbpf found on host:                        
DEBU[2023-01-27 09:49:48] [main.go:209] [checkHost]     /usr/lib/libbpf.so.0                        
DEBU[2023-01-27 09:49:48] [main.go:209] [checkHost]     /usr/lib/libbpf.so.0.5.0                    
INFO[2023-01-27 09:49:48] [main.go:88] [main] Host meets requriements                      
INFO[2023-01-27 09:49:48] [main.go:91] [main] Getting device pools                         
DEBU[2023-01-27 09:49:48] [config.go:111] [GetPoolConfigs] Unprivileged BPF is allowed on this host     
DEBU[2023-01-27 09:49:48] [config.go:135] [GetPoolConfigs] eno2 a globally prohibited device, removing from list of host devices 
DEBU[2023-01-27 09:49:48] [config.go:130] [GetPoolConfigs] docker0 is not a physical device, removing from list of host devices 
DEBU[2023-01-27 09:49:48] [config.go:130] [GetPoolConfigs] cali57cbcb24c31 is not a physical device, removing from list of host devices 
DEBU[2023-01-27 09:49:48] [config.go:130] [GetPoolConfigs] cali231ee6496e0 is not a physical device, removing from list of host devices 
DEBU[2023-01-27 09:49:48] [config.go:135] [GetPoolConfigs] eno1 a globally prohibited device, removing from list of host devices 
DEBU[2023-01-27 09:49:48] [config.go:130] [GetPoolConfigs] caliadc8f19fd1c is not a physical device, removing from list of host devices 
DEBU[2023-01-27 09:49:48] [config.go:130] [GetPoolConfigs] caliae6307e8d8d is not a physical device, removing from list of host devices 
DEBU[2023-01-27 09:49:48] [config.go:130] [GetPoolConfigs] cali903687db04a is not a physical device, removing from list of host devices 
DEBU[2023-01-27 09:49:48] [config.go:130] [GetPoolConfigs] lo is not a physical device, removing from list of host devices 
DEBU[2023-01-27 09:49:48] [config.go:130] [GetPoolConfigs] iface is not a physical device, removing from list of host devices 
DEBU[2023-01-27 09:49:48] [config.go:130] [GetPoolConfigs] vxlan.calico is not a physical device, removing from list of host devices 
DEBU[2023-01-27 09:49:48] [config.go:145] [GetPoolConfigs] Host devices:
{
  "ens1f0": {},
  "ens1f1": {}
} 
INFO[2023-01-27 09:49:48] [config.go:149] [GetPoolConfigs] Processing Pool: eastPool                    
DEBU[2023-01-27 09:49:48] [config.go:163] [GetPoolConfigs] Using default UDS timeout: 30 seconds        
DEBU[2023-01-27 09:49:48] [config.go:254] [getDeviceListOfDriverType] ens1f0 is the wrong driver type: igb         
DEBU[2023-01-27 09:49:48] [config.go:254] [getDeviceListOfDriverType] ens1f1 is the wrong driver type: igb         
DEBU[2023-01-27 09:49:48] [config.go:273] [getDeviceListOfDriverType] Exit discovery.                              
INFO[2023-01-27 09:49:48] [config.go:149] [GetPoolConfigs] Processing Pool: westPool                    
DEBU[2023-01-27 09:49:48] [config.go:163] [GetPoolConfigs] Using default UDS timeout: 30 seconds        
DEBU[2023-01-27 09:49:48] [config.go:254] [getDeviceListOfDriverType] ens1f0 is the wrong driver type: igb         
DEBU[2023-01-27 09:49:48] [config.go:254] [getDeviceListOfDriverType] ens1f1 is the wrong driver type: igb         
DEBU[2023-01-27 09:49:48] [config.go:273] [getDeviceListOfDriverType] Exit discovery.

garyloug commented 1 year ago

Hi @Alt-Shivam

Thanks for your feedback here. Let me explain these prohibited devices and the reasoning for them.

eno1, eno2, eth0, eth1, etc. tend to be the default management ports built into the main board of a server/host. They are the ports we use to connect to the host itself, over something like SSH or VNC.

Our device plugin is usually configured to find devices of a certain driver type. The problem is that these server ports sometimes share the same driver as AF_XDP-enabled network cards such as the X710 or E810. This means the plugin will pick up the management ports and add them to the device pool along with the other AF_XDP NIC ports.

That becomes a problem when you spin up a pod and Kubernetes allocates your server management port to the pod. The plugins move the port out of the host network namespace and into the pod network namespace. At this point your host is unreachable. You'd need to physically go to the machine, attach a KVM and manually delete the pod to get your host network back. That, or force a reboot.

Assigning eno1/eno2/eth0/eth1 has the potential to "brick" a remote machine, so we prohibited the plugins from picking up these devices.

I'd be interested to know your use case. If there's a good reason for using eno1, then maybe we could add a config option to disable the prohibited devices check.

As a short-term workaround, if you still wish to proceed with using enoX, the quickest way to get up and running is to remove them from the prohibited devices list at line 37 here: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/constants/constants.go#L37

KR, Gary

Alt-Shivam commented 1 year ago

Thank you, @garyloug, for providing such a detailed and informative explanation. My goal is to try out CNDP, explore its potential use cases, and benchmark it.

One more thing: can you suggest a NIC for trying out cdq mode?

I currently have these ones:

but I'm unable to bind them to the ice driver. Driver Link

Thanks & Regards Shivank