squat / generic-device-plugin

A Kubernetes device plugin to schedule generic Linux devices
Apache License 2.0

USB discovery error #27

Closed: vigmat28 closed this issue 1 year ago

vigmat28 commented 1 year ago

Hi, I'm running the device plugin and trying to discover /dev/fuse, but the device is not found, and the kubelet log shows these entries:

May 10 11:46:45 xxxx kubelet[14485]: I0510 11:46:45.807900   14485 reconciler.go:342] "operationExecutor.VerifyControllerAttachedVolume started for volume \"device-plugin\" (UniqueName: \"kubernetes.io/host-path/64159588-4463-452f-941e-7f11d39411f4-device-plugin\") pod \"generic-device-plugin-hfxdl\" (UID: \"64159588-4463-452f-941e-7f11d39411f4\") " pod="default/generic-device-plugin-hfxdl"
May 10 11:46:45 xxxx kubelet[14485]: I0510 11:46:45.808061   14485 reconciler.go:342] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-5lhdq\" (UniqueName: \"kubernetes.io/projected/64159588-4463-452f-941e-7f11d39411f4-kube-api-access-5lhdq\") pod \"generic-device-plugin-hfxdl\" (UID: \"64159588-4463-452f-941e-7f11d39411f4\") " pod="default/generic-device-plugin-hfxdl"
May 10 11:46:45 xxxx kubelet[14485]: I0510 11:46:45.808211   14485 reconciler.go:342] "operationExecutor.VerifyControllerAttachedVolume started for volume \"dev\" (UniqueName: \"kubernetes.io/host-path/64159588-4463-452f-941e-7f11d39411f4-dev\") pod \"generic-device-plugin-hfxdl\" (UID: \"64159588-4463-452f-941e-7f11d39411f4\") " pod="default/generic-device-plugin-hfxdl"
May 10 11:46:49 xxxx kubelet[14485]: I0510 11:46:49.019886   14485 manager.go:422] "Got registration request from device plugin with resource" resourceName="squat.ai/fuse"
May 10 11:46:49 xxxx kubelet[14485]: E0510 11:46:49.118835   14485 endpoint.go:107] "listAndWatch ended unexpectedly for device plugin" err="rpc error: code = Unknown desc = failed to refresh devices: failed to discover usb devices: open /sys/bus/usb/devices/: no such file or directory" resourceName="squat.ai/fuse"

On the worker node, the directory /sys/bus/usb/devices/ doesn't exist:

xxxx@xxxx:/$ ls -la /sys/bus/
total 0
drwxr-xr-x 30 root root 0 May 10 09:55 .
dr-xr-xr-x 13 root root 0 May 10 09:55 ..
drwxr-xr-x  4 root root 0 May 10 09:55 acpi
drwxr-xr-x  4 root root 0 May 10 09:55 cec
drwxr-xr-x  4 root root 0 May 10 09:55 clockevents
drwxr-xr-x  4 root root 0 May 10 09:55 clocksource
drwxr-xr-x  4 root root 0 May 10 09:55 container
drwxr-xr-x  4 root root 0 May 10 09:55 cpu
drwxr-xr-x  4 root root 0 May 10 09:55 dax
drwxr-xr-x  4 root root 0 May 10 09:55 edac
drwxr-xr-x  4 root root 0 May 10 09:55 event_source
drwxr-xr-x  4 root root 0 May 10 09:55 gpio
drwxr-xr-x  4 root root 0 May 10 09:55 i2c
drwxr-xr-x  4 root root 0 May 10 09:55 machinecheck
drwxr-xr-x  4 root root 0 May 10 09:55 memory
drwxr-xr-x  4 root root 0 May 10 09:55 mipi-dsi
drwxr-xr-x  4 root root 0 May 10 09:55 node
drwxr-xr-x  4 root root 0 May 10 09:55 nvmem
drwxr-xr-x  5 root root 0 May 10 09:55 pci
drwxr-xr-x  4 root root 0 May 10 09:55 pci_express
drwxr-xr-x  4 root root 0 May 10 09:55 platform
drwxr-xr-x  4 root root 0 May 10 09:55 pnp
drwxr-xr-x  4 root root 0 May 10 09:55 rbd
drwxr-xr-x  4 root root 0 May 10 09:55 scsi
drwxr-xr-x  4 root root 0 May 10 09:55 serial
drwxr-xr-x  4 root root 0 May 10 09:55 serio
drwxr-xr-x  4 root root 0 May 10 09:55 spi
drwxr-xr-x  4 root root 0 May 10 09:55 workqueue
drwxr-xr-x  4 root root 0 May 10 09:55 xen
drwxr-xr-x  4 root root 0 May 10 09:55 xen-backend

This is the YAML manifest I applied:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: generic-device-plugin
  namespace: default
  labels:
    app.kubernetes.io/name: generic-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: generic-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/name: generic-device-plugin
    spec:
      priorityClassName: system-node-critical
      containers:
      - image: squat/generic-device-plugin
        args:
        - --device
        - |
          name: fuse
          groups:
            - count: 10
              paths:
                - path: /dev/fuse
        name: generic-device-plugin
        resources:
          requests:
            cpu: 50m
            memory: 10Mi
          limits:
            cpu: 50m
            memory: 10Mi
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev
          mountPath: /dev
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
  updateStrategy:
    type: RollingUpdate

In the source code I found this:

https://github.com/squat/generic-device-plugin/blob/013b39297e99d0bccb6cbb9e60c828e710a16e40/deviceplugin/usb.go#L134

How about checking whether the directory exists, instead of returning an error when it doesn't?

squat commented 1 year ago

Yes, I think that would be fine for graceful degradation, especially on systems that don't have any USB devices. Would you like to make a PR to log and ignore the error and just move on?

Also, out of curiosity, what OS/kernel/distro are you running? I'm curious as to why that directory does not exist on your system.

vigmat28 commented 1 year ago

> Also, out of curiosity, what OS/kernel/distro are you running? I'm curious as to why that directory does not exist on your system.

Linux xxxx 5.10.0-14-amd64 #1 SMP Debian 5.10.113-1 (2022-04-29) x86_64 GNU/Linux