intel / intel-device-plugins-for-kubernetes

Collection of Intel device plugins for Kubernetes
Apache License 2.0

DSA and IAA demos fail with "mmap: Operation not permitted" #1779

Closed tkatila closed 2 months ago

tkatila commented 3 months ago

Describe the bug After the Ubuntu HWE kernel was updated from 6.5.0-35 to 6.5.0-41, the DSA e2e tests started to fail. The failure appears to come from a kernel change that, because of a security issue, denies access to the DSA/IAA accelerators unless the process has the SYS_RAWIO capability: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=796aec4a5b5850967af0c42d4e84df2d748d570b

To Reproduce

Expected behavior Pod completes successfully.

System (please complete the following information):

Additional context Adding:

    securityContext:
      capabilities:
        add: ["SYS_RAWIO"]

might fix it temporarily.

eero-t commented 3 months ago

man capabilities lists the following as allowed by that capability:

CAP_SYS_RAWIO
              •  Perform I/O port operations (iopl(2) and ioperm(2));
              •  access /proc/kcore;
              •  employ the FIBMAP ioctl(2) operation;
              •  open  devices  for  accessing  x86  model-specific  registers
                 (MSRs, see msr(4));
              •  update /proc/sys/vm/mmap_min_addr;
              •  create memory mappings at addresses below the value specified
                 by /proc/sys/vm/mmap_min_addr;
              •  map files in /proc/bus/pci;
              •  open /dev/mem and /dev/kmem;
              •  perform various SCSI device commands;
              •  perform certain operations on hpsa(4) and cciss(4) devices;
              •  perform a range of device-specific operations on other devices.

Which of the listed procfs + sysfs files & dirs are mounted, and need to be mounted, inside DSA/IAA example containers?

HackToday commented 3 months ago

I added the capability to my pod YAML, and it runs without issues. So I think the capability is needed.

https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/dsa_plugin/README.md#testing-and-demos

tkatila commented 3 months ago

> CAP_SYS_RAWIO
>               •  perform a range of device-specific operations on other devices.
>
> Which of the listed procfs + sysfs files & dirs are mounted, and need to be mounted, inside DSA/IAA example containers?

I do not believe we need any extra host files or directories. The kernel patch is fairly self-explanatory: for certain devices it checks for the SYS_RAWIO capability and, if the capability is missing, returns EPERM.
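One way to confirm whether a demo container actually holds the capability is to read the effective capability mask that Linux exposes in /proc/self/status. The sketch below is only a diagnostic and not part of the plugin. It assumes the kernel check looks at the effective set, and `has_capability` and `current_process_has` are hypothetical helper names. CAP_SYS_RAWIO is bit 17 in linux/capability.h.

```python
CAP_SYS_RAWIO = 17  # bit index of CAP_SYS_RAWIO in <linux/capability.h>


def has_capability(status_text: str, cap_bit: int = CAP_SYS_RAWIO) -> bool:
    """Return True if the CapEff mask in /proc/<pid>/status text has cap_bit set."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hexadecimal number without a 0x prefix.
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << cap_bit))
    raise ValueError("no CapEff line found")


def current_process_has(cap_bit: int = CAP_SYS_RAWIO) -> bool:
    """Check the calling process (Linux only, reads /proc/self/status)."""
    with open("/proc/self/status") as f:
        return has_capability(f.read(), cap_bit)
```

Run inside the demo pod, `current_process_has()` would be expected to return True once add: ["SYS_RAWIO"] is present in the container's securityContext and the process runs as root. For a non-root user the added capability may not end up in the effective set, so the result can still be False.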

Another question is how the DSA plugin should handle this, if at all.

eero-t commented 3 months ago

> I do not believe we need any extra host files or directories.

I meant is there something that should be masked out of the container...

tkatila commented 3 months ago

> I meant is there something that should be masked out of the container...

Probably "everything" that SYS_RAWIO brings. The DSA demo worked without it before, and the kernel side only needs the SYS_RAWIO capability flag.

mythi commented 3 months ago

> > I do not believe we need any extra host files or directories.
>
> I meant is there something that should be masked out of the container...

Runtimes mask out some proc paths by default, but it would be good to cross-check these. In general, adding add: ["SYS_RAWIO"] should be OK, IMO. We add something similar for QAT (IPC_LOCK).
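A rough way to do that cross-check from inside a container: OCI runtimes such as runc typically mask a file by bind-mounting /dev/null over it, so a masked /proc/kcore shows up as the same character device as /dev/null. The sketch below relies on that assumption. It does not detect masked directories (usually covered with a read-only tmpfs), and the helper names and the candidate list are illustrative only.

```python
import os
import stat

# Illustrative list: /proc/kcore is one of the files CAP_SYS_RAWIO grants access to.
CANDIDATES = ("/proc/kcore",)


def is_masked_file(path: str, null_device: str = "/dev/null") -> bool:
    """Return True if `path` is the same character device as /dev/null."""
    try:
        st = os.stat(path)
    except OSError:
        # Missing or inaccessible path: nothing to report as masked.
        return False
    null_st = os.stat(null_device)
    return stat.S_ISCHR(st.st_mode) and st.st_rdev == null_st.st_rdev


def report(paths=CANDIDATES) -> dict:
    """Map each candidate path to whether it looks masked."""
    return {p: is_masked_file(p) for p in paths}
```

Calling `report()` in the demo container with and without SYS_RAWIO gives a quick view of whether the listed files are still hidden. Extend `CANDIDATES` with whatever paths matter for the deployment.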