lablup / backend.ai

Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, TPU, IPU and other NPUs.
https://www.backend.ai
GNU Lesser General Public License v3.0

feat: Implement Fine-grained seccomp profile #3019

Open: jopemachine opened this pull request 2 weeks ago.

jopemachine commented 2 weeks ago

Resolves #2931.

With this change, an accelerator plugin can implement get_additional_syscalls() to allow additional system calls inside its containers, on top of those permitted by the default seccomp profile.
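A minimal sketch of the mechanism follows. It assumes the profile uses Docker's seccomp JSON format (a defaultAction plus a list of syscall rules) and that each plugin's get_additional_syscalls() returns a list of syscall names. AcceleratorPluginSketch, ExampleGPUPlugin, and apply_additional_syscalls are hypothetical stand-ins, not Backend.AI's actual plugin base class or agent code.

```python
import copy
from typing import Any, Dict, Iterable, List


class AcceleratorPluginSketch:
    """Hypothetical stand-in for an accelerator plugin base class."""

    def get_additional_syscalls(self) -> List[str]:
        # By default, a plugin needs nothing beyond the base profile.
        return []


class ExampleGPUPlugin(AcceleratorPluginSketch):
    """Hypothetical plugin whose driver stack needs extra syscalls."""

    def get_additional_syscalls(self) -> List[str]:
        # Illustrative names only; a real plugin lists what its runtime needs.
        return ["perf_event_open", "mbind"]


def apply_additional_syscalls(
    base_profile: Dict[str, Any],
    plugins: Iterable[AcceleratorPluginSketch],
) -> Dict[str, Any]:
    """Return a copy of a Docker-style seccomp profile that also allows
    every syscall requested by the given plugins."""
    profile = copy.deepcopy(base_profile)  # never mutate the shared default
    extra = sorted({name for p in plugins for name in p.get_additional_syscalls()})
    if extra:
        # Append one extra allow rule covering all plugin-requested syscalls.
        profile.setdefault("syscalls", []).append(
            {"names": extra, "action": "SCMP_ACT_ALLOW"}
        )
    return profile
```

Merging into a per-container copy keeps the default profile strict for sessions that do not attach the accelerator, which is the point of making the profile fine-grained.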

[!TIP] It would be a good idea to have CI automatically check whether the default-seccomp.json profile is up to date and update it when it is not.

Although it is not directly related to this PR, the following PR may be a helpful reference for automating updates to the default-seccomp.json file: https://github.com/lablup/backend.ai-jail/pull/18
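One possible shape for such a CI check is sketched below. It assumes both the local default-seccomp.json and an upstream reference profile (for example, Docker's default profile) use Docker's seccomp JSON format, and that the reference file has already been downloaded by an earlier CI step. The function names and the two-path command line are hypothetical, and conditional rules (args, includes/excludes) are simplified to unconditional allows.

```python
import json
import sys
from typing import Any, Dict, Set, Tuple


def allowed_syscalls(profile: Dict[str, Any]) -> Set[str]:
    """Collect syscall names allowed by any rule.

    Simplification: conditional rules (args, includes/excludes) are
    treated as unconditional allows.
    """
    names: Set[str] = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            names.update(rule.get("names", []))
    return names


def diff_profiles(
    local: Dict[str, Any], upstream: Dict[str, Any]
) -> Tuple[Set[str], Set[str]]:
    """Return (allowed upstream but not locally, allowed locally but not upstream)."""
    mine, theirs = allowed_syscalls(local), allowed_syscalls(upstream)
    return theirs - mine, mine - theirs


def main(local_path: str, upstream_path: str) -> int:
    with open(local_path) as f:
        local = json.load(f)
    with open(upstream_path) as f:
        upstream = json.load(f)
    missing, extra = diff_profiles(local, upstream)
    for name in sorted(missing):
        print(f"missing: {name}")
    for name in sorted(extra):
        print(f"extra: {name}")
    # A non-zero exit code fails the CI job when the local profile lags behind.
    return 1 if missing else 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Only missing syscalls fail the job here, since locally added entries may be intentional; a stricter policy could fail on either set.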

Reference

How to Test

This PR can be tested as follows.

  1. First, block some system calls that are essential for session creation in the default-seccomp.json file. Session creation then fails, as shown below:
❯ ./backend.ai session create python
✗ Session ID 1c8844a4-0e41-4b3b-9098-1dab7cdc97e9 has an error during scheduling/startup or cancelled.
  2. Next, implement the get_additional_syscalls() method in the CUDA MockPlugin class so that it returns the system calls blocked in step 1.

  3. Finally, create the session again with the mock plugin's resource option. This time, session creation succeeds:

❯ ./backend.ai session create -r cuda.shares=1 python
∙ Session ID 81dc4784-2271-48a6-a512-a3342032f53b is created and ready.
∙ This session provides the following app services: sshd, ttyd, jupyter, jupyterlab
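Steps 1 and 2 above can be scripted roughly as follows. This is a sketch assuming Docker's seccomp JSON format, where removing a name from the allow rules leaves it subject to the profile's restrictive defaultAction. BLOCKED_SYSCALLS, block_syscalls, rewrite_profile_file, and MockCUDAPluginSketch are hypothetical names; the real CUDA MockPlugin class and the location of default-seccomp.json are not shown here.

```python
import copy
import json
from typing import Any, Dict, Iterable, List

# Hypothetical choice of syscalls to block for the experiment (step 1).
BLOCKED_SYSCALLS = ["clone", "execve"]


def block_syscalls(profile: Dict[str, Any], names: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of a Docker-style profile with `names` removed
    from every allow rule; allow rules left empty are dropped."""
    blocked = set(names)
    result = copy.deepcopy(profile)
    kept = []
    for rule in result.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            rule["names"] = [n for n in rule.get("names", []) if n not in blocked]
            if not rule["names"]:
                continue  # an allow rule with no names serves no purpose
        kept.append(rule)
    result["syscalls"] = kept
    return result


def rewrite_profile_file(path: str, names: Iterable[str]) -> None:
    """Overwrite a profile file in place with the blocked variant (step 1)."""
    with open(path) as f:
        profile = json.load(f)
    with open(path, "w") as f:
        json.dump(block_syscalls(profile, names), f, indent=2)


class MockCUDAPluginSketch:
    """Hypothetical stand-in for the CUDA mock plugin used in step 2."""

    def get_additional_syscalls(self) -> List[str]:
        # Re-allow exactly what step 1 blocked.
        return list(BLOCKED_SYSCALLS)
```

Keeping the blocked list in one place ensures the mock plugin re-allows exactly what was removed, so a successful session in step 3 can be attributed to get_additional_syscalls() rather than to a mismatch between the two steps.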

Checklist: (if applicable)

jopemachine commented 2 weeks ago

This stack of pull requests is managed by Graphite.