moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

Leaked processes in rootless mode #2417

Open lbernail opened 3 years ago

lbernail commented 3 years ago

When running rootless with --oci-worker-no-process-sandbox, we have noticed that builds can end up leaking processes, which causes problems for subsequent builds.

Here is an example Dockerfile:

FROM ubuntu
ADD script.sh /
RUN /script.sh
RUN echo "Dockerfile done"

and script.sh:

#!/bin/bash
set -x
apt-get update
apt-get install -y netcat
nc -l 5432 &
echo "Script done"

When running rootless with --oci-worker-no-process-sandbox here is what happens:

In addition, killing the leaked process leaves a zombie:

ubuntu      4282  0.0  0.0 709668  6504 ?        Sl   16:35   0:00      |   \_ /proc/self/exe buildkitd --config /etc/buildkit/buildkitd.toml
ubuntu      4297  0.8  0.1 734252 35676 ?        Sl   16:35   0:04      |   |   \_ buildkitd --config /etc/buildkit/buildkitd.toml
ubuntu     12340  0.0  0.0      0     0 ?        Z    16:39   0:00      |   \_ [nc] <defunct>

If we run rootless but privileged and without --oci-worker-no-process-sandbox, everything works: the image builds and we don't leak processes (as expected, because they run in a different PID namespace).

Of course, this example is just a reproduction, but we have seen the problem with real Dockerfiles where installing a package starts a daemon.

I'm wondering if we could track processes started in the background.
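To make the tracking idea concrete, here is a rough sketch in plain bash, not BuildKit code. It assumes Linux with /proc mounted, the setsid utility from util-linux, and a non-interactive shell (job control off). The helpers run_step and list_group are hypothetical names. The idea is to run each step in its own session so that background children share one process group, then look for anything still alive in that group once the step exits. A daemon that calls setsid() itself would escape this, so this approach is weaker than a separate PID namespace or a cgroup.

```shell
#!/bin/bash
# Sketch only: run_step and list_group are hypothetical helpers, not BuildKit code.

# Print the PIDs of live (non-zombie) processes whose process group ID is $1.
# Reads /proc directly, so it only works on Linux.
list_group() {
    local pgid=$1 stat_file stat pid state ppid pgrp
    for stat_file in /proc/[0-9]*/stat; do
        # A process may exit while we scan; skip entries that vanished.
        stat=$(cat "$stat_file" 2>/dev/null) || continue
        pid=${stat%% *}
        # The fields after the "(comm)" column are: state ppid pgrp ...
        read -r state ppid pgrp _ <<< "${stat##*) }"
        [ "$pgrp" = "$pgid" ] && [ "$state" != "Z" ] && echo "$pid"
    done
    return 0
}

# Run a command in its own session, then report and terminate leftovers.
# Assumes job control is off, so setsid does not fork and $! is the new
# session and process group ID.
run_step() {
    local pgid status leftovers
    setsid "$@" &
    pgid=$!
    wait "$pgid"
    status=$?
    leftovers=$(list_group "$pgid")
    if [ -n "$leftovers" ]; then
        echo "leaked PIDs:" $leftovers >&2
        # A negative PID addresses the whole process group.
        kill -TERM -- "-$pgid" 2>/dev/null
    fi
    return "$status"
}
```

With the script.sh above, something like `run_step /script.sh` should report the backgrounded nc and send it SIGTERM when the script returns. Whether the terminated process then gets reaped still depends on what ends up as its parent, which is the zombie problem shown earlier.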

In addition, in our setup we use buildkitd to run concurrent builds, and sharing the PID and network namespaces is likely to create problems from time to time. Have you considered a buildx kubernetes driver that would start a buildkitd pod (directly or with a job) and delete it when the build is over?

tonistiigi commented 3 years ago

@AkihiroSuda Can we use a cgroup to make sure this doesn't happen?

AkihiroSuda commented 3 years ago

No, we can't write to the cgroup.

AkihiroSuda commented 3 years ago

I guess supporting runsc would be a more realistic solution. (Ideally we should be able to run buildkitd inside runsc, but in this context I'm just talking about using runsc as the binary for the OCI worker of buildkitd running inside a plain old runc container.)

lbernail commented 3 years ago

@AkihiroSuda could cgroup v2 help here? It seems some controllers are available to unprivileged users, and others can be delegated to them.
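A quick way to check whether delegation is in place, sketched under a few assumptions: the unified cgroup v2 hierarchy is mounted at /sys/fs/cgroup, and a cgroup counts as delegated when the current user can write to its cgroup.procs file. check_delegation is a hypothetical helper name, and it takes the cgroup directory as an argument so it can be pointed at any path.

```shell
#!/bin/bash
# Sketch only: check_delegation is a hypothetical helper, not part of BuildKit.

# Report which cgroup v2 controllers are listed in directory $1 and whether
# the current user can move processes into that cgroup.
check_delegation() {
    local dir=$1
    if [ ! -r "$dir/cgroup.controllers" ]; then
        echo "not a cgroup v2 directory: $dir"
        return 1
    fi
    if [ ! -w "$dir/cgroup.procs" ]; then
        echo "no write access to $dir/cgroup.procs"
        return 1
    fi
    echo "delegated controllers: $(cat "$dir/cgroup.controllers")"
}
```

On a cgroup v2 host, the current cgroup can be looked up with `awk -F: '$1 == "0" {print $3}' /proc/self/cgroup` and passed in as `check_delegation "/sys/fs/cgroup$path"`. If the check passes, each build step could in principle be placed in a child cgroup, whose cgroup.procs file lists every process still running in it, including daemons started in the background.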