moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

Unable to use unshare during build process even though whitelisted in seccomp profile #789

Open tbeadle opened 5 years ago

tbeadle commented 5 years ago

I am running Docker with a modified seccomp profile that whitelists unshare, mount, umount, and umount2. If I build an image with BuildKit from a Dockerfile that uses unshare in a RUN line, the build fails. If DOCKER_BUILDKIT is unset, the image builds successfully.
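At the syscall level, the failing RUN step comes down to unshare(2) with CLONE_NEWUSER, followed by writes to /proc/self/setgroups, uid_map, and gid_map (this is what --map-root-user arranges). Below is a minimal Python sketch of that sequence. It is an illustration, not util-linux's actual code. The function name try_map_root_user is hypothetical, and it assumes Linux plus a libc reachable through ctypes. It runs the calls in a forked child so the caller's namespaces stay untouched. If a seccomp filter rejects unshare, the call fails with EPERM, which strerror renders as the same "Operation not permitted" that appears in the BuildKit log.

```python
import ctypes
import ctypes.util
import os

# Value of CLONE_NEWUSER from <linux/sched.h>.
CLONE_NEWUSER = 0x10000000


def try_map_root_user() -> str:
    """Roughly mimic `unshare --user --map-root-user whoami` (hypothetical helper).

    A forked child calls unshare(CLONE_NEWUSER) and maps uid/gid 0 in the new
    user namespace to the caller's uid/gid. Returns "root" on success,
    otherwise the strerror() text of the first failing step.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    uid, gid = os.getuid(), os.getgid()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: only this process enters the new namespace
        os.close(read_fd)
        result = "unknown error"
        try:
            if libc.unshare(CLONE_NEWUSER) != 0:
                result = os.strerror(ctypes.get_errno())
            else:
                # setgroups must be denied before an unprivileged gid_map write.
                for path, text in (
                    ("/proc/self/setgroups", "deny"),
                    ("/proc/self/uid_map", f"0 {uid} 1"),
                    ("/proc/self/gid_map", f"0 {gid} 1"),
                ):
                    with open(path, "w") as f:
                        f.write(text)
                result = "root" if os.getuid() == 0 else f"uid {os.getuid()}"
        except OSError as exc:
            result = exc.strerror or str(exc)
        finally:
            # Always report back and exit the child, whatever happened above.
            os.write(write_fd, result.encode())
            os._exit(0)
    os.close(write_fd)
    chunks = []
    while chunk := os.read(read_fd, 4096):
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()


if __name__ == "__main__":
    print(try_map_root_user())
```

Run on the host, this shows whether unprivileged user namespaces work at all. Run inside a container, it shows whether that container's seccomp profile lets unshare through.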

laptop [~/foo]$ cat Dockerfile
FROM debian:jessie
RUN unshare --user --map-root-user whoami
laptop [~/foo]$ unset DOCKER_BUILDKIT
laptop [~/foo]$ docker build -t foo .
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM debian:jessie
 ---> bb64860610f6
Step 2/2 : RUN unshare --user --map-root-user whoami
 ---> Running in 0d4c19785d68
root
Removing intermediate container 0d4c19785d68
 ---> 8f68c667c8d8
Successfully built 8f68c667c8d8
Successfully tagged foo:latest
laptop [~/foo]$ export DOCKER_BUILDKIT=1
laptop [~/foo]$ docker build -t foo .
[+] Building 0.7s (5/5) FINISHED
 => [internal] load build definition from Dockerfile                0.1s
 => => transferring dockerfile: 37B                                 0.0s
 => [internal] load .dockerignore                                   0.2s
 => => transferring context: 2B                                     0.0s
 => [internal] load metadata for docker.io/library/debian:jessie    0.0s
 => CACHED [1/2] FROM docker.io/library/debian:jessie               0.0s
 => ERROR [2/2] RUN unshare --user --map-root-user whoami           0.5s
------
 > [2/2] RUN unshare --user --map-root-user whoami:
#5 0.352 unshare: unshare failed: Operation not permitted
------
executor failed running [/bin/sh -c unshare --user --map-root-user whoami]: exit code: 1
laptop [~/foo]$ uname -r
4.20.1-arch1-1-ARCH
laptop [~/foo]$ docker version
Client:
 Version:           18.09.1-ce
 API version:       1.39
 Go version:        go1.11.4
 Git commit:        4c52b901c6
 Built:             Thu Jan 10 06:51:04 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.09.1-ce
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.11.4
  Git commit:       4c52b901c6
  Built:            Thu Jan 10 06:50:46 2019
  OS/Arch:          linux/amd64
  Experimental:     false
laptop [~/foo]$ docker info
Containers: 6
 Running: 1
 Paused: 0
 Stopped: 5
Images: 264
Server Version: 18.09.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9f2e07b1fc1342d1c48fe4d7bbb94cb6d1bf278b.m
runc version: 079817cc26ec5292ac375bb9f47f373d33574949
init version: fec3683
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
Kernel Version: 4.20.1-arch1-1-ARCH
Operating System: Arch Linux
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.42GiB
Name: laptop
ID: EWRT:BUVO:GBFG:4O5M:IQN7:332P:ZDRZ:AWSC:OZUN:2YQ2:U2D4:Q7FT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: tbeadle
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
[root@laptop docker]# sysctl -a | grep userns
kernel.unprivileged_userns_clone = 1
[root@laptop docker]# sysctl -a | grep namespace
user.max_cgroup_namespaces = 62874
user.max_ipc_namespaces = 62874
user.max_mnt_namespaces = 62874
user.max_net_namespaces = 62874
user.max_pid_namespaces = 62874
user.max_user_namespaces = 62874
user.max_uts_namespaces = 62874
[root@laptop docker]# ps -efww | grep docker
root     16077     1  0 17:41 ?        00:01:53 /usr/bin/dockerd -H fd:// --seccomp-profile=/etc/docker/seccomp.json
root     16085 16077  0 17:41 ?        00:00:08 containerd --config /var/run/docker/containerd/containerd.toml --log-level info
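The sysctl checks above can also be scripted. Below is a minimal sketch that reads the same values straight from /proc/sys. It assumes the standard mapping of dots to path separators. The helper name read_sysctls is hypothetical. Note that kernel.unprivileged_userns_clone is distribution-specific: it only exists on kernels that carry the Debian-derived patch, as the Arch kernel in this report does, so it is reported as None where absent.

```python
from pathlib import Path

# sysctl names taken from the report above; "a.b.c" lives at /proc/sys/a/b/c.
SYSCTLS = [
    "kernel.unprivileged_userns_clone",  # distribution-specific, may be missing
    "user.max_user_namespaces",
    "user.max_mnt_namespaces",
]


def read_sysctls(names, root="/proc/sys"):
    """Return {name: value}, using None for sysctls this kernel does not expose."""
    values = {}
    for name in names:
        path = Path(root, *name.split("."))
        try:
            values[name] = path.read_text().strip()
        except OSError:  # missing file or insufficient permission
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_sysctls(SYSCTLS).items():
        print(f"{name} = {value}")
```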

The seccomp profile that I am using is the same as the default one (from https://raw.githubusercontent.com/moby/moby/master/profiles/seccomp/default.json), except for the following changes, which whitelist unshare, mount, umount, and umount2:

laptop [~/foo]$ diff ~/default.json /etc/docker/seccomp.json 
196a197
>                               "mount",
348a350,351
>                               "umount",
>                               "umount2",
351a355
>                               "unshare",
556d559
<                               "mount",
563,566c566
<                               "syslog",
<                               "umount",
<                               "umount2",
<                               "unshare"
---
>                               "syslog"
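The diff above moves the four syscalls out of the capability-gated rule (the one that also lists syslog) and into the unconditional allow list. Below is a hedged Python sketch that applies the same change programmatically. It assumes the Docker profile layout of a "syscalls" list whose rules carry "names", "action", and optional "args"/"includes"/"excludes". The helper name allow_unconditionally is hypothetical. Per the reply below, BuildKit does not apply the daemon's --seccomp-profile, so a profile patched this way only affects containers started through the Docker API, such as the classic builder.

```python
import bisect
import json
import sys

# Syscalls the reporter moved into the unconditional allow list.
EXTRA = ["mount", "umount", "umount2", "unshare"]


def allow_unconditionally(profile, extra=EXTRA):
    """Move `extra` out of conditional rules into the plain SCMP_ACT_ALLOW rule."""
    rules = profile["syscalls"]
    # Assumed: the first ALLOW rule without args/includes/excludes is the base list.
    base = next(
        r for r in rules
        if r.get("action") == "SCMP_ACT_ALLOW"
        and not r.get("args") and not r.get("includes") and not r.get("excludes")
    )
    kept = []
    for rule in rules:
        if rule is not base and "names" in rule:
            rule["names"] = [n for n in rule["names"] if n not in extra]
            if not rule["names"]:
                continue  # drop rules left with no syscalls
        kept.append(rule)
    profile["syscalls"] = kept
    for name in extra:
        if name not in base["names"]:
            bisect.insort(base["names"], name)  # keep the list alphabetical
    return profile


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        patched = allow_unconditionally(json.load(f))
    json.dump(patched, sys.stdout, indent=4)
```

The JSON indentation will not match the upstream file exactly, so a textual diff of the output against default.json will be noisier than the one shown above, even though the rule changes are the same.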
tonistiigi commented 5 years ago

I was not aware that this can be set in config. BuildKit does not use the Docker API for execution; it uses containerd/runc directly (and their security profile). I don't think allowing this kind of setting in the builder is a good idea, as it makes Dockerfiles unportable; we are in the process of adding #570 with a more portable solution. Even the daemon-level options probably were not added with this use case in mind.

I don't think we would ever add this to BuildKit itself, but we should probably still add it in the moby integration to keep compatibility.
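Because a BuildKit RUN step gets the seccomp setup that containerd/runc apply rather than the daemon's --seccomp-profile, it can help to confirm from inside the step whether a filter is active at all. Below is a minimal sketch that reads the Seccomp field of /proc/self/status, where 0 means disabled, 1 strict, and 2 filter. It assumes a Python 3.6+ interpreter is available in the image. The helper name seccomp_mode is hypothetical. It reports only the mode, not which profile was loaded.

```python
# Meanings of the "Seccomp:" field in /proc/<pid>/status (see proc(5)).
MODES = {"0": "disabled", "1": "strict", "2": "filter"}


def seccomp_mode(status_path="/proc/self/status"):
    """Return "disabled", "strict", "filter", or None if the field is absent."""
    with open(status_path) as f:
        for line in f:
            key, _, value = line.partition(":")
            if key == "Seccomp":  # exact match, so "Seccomp_filters" is skipped
                return MODES.get(value.strip(), value.strip())
    return None


if __name__ == "__main__":
    print(f"seccomp mode: {seccomp_mode()}")
```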