Open adawalli opened 3 years ago
Thank you so much for reporting the issue. We don't actively support userns mode with Docker right now, I'm afraid. It would be nice to add such support in the future, and we'll need to do some research there (e.g. how it affects the volume/networking integrations that Nomad 0.11/0.12 just added).
In the short term, I'm curious whether making dockremap the owner of /data/nomad (or granting it write access) would help?
@notnoop - do you have a secure deployment guide you recommend for docker then? userns is typically used in best-practices to help mitigate container escalation
@adawalli Works for me!
Running docker daemon under user namespaces with a remapped root smahajan
root@smahajan-VirtualBox:/tmp# cat /etc/subuid
smahajan:100000:65536
root@smahajan-VirtualBox:/tmp# cat /etc/subgid
smahajan:100000:65536
root@smahajan-VirtualBox:/tmp# systemctl cat docker
# /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --userns-remap=smahajan
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --userns-remap=smahajan
$ nomad job init -short
Example job file written to example.nomad
$ nomad job run example.nomad
$ nomad job status
ID Type Priority Status Submit Date
example service 50 running 2020-07-20T14:49:49-07:00
root@smahajan-VirtualBox:/tmp# docker top $(docker ps -lq)
UID PID PPID C STIME TTY TIME CMD
100999 9502 9476 0 14:49 ? 00:00:01 redis-server *:6379
root@smahajan-VirtualBox:/tmp# nomad alloc exec -i -t 3bcaee6f /bin/bash
root@ead293e83d51:/data# id -u
0
root@ead293e83d51:/data# id -g
0
Non-root on the host, root inside container.
I am wondering why you are setting userns-remap: default. Should this be userns-remap: dockremap?
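The 100999 in the docker top output above follows from the subordinate ID range in /etc/subuid: a container UID is offset by the start of the range. Here is a minimal sketch of that arithmetic, assuming the single smahajan:100000:65536 entry shown earlier. The function name host_uid is made up for illustration and is not part of Docker or Nomad.

```shell
#!/bin/sh
# host_uid SUBUID_ENTRY CONTAINER_UID
# Translate a container UID to the host UID implied by one /etc/subuid
# entry of the form name:start:count. Hypothetical helper for illustration.
host_uid() {
    entry=$1
    cuid=$2
    start=$(printf '%s\n' "$entry" | cut -d: -f2)
    count=$(printf '%s\n' "$entry" | cut -d: -f3)
    # UIDs at or beyond the range size have no mapping on the host.
    if [ "$cuid" -ge "$count" ]; then
        echo "container UID $cuid is outside the mapped range" >&2
        return 1
    fi
    echo $((start + cuid))
}

host_uid "smahajan:100000:65536" 0     # container root
host_uid "smahajan:100000:65536" 999   # container UID inferred from 100999 in docker top
```

So root inside the container (UID 0) is UID 100000 on the host, which is why host-side files owned by real root are not automatically accessible to it.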
Oh I see
When you configure Docker to use the userns-remap feature, you can optionally specify an existing user and/or group, or you can specify default. If you specify default, a user and group dockremap is created and used for this purpose.
default just creates dockremap for you! Can you try creating the mapping manually with another existing user, and see if that resolves the issue?
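For reference, here is a sketch of the daemon setting being discussed. It assumes the option is set through Docker's daemon.json (the issue writes it as userns-remap: default). The helper name and the scratch path are illustrative; the live file is normally /etc/docker/daemon.json and the daemon has to be restarted after changing it.

```shell
#!/bin/sh
# write_userns_config PATH VALUE
# Write a minimal daemon.json that enables user namespace remapping.
# VALUE is "default" (Docker creates and uses dockremap) or an existing user.
# Hypothetical helper for illustration.
write_userns_config() {
    cat > "$1" <<EOF
{
  "userns-remap": "$2"
}
EOF
}

# Write to a scratch file rather than the live config.
example_file="${TMPDIR:-/tmp}/daemon.json.example"
write_userns_config "$example_file" default
cat "$example_file"
```

Passing an existing user name instead of default (for example smahajan, as in the systemd unit above) gives the manually created mapping suggested here.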
@adawalli Works for me with userns-remap=default too!
Let me give this another try in the next few days - will report back!
What user are you running nomad under: root, or smahajan?
@adawalli root
It honestly doesn't make sense to me that the root partition could be viewed by a namespaced process - isn't that a breakdown of namespacing?
That's why the error
mounting "/data/nomad/alloc/f79d1ebb-31d7-673c-a80f-dc5e3cab1b0d/alloc" to rootfs "/data/docker-storage/100000.100000/bdfac6d255117f5f26fd/merged" at "/alloc" caused "stat /data/nomad/alloc/f79d1ebb-31d7-673c-a80f-dc5e3cab1b0d/alloc: permission denied"
actually seemed like a pretty reasonable error.
And sorry, not able to reproduce your behavior with dockremap - I will try it with a manually created user just to close up that loose end as well.
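One way to see which directory a "stat ... permission denied" like the one quoted above is tripping over is to list the owner and mode of every component of the path. This is only a diagnostic sketch: the function name walk_path is made up, and on the Nomad client it would need to run as root to read the 0700 directories.

```shell
#!/bin/sh
# walk_path PATH
# Print `ls -ld` for PATH and each of its parent directories up to /.
# Any component that the remapped root (e.g. UID 100000) cannot traverse
# will make stat on the full path fail. Hypothetical helper for illustration.
walk_path() {
    p=$1
    while :; do
        ls -ld "$p"
        case "$p" in
            /|.) break ;;
        esac
        p=$(dirname "$p")
    done
}

# Intended use on the client (not executed here):
#   walk_path /data/nomad/alloc/<alloc-id>/alloc
```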
The following also did not work: dockremap settings.
This is kind of a big bummer, because there are plenty of upstream docker images (e.g., traefik) that don't add a non-root user to their default image. I am surprised that more folks aren't impacted by this limitation. Is everyone really running docker as root and not enforcing userns??
@adawalli What is your docker root location?
docker info | grep "Docker Root"
And is the root location owned by root or non-root?
@shishir-a412ed - I really appreciate you helping me and continuing to ask questions. My hope is not lost in the internet!
$ docker info | grep Root
Docker Root Dir: /data/docker-storage/100000.100000
$ ls -lha /data
total 36K
drwxr-xr-x 6 root root 4.0K Jul 21 16:54 .
drwxr-xr-x 24 root root 4.0K Jun 2 06:02 ..
drwxrwxr-x 5 root blackduck 4.0K Jun 29 19:15 blackduck
drwx--x--x 3 root root 4.0K Jul 21 16:54 docker-storage
drwx------ 2 root root 16K May 11 08:32 lost+found
drwx------ 4 root bin 4.0K Jul 21 14:18 nomad
$ sudo ls -la /data/docker-storage/100000.100000
total 56
drwx------ 14 100000 100000 4096 Jul 21 16:54 .
drwx--x--x 3 root root 4096 Jul 21 16:54 ..
drwx------ 2 root root 4096 Jul 21 16:54 builder
drwx--x--x 4 root root 4096 Jul 21 16:54 buildkit
drwx------ 2 100000 100000 4096 Jul 21 16:56 containers
drwx------ 3 root root 4096 Jul 21 16:54 image
drwxr-x--- 3 root root 4096 Jul 21 16:54 network
drwx------ 9 100000 100000 4096 Jul 21 16:56 overlay2
drwx------ 4 root root 4096 Jul 21 16:54 plugins
drwx------ 2 root root 4096 Jul 21 16:54 runtimes
drwx------ 2 root root 4096 Jul 21 16:54 swarm
drwx------ 2 100000 100000 4096 Jul 21 16:54 tmp
drwx------ 2 root root 4096 Jul 21 16:54 trust
drwx------ 5 100000 100000 4096 Jul 21 16:56 volumes
@adawalli No worries! I think the problem is that your docker root location (/data) is root-owned. When you launch the container, the container root filesystem (rootfs) needs to be mounted in the container. In your case this won't work, since the container rootfs on the host is owned by root, and Docker is trying to mount it inside a container whose root is remapped (not real root).
This is okay with the default docker root location /var/lib/docker/100000.100000, since that is not root-owned.
Can you try to chown your docker root location, then launch a nomad job and see if you still get the permission error?
chown 100000:100000 -R /data
Another option: if somewhere in your configuration you are setting your docker root manually, clear it so it falls back to the default root location, which is /var/lib/docker/<remapped_root>
that was a worthy thing to try, but unfortunately, same results - it still doesn't seem to like the root-owned partition from /alloc in nomad mounting into that namespace
I even rolled back the storage location as you recommended with exactly the same results
FWIW, I am using Docker version 19.03.11, build 42e35e61f3
I thought we were chowning the entire /data (based on my comment above)?
Why is /alloc in nomad still root-owned?
ok, so I wasn't comfortable running chown recursively on nomad's data folder. Why not? Because the nomad process, running as root, makes folders as it's managing nomad, and those will be owned by root as it creates them.
However, chowning just the root of the nomad data folder appears to be enough!
sudo ls -lha nomad
total 20K
drwxr-x--- 4 100000 100000 4.0K Jul 22 06:03 .
drwxr-xr-x 6 root root 4.0K Jul 22 06:04 ..
drwx--x--x 3 root bin 4.0K Jul 22 06:05 alloc
-rw-r--r-- 1 root bin 394 Jul 22 06:03 checkpoint-signature
drwx------ 2 root bin 4.0K Jul 22 06:02 client
I am now able to run containers with namespacing - I really hope that the nomad team puts a priority on adding this support in properly, but glad I can forge ahead for the moment.
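The workaround above amounts to a non-recursive chown of the data directory's top level. Here is a sketch, assuming the remapped root is 100000:100000 as in the listings; the function name chown_top_level is illustrative, and on a real client the command needs root.

```shell
#!/bin/sh
# chown_top_level DIR UID GID
# Change ownership of DIR itself only (no -R), leaving everything Nomad
# creates underneath untouched, then show the result.
# Hypothetical helper for illustration.
chown_top_level() {
    chown "$2:$3" "$1" && ls -ld "$1"
}

# Intended use on the client, as root (not executed here):
#   chown_top_level /data/nomad 100000 100000
```

Because the subdirectories stay root-owned, this is a workaround that happens to be sufficient in this setup, not a supported configuration.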
@adawalli
ok, so I wasn't comfortable running chown recursively on nomad's data folder.
Yeah, I understand! I was trying to validate (and understand) the issue. My understanding was once nomad hands it over to the docker driver to launch the task, it should have nothing to do with docker, and docker should manage the namespace itself.
But it looks like nomad is mounting to the docker root location at container start, and it needs to be non-root too. Yeah, maybe nomad will support this use case better in the future. I don't work for hashicorp so I cannot help you there :) Glad you have a workaround for now!
If filing a bug please include the following:
Nomad version
Nomad v0.12.0 (8f7fbc8e7b5a4ed0d0209968faf41b238e6d5817)
Operating system and Environment details
Issue
By default, we run our docker daemon with userns-remap=default. In this case, even the simplest job file (e.g., from nomad init -short) is failing.
Docker Daemon File
Reproduction steps
nomad init -short
Job file (if appropriate)
Nomad Client logs (if appropriate)
Nomad Log Snippet
It looks like root owns /data/nomad/alloc/f79d1ebb-31d7-673c-a80f-dc5e3cab1b0d/alloc, but /data/docker-storage/100000.100000/bdfac6d255117f5f26fd/merged is of course user namespaced.
To keep things simple, I am running the client Nomad node as root (although I would prefer to add it to the docker group later on).
Any ideas?