jinnko closed this 2 years ago
You see, that's exactly why I dislike kitchen-sink tools like Docker that offer zillions of slightly different ways to achieve the same thing.
Normally, in a multiprocess container, the container is run with its entrypoint running as root, then the init system performs privilege separation and runs every subservice under a different uid. That's how s6 operates traditionally; that's what s6-overlay tries to recreate; s6 is secure enough that the supervision tree can run as root, provided that services drop their root privileges in the run script. It works and nobody has ever complained about it.
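The sketch below is a rough illustration of this model. It generates run scripts in which each service drops root through `s6-setuidgid` just before exec'ing its program. The supervision tree therefore stays root, and every service gets its own uid. The `write_run_script` helper, the service names, and the legacy `/etc/services.d` directory layout are assumptions made for illustration. They are not part of s6-overlay.

```sh
#!/bin/sh
# Hypothetical helper: write DIR/run so that the supervised service drops
# from root to USER at exec time, while the supervisor itself stays root.
# Note: COMMAND arguments are joined with spaces, so quoting is not preserved.

# write_run_script DIR USER COMMAND...
write_run_script() {
  dir=$1
  user=$2
  shift 2
  mkdir -p "$dir" || return 1
  {
    echo '#!/bin/sh'
    # exec, so the supervisor watches the service process directly
    echo "exec s6-setuidgid $user $*"
  } > "$dir/run" || return 1
  chmod 0755 "$dir/run"
}
```

For example, calling `write_run_script /etc/services.d/web webuser webserver` and `write_run_script /etc/services.d/logger loguser logger-daemon` would produce two services that run under separate uids. All of these names are hypothetical.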
Then people wanted support for USER containers. In this mode, the entrypoint runs as an unprivileged user, and the whole process tree in the container runs as that user. Sure, more processes are unprivileged, but there is no uid separation, so it's debatable which setup is more secure, and it really depends on the number and nature of the subservices. Still, it's a reasonable request, so we supported USER containers. Unfortunately, since we cannot be sure that the USER will be the same from one invocation to the next, we still need root privileges for a couple of operations (in `preinit`), and relinquish them forever afterwards. It was a lot of work, but it's now operational.
And now, you are reporting another way for containers to drop privileges - this time, uids are remapped on the fly, and what is supposed to be root isn't root anymore, by magic! Well, no surprise that it doesn't work with the mechanism we have for USER containers.
I suppose I can turn the error in `s6-overlay-suexec` into a warning, but we cannot guarantee that everything will work further on in the container init sequence. It should, but userns-remap is really not how Unix was supposed to work and it breaks a number of assumptions. Best effort is the best we can do.
Also, any service attempting to do privilege separation will fail with userns-remap just as it fails with USER: typically, `syslogd-overlay` won't support userns-remap.
@skarnet I think you've misunderstood either the purpose or mechanisms of userns-remap.
From the running container's perspective, everything is exactly as on a normal system with the init entrypoint running as root. From the docs linked above:

> ... without the running process being aware of the limitations
It's not a mechanism for running processes as non-root users inside the container and is not the same as "USER containers".
What's different is that, from the host's perspective, those processes are entirely unprivileged and have no access to the host's root UID, despite everything looking normal within the running container. This limits the impact on the host of any potential privilege escalation bugs within the container, such as last week's sudo CVE.
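You can observe this from inside a container by reading `/proc/self/uid_map`. Each of its lines reads `inside-id outside-id count`. In the host's own namespace, uid 0 maps to host uid 0. Under userns-remap, it maps to an unprivileged host uid instead. The sketch below reads that file. The function names are hypothetical, and the optional file argument exists only so the logic can be tried on a copy.

```sh
#!/bin/sh
# Each line of /proc/self/uid_map reads: <id inside> <id outside> <count>.
# Inside ids are never negative, so a range containing uid 0 must start at 0.

# host_uid_of_root [MAPFILE]: print the host-side uid that uid 0 here maps to.
host_uid_of_root() {
  awk '$1 == 0 { print $2; exit }' "${1:-/proc/self/uid_map}"
}

# is_remapped [MAPFILE]: succeed unless uid 0 here is also uid 0 on the host.
is_remapped() {
  uid=$(host_uid_of_root "$1") || return 2
  [ "$uid" != 0 ]
}
```

Take a remapped namespace whose map line is `0 165536 65536` (example values). There, `host_uid_of_root` would print 165536, and `is_remapped` would succeed.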
What breaks here is how SUID works when the container image is built on a hardened build server and then run in the host's own user namespace.
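The ownership shift is plain arithmetic over an `/etc/subuid` (or `/etc/subgid`) entry of the form `name:start:count`: container id N becomes host id start + N. The minimal sketch below assumes an entry like `dockremap:165536:65536`. That is a common example value, not one taken from this report, and `host_id` is a hypothetical helper.

```sh
#!/bin/sh
# host_id CONTAINER_ID ENTRY
# ENTRY is a subuid/subgid line "name:start:count". Container ids 0..count-1
# map to host ids start..start+count-1. Fails if the id is outside the range.
host_id() {
  echo "$2" | awk -F: -v id="$1" '
    BEGIN { n = id + 0 }
    n >= 0 && n < $3 + 0 { print $2 + n; found = 1 }
    END { exit !found }'
}
```

With that entry, a file that container root creates during a remapped build is stored as host uid 165536. That is the owner you see when the image is later run with `--userns=host`.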
Yes, I understand. What I don't understand is how remapping the file ownerships, from the point of view of the container, is going to help in any way. If files belong to root, they are more restricted than if they belong to some normal user, so it's normally a good thing! Except for suid executables, like here, where remapping the owner to a normal user breaks everything, whereas keeping it owned by root would not have been dangerous, since the root privilege gained is only inside the container and by definition cannot leak to the host.
In other words: I don't understand how `--userns host` makes any kind of sense.
If, as you say, pid 1 is still starting as root inside the container's user namespace, and it's only the files that have remapped ownership, then it's probably fixable at the cost of yet another workaround, but I do question the validity of the operation in the first place.
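You can check this failure mode directly. A setuid executable only raises the effective uid to the file's owner. It therefore grants root only if the file is owned by uid 0 in the namespace where it runs. The minimal sketch below uses hypothetical helper names. To point it at `s6-overlay-suexec` you need that binary's install path, which depends on the s6-overlay version and is not given here.

```sh
#!/bin/sh
# suid_owner FILE: print the owner uid if FILE has the setuid bit, else fail.
suid_owner() {
  [ -u "$1" ] || return 1
  stat -c %u "$1"    # GNU/busybox stat: numeric owner uid
}

# suid_gives_root FILE: succeed only for a setuid file owned by uid 0.
# A setuid file owned by a remapped uid such as 165536 fails this check.
suid_gives_root() {
  owner=$(suid_owner "$1") || return 1
  [ "$owner" -eq 0 ]
}
```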
I have modified `s6-overlay-suexec` so it should do the right thing even in the case of `--userns host`. Closing this; please reopen if it's still not working for you once 3.1.0.0 is out, or earlier if you happen to build from the latest source.
@skarnet Where can I find more information about this comment, especially "services drop their root privileges in the run script"? Let's say that in my container I have a user "abc", and I would like to perform some init tasks as "root" and run the main service as user "abc" with limited privileges. Also, will the user have all the capabilities of the container?
> Normally, in a multiprocess container, the container is run with its entrypoint running as root, then the init system performs privilege separation and runs every subservice under a different uid. That's how s6 operates traditionally; that's what s6-overlay tries to recreate; s6 is secure enough that the supervision tree can run as root, provided that services drop their root privileges in the run script. It works and nobody has ever complained about it.
If the command running your main service is `foo`, then use `s6-setuidgid abc foo` instead, and `foo` will run as user `abc`. Don't change anything else, so your init is still performed as root.
User `abc` will still have access to the whole container, except, obviously, what can only be done by `root`. So, for instance, if it needs to write files to a directory `dir`, you should ensure that `dir` belongs to `abc` beforehand.
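Here is a sketch of that split. It assumes the legacy s6-overlay layout, where one-shot init scripts that run as root live in `/etc/cont-init.d` and the service's run script lives in `/etc/services.d/foo/run`. The directory `/var/lib/foo`, the script name, and the `ensure_owned` helper are hypothetical.

```sh
#!/bin/sh
# ensure_owned DIR USER (hypothetical helper, meant for a root-run init script)
# Creates DIR if needed and hands it to USER and USER's primary group, so the
# service can still write there after it has dropped root.
ensure_owned() {
  group=$(id -gn "$2") || return 1
  mkdir -p "$1" || return 1
  chown "$2:$group" "$1"
}

# Init script, e.g. /etc/cont-init.d/10-foo-dirs (runs as root):
#   ensure_owned /var/lib/foo abc
#
# Service run script, e.g. /etc/services.d/foo/run:
#   #!/bin/sh
#   exec s6-setuidgid abc foo
```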
The changes introduced in v3 have resulted in a regression when the Docker daemon is running with the `userns-remap` option. This is similar to the conditions that were fixed in https://github.com/just-containers/s6-overlay/issues/309, but the failure is now in a different place.

When running dockerd with `userns-remap`, then starting up a container with `--userns=host`, I get the following error message:

With the Docker daemon running with `userns-remap`, the container build runs in a user namespace that differs from the host's. Files created during the build are owned by the remapped IDs. When those images are run in the remapped user namespace, the IDs are mapped back correctly and the files are owned by `root`. However, if those containers are run with `--userns host`, we see the following:

Note that the UID and GID should be `0` or `root`, but they are the remapped UID/GID as configured in `/etc/subuid` and `/etc/subgid`, as shown below.

Relevant config files
Example `/etc/docker/daemon.json`:

And the remap files:
The `Dockerfile` (done in a RUN segment so the latest release files can be retrieved, unpacked, and sha256 verified in a single layer):

Commands

The build command:

The run command: