Open dposton80 opened 3 years ago
@dposton80 running Nomad clients in a container isn't really a supported configuration. Nomad clients must be run as root because they require CAP_SYSADMIN
and (usually) CAP_NET_ADMIN
. So if you're running Nomad in Docker it'll need to run with --privileged
at a minimum, and probably without userns isolation on dockerd
(although I haven't verified that).
Nomad clients don't require root to run in general - nomad would not be useful to us if they did (at least for raw_exec) as we don't have root access to our machines. There seems to be some intention towards this - see #2178 and #2070
It works fine if I set the USER correctly in the container (to a non-root, unprivileged user, that has uid != 0), and run the container without --privileged. So why not make it work for the uid = 0 case as well... looks like a small fix to line 47 of fs_linux.go?
Nomad documents pretty clearly that we expect clients to be running as root. Many features of Nomad clients won't work if the client doesn't have the appropriate capabilities, and the matrix of each feature being assessed as "impossible to support" vs "can fallback to reduced functionality" becomes a combinatorial explosion.
So why not make it work for the uid = 0 case as well... looks like a small fix to line 47 of fs_linux.go?
Hm, what would you propose? While you can do getpcaps
to get the processes own capabilities and check for cap_sys_admin
in addition to uid=0. But now we've introduced a fallback behavior that silently changes the security characteristics of the secrets directory.
Nomad with raw_exec is useful as a scheduler for regular processes from non-root accounts. Are you saying we should not continue to rely on this behaviour? I don't think I'm alone here e.g. https://serverascode.com/2016/10/18/distributed-cron-with-nomad.html
I would defer to your judgement on a possible change, it seems like it is already doing a silent fallback behaviour based on root/non-root so I wouldn't have thought getpcaps would be much different? Could the test be changed to be just getpcaps rather than uid == 0? Is there currently a risk of running as a privileged non-root user and secrets unexpectedly not going into a tmpfs?
Are you saying we should not continue to rely on this behaviour? I don't think I'm alone here
I don't think we'd intentionally break that behavior. But the use case for raw_exec
with no network configuration is a fairly narrow range of features and the moment someone steps outside of that range Nomad won't work.
I would defer to your judgement on a possible change, it seems like it is already doing a silent fallback behaviour based on root/non-root so I wouldn't have thought getpcaps would be much different? Could the test be changed to be just getpcaps rather than uid == 0? Is there currently a risk of running as a privileged non-root user and secrets unexpectedly not going into a tmpfs?
Apologies, I could have explained that more carefully -- I mean that we'd be changing the current behavior without it being obvious to the operator: previously we'd fail (as you've seen here) whereas now the job works but maybe not as expected.
I'm not in theory opposed to the notion of swapping a caps check for the UID check, but at a quick glance it looks like there are a dozen or so other places in the codebase where we have a check for UID=0 so doing it as a one-off just for fs_linux
seems like it would just push the problem to somewhere else. That being said, I also realize it would probably fix your immediate use case!
So I brought this issue up in an internal discussion and came away a general consensus with my colleagues about a path forward:
getpcaps
vs UID=0 checks, so that we can take each of those and we can incrementally improve the situation as time permits. ref https://github.com/hashicorp/nomad/issues/10051I'll re-open this issue and rename it a bit to serve as a pointer for that umbrella issue. Thanks for your patience here and I appreciate you walking me through the use case so carefully.
Thank you for looking into this. Also, thank you for an excellent product and for making your codebase a joy to debug. It's a great example of good software engineering.
Nomad version
0.12 (but checking code appears to be still present in latest versions)
Operating system and Environment details
docker, linux RHEL7
Issue
When running from a container that is built without a USER command, the container can run with uid 0 but without root privileges. This causes the raw_exec driver to fail with e.g.
[ERROR] client.alloc_runner.task_runner: prestart failed: alloc_id=eba12eb6-7b3a-670a-62a0-aeb559c27eea task=temp-chunkserver-task error="prestart hook "task_dir" failed: mount: operation not permitted"
The problem seems to be when it tries to create the secrets dir in fs_linux.go. It assumes that uid 0 will have the ability to perform the mount operation, but this is not true in my case
Could that code do a better test for root perms? Or just try the mount and fail back to creating the dir? Perhaps it should always just create the dir if fs isolation is none?
Reproduction steps
Due to my locked-down firm environment, I'm not able to freely run docker myself, so not 100% sure on what exact steps are to get a container to run as uid 0 but without root perms.
I'm not quite sure what causes the container to be run with uid 0 - it doesn't always seem to be the case (it worked OK in a kubernetes cluster, but not from a gitlab runner).
I realise that not having a USER command in a dockerfile is bad practice. But might be nice if nomad just worked anyway.