endocrimes opened this issue 5 years ago
@endocrimes Such a thing would be really great. Where/how can the community help to move this forward?
@apollo13 I left HashiCorp last year so unfortunately I'm not sure about the current state of thinking here (and no longer have my notes on this).
@schmichael should be able to be a little more helpful than me though :)
@schmichael Any chance of getting some feedback on this? If the suggested approach looks fine, I might be able to start working on a PR.
Now that #11807 is merged, can we start thinking about this one, @tgross? Maybe you already have some ideas internally; if the nice folks from HashiCorp could lay out their plan here, I might be able to provide some code :)
Hi @apollo13! I'm happy to get the conversation rolling... I think this is decidedly trickier from an architectural standpoint so I'd love to get your thoughts here.
In theory we could simply add a new `Capabilities` field to the driver API for "supports privileged execution", but that has two major gotchas:
Gotcha 1
The major architectural barrier is that the `config` block for a task driver can't currently be introspected by the server, following the Nomad 0.9 change that allowed for third-party task drivers. As you can imagine, this is frustrating for lots and lots of feature requests. Even just being able to say "hey, this config is valid" at job registration time instead of after placement would be great, but we can't do that today.
What it comes down to is that there can be more than one version of a task driver on the cluster. Suppose I have `nomad-driver-podman` v0.3 on some clients and a hypothetical `nomad-driver-podman` v0.4 that supports this new capability. We don't decide which client gets an allocation until we've scheduled the allocation, by which time it's too late to do ACL checking for that capability!
So in #11807 we were able to take advantage of the task driver name being a `task`-level configuration item. The name is just a string, and all task drivers with the same name are treated identically. The privileged execution concept is specific to a task driver, and the server doesn't "look inside" the `config` for task drivers, because task drivers don't even provide their schemas (much less semantics) in their fingerprints to the server.
Gotcha 2
This is less of a blocker and more a design space to explore. The concept of "privileged" isn't really a boolean! The `docker` driver has a `privileged` boolean flag, but that just sets a specific set of configuration flags that you could otherwise get with a combination of `security_opt`, `userns_mode`, `devices`, `cgroup_permissions`, `cap_add`, etc. And that's a Linux-specific model of privileges that doesn't have a 1:1 mapping to Windows, macOS, or other Unixen. So we'd have to model both the OS-specific privileges and how a specific task driver's API interacts with those privileges. Lots of design space to cover here.
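To make that concrete, here is a hedged sketch of a task that assembles a similar (deliberately not identical) set of privileges piecewise through the `docker` driver's existing config options; the image and the specific capabilities/devices are illustrative placeholders:

```hcl
task "piecewise-privileges" {
  driver = "docker"

  config {
    image = "example/image:latest"

    # Each option below grants one slice of what privileged = true
    # implies; the combination is illustrative, not an exact equivalence.
    security_opt = ["apparmor=unconfined", "seccomp=unconfined"]
    userns_mode  = "host"
    cap_add      = ["sys_admin", "net_admin"]

    devices = [
      {
        host_path          = "/dev/fuse"
        container_path     = "/dev/fuse"
        cgroup_permissions = "rwm"
      }
    ]
  }
}
```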
Hi @tgross, I fully agree that this one will not be that easy. I do not think that it makes sense to expose all the options you listed in gotcha 2 as capabilities.
I do not have a full solution for gotcha 1 either, but I do have an idea. What if it were possible to configure drivers like this in the client config?
plugin "docker" {
# backend = "docker" matches the plugin name from above
config {
volumes {
enabled = false
}
allow_privileged = false
}
}
plugin "docker-privileged" {
backend = "docker" # Needs to be explicitly set here since docker-privileged does not exist as a plugin
config {
volumes {
enabled = true
}
allow_privileged = true
}
}
In the task one would then specify "docker-privileged" as the driver name (see the sketch below). This would allow us to push the granularity of what is and is not allowed into the task driver itself, without the servers (or namespaces) needing to know more than a plugin name.
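For illustration, a hedged sketch of a job selecting the duplicated plugin by name; the job, group, task, and image names are placeholders:

```hcl
job "needs-privilege" {
  group "app" {
    task "app" {
      # Selects the second plugin block above; the server only ever
      # sees the opaque plugin name "docker-privileged".
      driver = "docker-privileged"

      config {
        image      = "example/app:latest"
        privileged = true
      }
    }
  }
}
```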
I am not sure yet how feasible this is codewise (i.e. dispatching a plugin twice with different configs), and it would certainly need a bit of code so that the plugins are properly fingerprinted, but aside from that it should be opaque (for lack of a better word) to Nomad.
Is that a direction worth pursuing?
I threw together https://github.com/sorenisanerd/nomad-docker-driver-external as a workaround. It's the internal docker driver, but as an external plugin. This lets you apply different configuration to the internal and external driver, and then you can restrict the one that allows privileged containers to specific namespaces.
Ha, that is a nice idea. Does this also work in conjunction with connect jobs etc?
I made sure to import the docker driver and commit it and then layered my changes on top: https://github.com/sorenisanerd/nomad-docker-driver-external/commit/9685fc87e5f1262be3ee7b39df3840a82ed4e918
As you can see, I haven't really changed anything; I just glued it together. Anything that works with the docker driver should work with this one. Just specify driver = "docker-ext" in your job description.
Make sure to heed the advice in the README, and you should be good to go.
Note: This issue mostly contains initial background thoughts to prompt discussion and is not yet a well defined proposal.
Background
Currently, Nomad's permission model around the runtime permissions of a job exists only within the implementation of a driver. This means that we do not include these permissions in our ACL system, nor does the scheduler take into account whether the relevant features are enabled on a particular client.
This is fine if a Nomad cluster is a uniform fleet, but that is rarely the case in larger clusters. Today it requires users to add additional metadata to privileged clients, and constraints to the jobs that require them. It also allows anyone with access to the `submit-job` permission in any namespace to get privileged access to those hosts.

As part of #5378, however, access to privileged containers will become more common, as CSI plugins require privileged containers in order to create new mounts in the system for publishing volumes. Although we operate on a trusted-operator security model, there are many valid cases where CSI plugins may need to be deployed without granting trivial Docker privilege escalation to everyone.
Proposal
I'm proposing introducing a Nomad-level API for modelling process isolation levels. This change would introduce a `privileged` option at the `task` level in a Nomad job, which would signal to Nomad that the job may only be placed on nodes where the driver exposes the `Privileged` execution capability. It would also allow the introduction of a `privileged-execution` capability to the ACL system.

Task Configuration
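As a hedged sketch (the exact field name and placement are assumed from the prose above, and the image is a placeholder), the proposed task-level option might look like:

```hcl
task "csi-node-plugin" {
  driver = "docker"

  # Proposed, not yet implemented: restricts placement to nodes whose
  # driver fingerprint advertises the Privileged capability, and would
  # require the privileged-execution ACL capability to submit.
  privileged = true

  config {
    image = "example/csi-plugin:latest"
  }
}
```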
Driver API
This is currently mostly undefined, but it would mostly involve updating `DriverCapabilities` to introduce a new field that plugins may opt in to, and introducing a privileged option to `TaskConfig`.

Opting nodes into privileged execution
Currently, Nomad requires you to configure support for privileged execution modes per driver in the client configuration. After this change you will still be required to enable support on each individual client, but by default using privileged execution modes will require the new configuration.
To allow for backwards compatibility and a cleaner upgrade path, we will also offer an option in driver configuration to retain the existing behavior for using privileged execution environments.
Docker
Example Config
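A hedged sketch of what such a client plugin config might look like, reusing the docker driver's existing allow_privileged option and assuming the legacy_privileged_behavior flag described under Raw Exec below applies here as well:

```hcl
plugin "docker" {
  config {
    # Privileged containers must still be enabled per client.
    allow_privileged = true

    # Assumed flag per the backwards-compatibility note above: when false,
    # tasks that request privileged execution must also pass the new
    # privileged-execution ACL check.
    legacy_privileged_behavior = false
  }
}
```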
Raw Exec
The Raw Exec driver will begin exposing an `Unconstrained` isolation capability when `legacy_privileged_behavior` is `false`, which will require that a user has access to `privileged` execution modes.
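As a sketch under the same assumptions (the option name comes from the sentence above; the plugin block shape mirrors the Docker example):

```hcl
plugin "raw_exec" {
  config {
    enabled = true

    # When false, the driver would fingerprint the Unconstrained isolation
    # capability, and tasks using it would require privileged execution
    # access per the proposal above.
    legacy_privileged_behavior = false
  }
}
```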