endocrimes opened this issue 5 years ago
@endocrimes Such a thing would be really great. Where/how can the community help to move this forward?
@apollo13 I left HashiCorp last year so unfortunately I'm not sure about the current state of thinking here (and no longer have my notes on this).
@schmichael should be able to be a little more helpful than me though :)
@schmichael Any chance of getting some feedback on this? If the suggested approach looks fine, I might be able to start working on a PR.
Now that #11807 is merged, can we start thinking about this one, @tgross? Maybe you already have some ideas internally; if the nice folks from HashiCorp could lay out their plan here, I might be able to provide some code :)
Hi @apollo13! I'm happy to get the conversation rolling... I think this is decidedly trickier from an architectural standpoint so I'd love to get your thoughts here.
In theory we could simply add a new `Capabilities` field to the driver API for "supports privileged execution", but that has two major gotchas:
Gotcha 1
The major architectural barrier is that the `config` block for a task driver can't currently be introspected by the server, following the Nomad 0.9 change that allowed for third-party task drivers. As you can imagine, this is frustrating for lots and lots of feature requests. Even just being able to say "hey, this config is valid" at job registration time instead of after placement would be great, but we can't do that today.
What it comes down to is that there can be more than one version of a task driver on the cluster. Suppose I have `nomad-driver-podman` v0.3 on some clients and a hypothetical `nomad-driver-podman` v0.4 that supports this new capability. We don't decide which client gets an allocation until we've scheduled the allocation, by which time it's too late to do ACL checking for that capability!
So in #11807 we were able to take advantage of the task driver name being a `task`-level configuration item. The name is just a string, and all task drivers with the same name are treated identically. The privileged execution concept is specific to a task driver, and the server doesn't "look inside" the `config` for task drivers, because task drivers don't even provide their schemas (much less semantics) in their fingerprints to the server.
Gotcha 2
This is less of a blocker and more a design space to explore. The concept of "privileged" isn't really a boolean! The `docker` driver has a `privileged` boolean flag, but that just sets a specific set of configuration flags that you could otherwise get with a combination of `security_opt`, `userns_mode`, `devices`, `cgroup_permissions`, `cap_add`, etc. And that's a Linux-specific model of privileges that doesn't have a 1:1 mapping to Windows, macOS, or other Unixen. So we'd have to model both the OS-specific privileges and how a specific task driver's API interacts with those privileges. Lots of design space to cover here.
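To make that concrete, here is a hedged sketch of a task that assembles a similar (deliberately not identical) set of privileges piecewise through the `docker` driver's existing config options; the image and the specific capabilities/devices are illustrative placeholders:

```hcl
task "piecewise-privileges" {
  driver = "docker"

  config {
    image = "example/image:latest"

    # Each option below grants one slice of what privileged = true
    # implies; the combination is illustrative, not an exact equivalence.
    security_opt = ["apparmor=unconfined", "seccomp=unconfined"]
    userns_mode  = "host"
    cap_add      = ["sys_admin", "net_admin"]

    devices = [
      {
        host_path          = "/dev/fuse"
        container_path     = "/dev/fuse"
        cgroup_permissions = "rwm"
      }
    ]
  }
}
```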
Hi @tgross, I fully agree that this one will not be that easy. I do not think that it makes sense to expose all the options you listed in gotcha 2 as capabilities.
I do not have a full solution for gotcha 1 either, but I do have an idea. What if it were possible to configure drivers like this in the client config?
plugin "docker" {
# backend = "docker" matches the plugin name from above
config {
volumes {
enabled = false
}
allow_privileged = false
}
}
plugin "docker-privileged" {
backend = "docker" # Needs to be explicitly set here since docker-privileged does not exist as a plugin
config {
volumes {
enabled = true
}
allow_privileged = true
}
}
In the task one would then specify "docker-privileged" as the driver name (see the sketch below). This would allow us to push the granularity of what is and is not allowed into the task driver itself, without the servers (or namespaces) needing to know more than a plugin name.
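For illustration, a hedged sketch of a job selecting the duplicated plugin by name; the job, group, task, and image names are placeholders:

```hcl
job "needs-privilege" {
  group "app" {
    task "app" {
      # Selects the second plugin block above; the server only ever
      # sees the opaque plugin name "docker-privileged".
      driver = "docker-privileged"

      config {
        image      = "example/app:latest"
        privileged = true
      }
    }
  }
}
```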
I am not sure yet how feasible this is codewise (i.e. dispatching a plugin twice with different configs), and it would certainly need a bit of code so that the plugins are properly fingerprinted, but aside from that it should be opaque (for lack of a better word) to Nomad.
Is that a direction worth pursuing?
I threw together https://github.com/sorenisanerd/nomad-docker-driver-external as a workaround. It's the internal docker driver, but as an external plugin. This lets you apply different configuration to the internal and external driver, and then you can restrict the one that allows privileged containers to specific namespaces.
Ha, that is a nice idea. Does this also work in conjunction with connect jobs etc?
I made sure to import the docker driver and commit it and then layered my changes on top: https://github.com/sorenisanerd/nomad-docker-driver-external/commit/9685fc87e5f1262be3ee7b39df3840a82ed4e918
As you can see, I haven't really changed anything; I just glued it together. Anything that works with the docker driver should work with this one. Just specify driver = "docker-ext" in your job description.
Make sure to heed the advice in the README, and you should be good to go.
Note: This issue mostly contains initial background thoughts to prompt discussion and is not yet a well defined proposal.
Background
Currently, Nomad's permission model around the runtime permissions of a job exists only within the implementation of a driver. This means that we do not include these permissions in our ACL system, nor does the scheduler take into account whether the relevant features are enabled on a particular client.
This is fine if a Nomad cluster is a uniform fleet, but that is rarely the case in larger clusters. Today it requires users to add additional metadata to privileged clients, and constraints to the jobs that require them. It also allows anyone with access to the `submit-job` permission in any namespace to get privileged access to those hosts.

As part of #5378, however, access to privileged containers will become more common, as CSI plugins require privileged containers in order to create new mounts in the system for publishing volumes. Although we operate on a trusted-operator security model, there are many valid cases where CSI plugins may need to be deployed without granting trivial Docker privilege escalation to everyone.
Proposal
I'm proposing introducing a Nomad-level API for modelling process isolation levels. This change would introduce a `privileged` option at the `task` level in a Nomad job, which would signal to Nomad that the job may only be placed on nodes where the driver exposes the `Privileged` execution capability. It would also allow the introduction of a `privileged-execution` capability to the ACL system.

Task Configuration
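As a hedged sketch (the exact field name and placement are assumed from the prose above, and the image is a placeholder), the proposed task-level option might look like:

```hcl
task "csi-node-plugin" {
  driver = "docker"

  # Proposed, not yet implemented: restricts placement to nodes whose
  # driver fingerprint advertises the Privileged capability, and would
  # require the privileged-execution ACL capability to submit.
  privileged = true

  config {
    image = "example/csi-plugin:latest"
  }
}
```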
Driver API
This is currently mostly undefined, but it would mostly involve updating `DriverCapabilities` to introduce a new field that plugins may opt in to, and introducing a privileged option to `TaskConfig`.

Opting nodes into privileged execution
Currently, Nomad requires you to configure support for privileged execution modes per driver in the client configuration. After this change you will still be required to enable support on each individual client, but by default using privileged execution modes will require the new configuration.
To allow for backwards compatibility and a cleaner upgrade path, we will also offer an option in driver configuration to retain the existing behavior for using privileged execution environments.
Docker
Example Config
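A hedged sketch of what such a client plugin config might look like, reusing the docker driver's existing allow_privileged option and assuming the legacy_privileged_behavior flag described under Raw Exec below applies here as well:

```hcl
plugin "docker" {
  config {
    # Privileged containers must still be enabled per client.
    allow_privileged = true

    # Assumed flag per the backwards-compatibility note above: when false,
    # tasks that request privileged execution must also pass the new
    # privileged-execution ACL check.
    legacy_privileged_behavior = false
  }
}
```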
Raw Exec
The Raw Exec driver will begin exposing an `Unconstrained` isolation capability when `legacy_privileged_behavior` is `false`, which will require that a user has access to `privileged` execution modes.
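As a sketch under the same assumptions (the option name comes from the sentence above; the plugin block shape mirrors the Docker example):

```hcl
plugin "raw_exec" {
  config {
    enabled = true

    # When false, the driver would fingerprint the Unconstrained isolation
    # capability, and tasks using it would require privileged execution
    # access per the proposal above.
    legacy_privileged_behavior = false
  }
}
```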