hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Setting cap_add all in nomad exec task, causes unrecoverable error #19059

Closed: 116davinder closed this issue 11 months ago

116davinder commented 11 months ago

Nomad version

1.6.3

Operating system and Environment details

Ubuntu 20.04

Issue

The Nomad task fails with an unrecoverable error when cap_add = ["all"] is set.

Reproduction steps

      config {
        command = "/usr/sbin/kpropd"
        args = [
          "-D",
          "-a", "${NOMAD_TASK_DIR}/kpropd.acl",
          "-P", "754",
          "-f", "/var/lib/krb5kdc/from_master",
          "-s", "/var/lib/krb5kdc/krb5.keytab"
        ]
        cap_add = ["all"]  # every capability; the client's allow_caps must permit all of them
      }

Expected Result

The exec driver should grant the task all valid capabilities.

Actual Result

(Screenshot of the task error; image not preserved.)

Job file (if appropriate)

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

shoenig commented 11 months ago

Hi @116davinder, if you want to use all capabilities in a task, then the task driver must first be configured to allow adding all capabilities.

https://developer.hashicorp.com/nomad/docs/drivers/exec#cap_add

Effective capabilities (computed from cap_add and cap_drop) must be a subset of the allowed capabilities configured with allow_caps.

https://developer.hashicorp.com/nomad/docs/drivers/exec#allow_caps
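A minimal sketch of the operator-side setting described above, assuming the usual `plugin "exec"` block in the Nomad client agent configuration (not the job file). The default list is reproduced from the exec driver documentation as understood here, and `net_raw` and `sys_time` are hypothetical extras chosen only for illustration:

```hcl
# Nomad client agent configuration, not the job file.
# allow_caps is the upper bound on what exec tasks on this node may request.
plugin "exec" {
  config {
    # Option A: allow-list named capabilities. This is the documented
    # default set plus two hypothetical extras (net_raw, sys_time).
    allow_caps = [
      "audit_write", "chown", "dac_override", "fowner", "fsetid", "kill",
      "mknod", "net_bind_service", "setfcap", "setgid", "setpcap", "setuid",
      "sys_chroot", "net_raw", "sys_time",
    ]

    # Option B: allow every capability the OS supports. Per this thread,
    # this is what a task with cap_add = ["all"] needs in order to start.
    # allow_caps = ["all"]
  }
}
```

Under Option A, a task with cap_add = ["all"] would still be rejected, because the effective set it requests is larger than the allow-list and so is not a subset of it.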

116davinder commented 11 months ago

@shoenig, my intent is to use all the default capabilities in my task (not ideal, but required for testing purposes).

Shouldn't Nomad decide, based on the task driver, what "all" covers, instead of blindly passing every capability to the driver and failing with the error above?

116davinder commented 11 months ago

If Nomad can't do this, then either the docs should be updated with a per-platform breakdown like the one below, or the CLI/API should perform these validations.

Example:

  1. MacOS - [ list of all possible caps at the time of writing docs + what is enabled by default ]
  2. Linux - [ list of all possible caps at the time of writing docs + what is enabled by default ]
  3. Windows - [ list of all possible caps at the time of writing docs + what is enabled by default ]
  4. Docker / containerd - [ list of all possible caps at the time of writing docs + what is enabled by default ]
  5. etc.
tgross commented 11 months ago

https://developer.hashicorp.com/nomad/docs/drivers/exec#allow_caps says at the end that all is a supported option; it doesn't mention that it's only supported in cap_drop.

@116davinder, that allow_caps field is in the plugin configuration section of that doc, not the task configuration section. I can add a note to the docs mentioning that the allow_caps field needs to be a list of specific capabilities (and therefore "all" isn't permitted if all capabilities aren't permitted).

  1. MacOS - [ list of all possible caps at the time of writing docs + what is enabled by default ] ...
  2. Windows - [ list of all possible caps at the time of writing docs + what is enabled by default ]

I'm not sure what you're getting at with that. Mac and Windows don't support Linux capabilities at all, but it doesn't really matter here either because the exec driver only supports Linux.

116davinder commented 11 months ago

I'm not sure what you're getting at with that. Mac and Windows don't support Linux capabilities at all, but it doesn't really matter here either because the exec driver only supports Linux.

@tgross, what I'm asking for is that either:

  1. the docs should say explicitly that cap_add = ["all"] is not supported unless allow_caps = [....] lists all the possible options, or
  2. when I put cap_add = ["all"] in the job spec, the default list of capabilities should be added to the task automatically, so that I don't have to repeat this whole list in the job spec HCL file.

**Note**: I do agree that using all is a security risk, but that doesn't mean new development or debugging should require the pain of figuring out which capabilities a task needs.
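Short of an option like (2), the practical alternative to "all" is to name capabilities explicitly. A hypothetical sketch of the reporter's task config written that way; the two capability names are placeholders, not a statement of what kpropd actually requires, and each one must also be permitted by the client's exec plugin allow_caps:

```hcl
config {
  command = "/usr/sbin/kpropd"
  args = [
    "-D",
    "-a", "${NOMAD_TASK_DIR}/kpropd.acl",
    "-P", "754",
    "-f", "/var/lib/krb5kdc/from_master",
    "-s", "/var/lib/krb5kdc/krb5.keytab"
  ]
  # Named capabilities instead of "all" (hypothetical choices).
  # Each entry must also appear in the client's exec plugin allow_caps.
  cap_add = ["net_raw", "sys_time"]
}
```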

tgross commented 11 months ago
  1. the docs should say explicitly that cap_add = ["all"] is not supported unless allow_caps = [....] lists all the possible options, or
  2. when I put cap_add = ["all"] in the job spec, the default list of capabilities should be added to the task automatically, so that I don't have to repeat this whole list in the job spec HCL file.

OK, cool. https://github.com/hashicorp/nomad/pull/19091 implements (1). The second option would also have been a totally reasonable way to do it originally, but at this point I'd hesitate to change it because it would change how the capability set is calculated for existing jobs. Allocations where the task has cap_add = ["all"] and the plugin has allow_caps = ["all"] could fail if they were rescheduled onto a different node with a different allow_caps plugin config, and they would fail silently as far as Nomad knows: the task itself would get killed when trying to use the now-forbidden capabilities.
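The rescheduling concern can be sketched with two hypothetical client configurations (file names are illustrative). With the current meaning of "all" (every capability), a task with cap_add = ["all"] starts on the first node and is rejected with an explicit error on the second; under option (2) it would instead start on both, with different capability sets:

```hcl
# client-a.hcl (hypothetical): every capability is allowed,
# so a task with cap_add = ["all"] starts here.
plugin "exec" {
  config {
    allow_caps = ["all"]
  }
}

# client-b.hcl (hypothetical, a separate agent): no allow_caps is set,
# so the exec driver's default allow-list applies.
#
# Current behavior: the same task placed on node B fails at startup
# with an error Nomad can report.
#
# If "all" meant "whatever this node allows" (option 2): the task would
# start on node B with a smaller capability set, and could later be
# killed when it uses a capability node B does not grant, which Nomad
# would not see as a capability problem.
```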