hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Workload identity: lack of usable user_claim when using Nomad namespaces and Vault entities #23510

Closed the-nando closed 1 month ago

the-nando commented 2 months ago

I'm working on migrating some clusters from the legacy Vault token based integration to the new workload identity based one.

My aim is to be able to create a single Vault entity per workload, set entity specific policies and use that in addition to the generic role's token policy.

The tutorial suggests using "user_claim": "/nomad_job_id" and a templated Vault policy utilising the claim-mapped metadata, something along the lines of:

path "secrets/data/{{identity.entity.aliases.AUTH_METHOD_ACCESSOR.metadata.nomad_namespace}}/{{identity.entity.aliases.AUTH_METHOD_ACCESSOR.metadata.nomad_job_id}}" {
  capabilities = ["read"]
}

To cater for jobs which may require additional ad-hoc policies, I want to pre-create Vault identities for workloads that will have one or more additional identity policies.
To get this to work I would use an entity alias based on the user_claim to map it to that entity. This would allow me to set up a default token workload policy, like in the tutorial, with templated paths, and for any exception I could just create a policy with the same name as the one we assign to the entity.

The problem is that the user_claim isn't unique when one uses /nomad_job_id in combination with Nomad namespaces, as the job ID is unique only within a namespace, not within a Nomad cluster.
The implication on the Vault side is that any jobs with the same name will be assigned the same implied identity, which is a potential security risk that could lead to unintended access to Vault resources.
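
To make the collision concrete, here is a minimal sketch. It assumes the alias name is simply the value of the configured user_claim; the claim values and the `alias_name` helper are hypothetical and do not reflect Vault's actual implementation.

```python
# Hypothetical workload identity claims for two unrelated jobs that
# share a job ID but live in different Nomad namespaces.
claims_a = {"nomad_namespace": "team-a", "nomad_job_id": "api"}
claims_b = {"nomad_namespace": "team-b", "nomad_job_id": "api"}


def alias_name(claims: dict, user_claim: str) -> str:
    """Return the claim value that would be used as the alias name.

    user_claim is written path-style, e.g. "/nomad_job_id".
    """
    return claims[user_claim.lstrip("/")]


# Keyed on the job ID alone, both jobs resolve to the same alias name.
collides = alias_name(claims_a, "/nomad_job_id") == alias_name(claims_b, "/nomad_job_id")

# A composite of namespace and job ID keeps the two jobs apart.
composite_a = f'{claims_a["nomad_namespace"]}:{claims_a["nomad_job_id"]}'
composite_b = f'{claims_b["nomad_namespace"]}:{claims_b["nomad_job_id"]}'
distinct = composite_a != composite_b
```

Under these assumptions, both jobs would map to whatever entity is aliased to "api", regardless of namespace.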

A workaround is to create a Vault JWT role per workload and configure bound_claims:

"bound_claims": {
  "nomad_namespace": "myns",
  "nomad_job_id": "myjob"
}
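
As a rough illustration of what this workaround enforces, the sketch below treats bound_claims as an exact-match filter: every listed claim must be present in the token with an equal value. The `satisfies_bound_claims` helper is hypothetical, and Vault's real matching may support more than plain string equality.

```python
def satisfies_bound_claims(token_claims: dict, bound_claims: dict) -> bool:
    """True only if every bound claim appears in the token with an equal value."""
    return all(
        token_claims.get(name) == expected
        for name, expected in bound_claims.items()
    )


# The role from the example above, bound to a single (namespace, job) pair.
bound_claims = {"nomad_namespace": "myns", "nomad_job_id": "myjob"}

# Hypothetical token claims: one from the intended job, one from a
# job with the same ID in a different namespace.
intended = {"nomad_namespace": "myns", "nomad_job_id": "myjob", "nomad_task": "web"}
same_id_other_ns = {"nomad_namespace": "otherns", "nomad_job_id": "myjob"}

allowed = satisfies_bound_claims(intended, bound_claims)
rejected = not satisfies_bound_claims(same_id_other_ns, bound_claims)
```

The cost is one role per workload, since each role can pin only one such pair.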

But this completely negates the features of Vault entity management. Furthermore, to my knowledge, a JWT user claim must be unique within the system. It would perhaps be better to recommend that users use "user_claim": "/sub" if they don't intend to use bound_claims.

What I would like is to be able to use a unique claim, something like nomad_workload_id: "<namespace>:::<job_id>", which can then be leveraged on the Vault side to configure entities and aliases accordingly. "/sub" wouldn't work as it contains additional details, like region/taskgroup/task/identity, which a Vault operator may not know upfront for each job. Can such a user_claim be made available?

tgross commented 1 month ago

Hi @the-nando! So just to summarize the problem as you see it here, it's not that bound claims don't work, but that the user_claim on the Vault side can't be composed from multiple fields (i.e. Nomad namespace + Nomad job ID)?

(For what it's worth, I suspect our intent here is that there's a 1:1 mapping between Nomad namespace and Vault namespace, but I realize that's not always going to be feasible. Especially because Nomad namespaces are in CE and Vault namespaces are in ENT.)

the-nando commented 1 month ago

Hey @tgross 👋 Bound claims work as intended, but user_claim doesn't allow me to easily and uniquely identify a (job, namespace) pair on the Vault side without resorting to /sub which carries more information making its use impractical from a Vault operator point of view (identity aliases, etc.).

It would also be worth adding a note in the tutorial mentioning the possible implications of using /nomad_job_id in combination with namespaces or, perhaps, suggesting /sub instead.

tgross commented 1 month ago

Ok thanks @the-nando. I'll get this surfaced for roadmapping.

tgross commented 1 month ago

I've got a draft PR up here https://github.com/hashicorp/nomad/pull/23675. The implementation is easy, but I want to do some testing with Vault to make sure it's getting us what we want so that'll need some E2E testing.

schmichael commented 1 month ago

> without resorting to /sub which carries more information making its use impractical from a Vault operator point of view (identity aliases, etc.)

Hi @the-nando, I was wondering if you could elaborate on this. I can understand that agreeing upon a fully qualified name ahead of time might be a hassle, but I'm worried about the security implications of relying purely on <namespace>:<job> as multiple regions may have overlapping namespaces and job names (especially in circumstances where there are dev/staging/prod clusters; I'd hate for a misconfiguration to end up granting prod Vault access to dev region Jobs).

If we do add a new field would it make sense to make it <region>:<namespace>:<job> to "fully" namespace the identity from Nomad's perspective?

Out of curiosity would #19438 (custom claims) also address this? It would not prevent multiple jobs from sharing a value, but perhaps there's no concern with job submitters being able to do that.

If custom claims would address your use case, I have a slight preference for it since it seems very difficult to articulate to users when to use sub vs nomad_job_id vs the new nomad_workload_id. Any claim we add to Nomad also has to live more or less forever, so I'd like to be very confident and conservative in what we hardcode.

tgross commented 1 month ago

> Out of curiosity would #19438 (custom claims) also address this? It would not prevent multiple jobs from sharing a value, but perhaps there's no concern with job submitters being able to do that.

Custom claims as described in #19438 could totally solve it, but they make the ergonomics for job authors not very nice, as now the job author is responsible for describing the claim for all their jobs. Maybe not bad for "this one job needs it", but if there was a case where many, many jobs need third-party auth that needs a claim like $region:$namespace:$job, it becomes painful for authors.

However, along those lines, what if we made this a server configuration? For example, cluster administrators could specify extra claims in their vault.$cluster_identity block or a new server.identity block. Then the extra claims would be applied to all identities signed, without the job author getting involved. It'd need to have some kind of templating over the allocation/job. Something like this:

server {
  identity {
    extra_claims = {
      "example"  = "${region}:${namespace}:${id}"
      "whatever" = "${region}:${namespace}:${id}"
    }
  }
}

vault {
  default_identity {
    aud = ["vault.io"]
    ttl = "1h"
    extra_claims = {
      "example"  = "${region}:${namespace}:${id}"
      "whatever" = "${region}:${namespace}:${id}"
    }
  }
}

If we did this, we could allow job authors to have identity.extra_claims blocks too so they can override the default. But that lets job authors have their jobs masquerade as other jobs. Which sounds bad?

the-nando commented 1 month ago

Hi @schmichael, /sub includes region/taskgroup/task/identity, which is often not known by Vault operators upfront for a given job and makes pre-provisioning Vault entity aliases cumbersome. I'm basically after the simplest unique (within a federated Nomad cluster) user_claim which would allow me to identify a given Nomad job in Vault for the purpose of provisioning entities and entity aliases, in a similar manner to what @tgross did for the E2E test in https://github.com/hashicorp/nomad/pull/23675.

> but I'm worried about the security implications of relying purely on <namespace>:<job> as multiple regions may have overlapping namespaces and job names (especially in circumstances where there are dev/staging/prod clusters; I'd hate for a misconfiguration to end up granting prod Vault access to dev region Jobs).

I do have overlapping namespaces and job names across clusters, but they are connected to different Vault clusters. Within a single cluster I treat, as far as Vault access is concerned, all (namespace, job) pairs the same regardless of which region they run in. Prefixing the claim by region would require additional entity aliases, but it's something I can live with.

> Out of curiosity would https://github.com/hashicorp/nomad/issues/19438 (custom claims) also address this? It would not prevent multiple jobs from sharing a value, but perhaps there's no concern with job submitters being able to do that.

@tgross thanks for the input on the custom claims, your answer sums up my point of view as well. A generic solution for custom claims is more versatile and welcome, as long as it can be controlled at the server configuration level as well. Introducing changes to job specs is often a non-trivial exercise when running hundreds of them deployed by different teams. IMHO being able to configure identity claims in the job spec is a security hazard when coupled with the Vault integration and templated policies. I understand the point being discussed in #19438 in reference to how a job used to be able to pass arbitrary policies, but that doesn't make it less of a potential problem. In my setup I already use Sentinel to control which policies a given job can specify, and I can easily extend that to forbid configuring extra_claims at the job level.

tgross commented 1 month ago

Ok, so @schmichael and I had a chat and I think we've settled on the idea of introducing an extra_claims block that accepts template strings in the server configuration. So in the Vault block you'll do something like this:

vault {
  address = "https://vault.example.com:8200"
  enabled = true

  default_identity {
    aud = ["vault.io"]
    ttl = "1h"
    extra_claims {
      nomad_workload_id = "${job.namespace}:${job.id}"
      some_other_claim  = "foo"
    }
  }
}

We'll need to do a little investigation to see the exact objects we can expose in those templates, but that's the gist of things.
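
For a sense of what such templates might expand to, here is a minimal sketch. It is not Nomad's implementation: the `render_claim` helper, the `${...}` lookup rules, and the context values are all assumptions for illustration.

```python
import re


def render_claim(template: str, context: dict) -> str:
    """Expand ${a.b} placeholders by walking nested dicts in context."""

    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError for unknown fields
        return str(value)

    return re.sub(r"\$\{([^}]+)\}", lookup, template)


# Claim templates mirroring the extra_claims block above.
extra_claims = {
    "nomad_workload_id": "${job.namespace}:${job.id}",
    "some_other_claim": "foo",
}

# Hypothetical job the identity is being signed for.
context = {"job": {"namespace": "myns", "id": "myjob"}}

rendered = {name: render_claim(tpl, context) for name, tpl in extra_claims.items()}
```

With this context, nomad_workload_id would render as "myns:myjob", a value a Vault operator can predict ahead of time when pre-creating entity aliases.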

This allows us to avoid adding lots more claims to the JWT that some users might not need, while giving cluster admins the flexibility they need to meet their requirements for controls. We'll also probably want to add the same feature for a top-level server.default_identity, but we can do that in follow-up work. That'll cover a lot of the remaining use cases described in https://github.com/hashicorp/nomad/issues/19438.

tgross commented 1 month ago

#23675 has been merged and will ship in the upcoming Nomad 1.8.3 (with backports to Nomad Enterprise 1.7.x and 1.6.x).

the-nando commented 1 month ago

Thanks a LOT @tgross!