hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

[feature] Restrict service registration by ACL or enable passing a Consul token in the service stanza #6150

Closed tantra35 closed 11 months ago

tantra35 commented 5 years ago

Currently the service stanza allows anyone who can submit jobs to register any service, which may be insecure.

It would be useful, and simple to implement, to allow passing a Consul token in the service stanza, which would make it possible to restrict this operation.

cgbaker commented 5 years ago

Indeed. This is an enhancement that we are tracking alongside the Consul Connect work.

the-maldridge commented 3 years ago

Just had to refactor some services since this isn't possible yet. The use case was to avoid needing to redeploy a container that's meant to serve a fallback page and pull its few lines of text from Consul.

nvx commented 3 years ago

As per the Consul docs: https://learn.hashicorp.com/tutorials/consul/service-mesh-production-checklist#configure-acls

Each service must have a unique ACL token that is restricted to service:write only for the named service. You can review the Securing Consul with ACLs tutorial for a service token example. Note, it is best practices for each instance to get a unique token as described below.

From what I can gather, this currently isn't possible to achieve with Nomad until this issue (and ideally #2214, since embedding tokens in the job file seems like a bad idea) is resolved. The implication is that Consul Connect under Nomad is not production ready, which is surprising as the Nomad docs don't seem to indicate this. Or am I missing something?

tgross commented 3 years ago

From what I can gather this currently isn't possible to achieve with Nomad until this issue (and ideally #2214 since embedding tokens in the job file seems like a bad idea) are resolved the implication of which is that Consul Connect under Nomad is not production ready (which is surprising as the Nomad docs don't seem to indicate this), or am I missing something?

Nomad derives a SI token for each running job. You can also set unique per-job ACL tokens when submitting the job, as per the nomad job run docs:

The run command will set the consul_token of the job based on the following precedence, going from highest to lowest: the -consul-token flag, the $CONSUL_HTTP_TOKEN environment variable and finally the value in the job file.

But not yet per-service, which would be this issue. If you're interested in having this issue resolved, we do keep an eye on "reactions" when we're looking to roadmap feature requests. This one isn't currently slated for the next minor release cycle. But we'd be happy to review a PR if you were open to giving it a go.
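The precedence quoted above from the nomad job run docs can be illustrated with a minimal sketch. This is not Nomad's actual implementation; the function name `resolve_consul_token` and its parameters are hypothetical, and only the ordering (flag, then `$CONSUL_HTTP_TOKEN`, then the job file value) comes from the quoted docs.

```python
from typing import Mapping, Optional


def resolve_consul_token(
    flag_token: Optional[str],
    env: Mapping[str, str],
    jobfile_token: Optional[str],
) -> Optional[str]:
    """Pick the job's Consul token using the documented precedence.

    Hypothetical helper for illustration only: highest priority is the
    -consul-token flag, then the CONSUL_HTTP_TOKEN environment variable,
    then the value set in the job file.
    """
    candidates = (flag_token, env.get("CONSUL_HTTP_TOKEN"), jobfile_token)
    for candidate in candidates:
        # Treat None and the empty string as "not set".
        if candidate:
            return candidate
    return None
```

Note that whichever source wins, the result is a single token for the whole job, which is why per-service tokens would still need the change requested in this issue.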

the implication of which is that Consul Connect under Nomad is not production ready (which is surprising as the Nomad docs don't seem to indicate this), or am I missing something?

Nomad has a security model which might not overlap 100% with a close reading of the Consul docs, as Consul can't assume it's operating in a Nomad environment. For example, the recent Consul transparent proxy feature is papering over a bunch of gaps which Nomad already covers with network namespaces. But also, generally speaking, "production ready" is like saying "secure"/"insecure": without the context of the organization, threat model, resources, etc., it's just unhelpful editorialization. If you have specific context you'd be willing to share, I'm sure folks can help you figure out which deployment scenario will work for you and your organization. I'd suggest opening a topic on Discuss.

nvx commented 3 years ago

Nomad derives a SI token for each running job.

It does? The Nomad docs (e.g. https://www.nomadproject.io/docs/integrations/consul-connect) don't mention this behaviour, so my assumption was that all jobs using Consul Connect just ran with the same Consul token as the Nomad client uses (unless otherwise specified when submitting the job, of course).

If multiple services within the same job all use the same token, but different jobs use different tokens, this is a much better situation than I thought was the case from looking at the docs.

Just to confirm, if the Nomad servers/clients are configured with a consul token with write access to say Consul services service1 and service2, and there is a separate Nomad job for each service, could a malicious service1 job (eg the service has a vulnerability resulting in remote code execution inside the alloc from a remote attacker, not that the operator submitting the job is malicious) impersonate service2 (assuming that the service1 job file only mentions service1 service), or is this prevented by Nomad deriving a SI token for each job? If it's already prevented then this ticks off my requirements without needing this or #9607 (I meant 9607 in my earlier comment in place of 2214).

It's kinda hard at the moment working out where the Consul Connect security controls are in Nomad when some of the Nomad docs just link off to the Consul docs which are not written with Nomad specifically in mind. Perhaps extending the Consul Connect integrations doc linked earlier would help alleviate confusion (the Vault integrations page for example has a lot more detail on how Vault obtains tokens from a role/etc for each alloc)

tgross commented 3 years ago

It does? Looking at the Nomad docs eg at https://www.nomadproject.io/docs/integrations/consul-connect doesn't mention this behaviour so my assumption was that all jobs using Consul Connect just ran with the same Consul Token as the Nomad Client uses (unless otherwise specified when submitting the job of course).

That could be better documented, for sure.

If multiple services within the same job are all using the same token, but different jobs use different tokens this is a much better situation than I thought was the case from looking at the docs.

I checked in with some of my colleagues and realized the situation is actually slightly better than I said here. The SI token is derived per task (ref sids_hook.go).

Just to confirm, if the Nomad servers/clients are configured with a consul token with write access to say Consul services service1 and service2, and there is a separate Nomad job for each service, could a malicious service1 job (eg the service has a vulnerability resulting in remote code execution inside the alloc from a remote attacker, not that the operator submitting the job is malicious) impersonate service2 (assuming that the service1 job file only mentions service1 service), or is this prevented by Nomad deriving a SI token for each job?

A few notes here:

Of course, keep in mind that if a compromised task were to escape its isolation (e.g. a Linux container breakout or kernel vulnerability), it could compromise the Nomad client token and therefore impersonate any service that client can derive tokens for. We have a lot of defense in depth to prevent this scenario, but an unconstrained process running on the client host is not part of the Nomad threat model.

It's kinda hard at the moment working out where the Consul Connect security controls are in Nomad when some of the Nomad docs just link off to the Consul docs which are not written with Nomad specifically in mind. Perhaps extending the Consul Connect integrations doc linked earlier would help alleviate confusion (the Vault integrations page for example has a lot more detail on how Vault obtains tokens from a role/etc for each alloc)

Agreed. The Consul project has moved pretty rapidly, so we've been playing catch-up a bit and "cheating" at docs by just linking out. But it's definitely time to tighten those docs up.

nvx commented 3 years ago

That makes sense. I'd been hesitant to roll out Consul Connect because I mistakenly believed that security work was still needed before it could be rolled out properly, but it sounds like it's really just the docs.

Obviously a task escaping isolation is game over for pretty much all Nomad security boundaries, so I'm not too concerned about that. Trusting operators and the isolation boundaries established by Docker is fine. What really matters to me is that the code running inside the allocs doesn't have to be trusted, especially when running Docker containers from third parties (e.g. databases and the like).

Great to hear it's even per-task. For multiple services in the same task, I can't imagine how any security boundary could be expected to exist between them, since they could just read any secrets from the task secret directory anyway.

tgross commented 11 months ago

Starting in Nomad 1.7.0-beta.1 we've deprecated the use of Consul tokens in the Nomad agent configuration for purposes of giving workloads access to Consul. Nomad will use workload identities to sign into Consul for purposes of getting Consul tokens for those workloads, and those tokens can be downscoped in privilege based on the auth method, role, and binding rules configured in Consul (typically based on the Nomad namespace).