hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

HCL2: Fetch data from consul, vault, etc #9434

Open notnoop opened 3 years ago

notnoop commented 3 years ago

Consider adding job spec HCL functions for fetching data from vault/consul and other sources. Packer categorizes these as Contextual Functions.

Design Considerations

Nomad currently supports fetching data from Vault and Consul via Consul Template. Contextual HCL functions should coexist cleanly with that support: operators must have clear expectations of how each behaves and should not be surprised when they switch between them.

We have two open questions for supporting contextual functions.

Evaluation Context

Contextual variables present a design challenge: How should they be evaluated?

Current HCL2 functions are evaluated statically by the CLI on the job submitter's host. If the Vault/Consul functions are implemented as normal HCL functions, the CLI must be configured to reach the production Vault/Consul cluster, and the operator must have direct read access to production secrets. Such production access may not be ideal.

An alternative is to have Nomad servers fetch Vault/Consul values on behalf of the job. This is consistent with the current Vault/Consul integration through Consul Template, and it eases the deployment flow.
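
To make the first option concrete, here is a minimal Python sketch of CLI-side evaluation. The `vault("path", "key")` call syntax is hypothetical (loosely modeled on Packer's contextual functions; Nomad does not define such a function), and `FAKE_VAULT` / `fake_read` are stand-ins for a real Vault client. It only illustrates where the lookup happens and what ends up in the submitted job.

```python
import re
from typing import Callable

# Hypothetical call syntax for illustration only; not a Nomad feature.
_VAULT_CALL = re.compile(r'vault\("([^"]+)",\s*"([^"]+)"\)')


def render_on_submitter(jobspec: str, read_secret: Callable[[str, str], str]) -> str:
    """Resolve every vault("path", "key") call once, on the submitting host."""

    def substitute(match: re.Match) -> str:
        path, key = match.group(1), match.group(2)
        # The submitter itself performs the read, so it needs access to the secret.
        return '"' + read_secret(path, key) + '"'

    return _VAULT_CALL.sub(substitute, jobspec)


# Stand-in for a Vault client (assumption); a real CLI would need network
# access to the production cluster and a token with read permission.
FAKE_VAULT = {("secret/db", "password"): "s3cr3t"}


def fake_read(path: str, key: str) -> str:
    return FAKE_VAULT[(path, key)]


spec = 'env { DB_PASSWORD = vault("secret/db", "password") }'
rendered = render_on_submitter(spec, fake_read)
print(rendered)  # env { DB_PASSWORD = "s3cr3t" }
```

In this sketch the rendered text that would be submitted already contains the plaintext value, which is exactly why the submitter needs production read access under this option.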

Leases and Refresh

The documentation must set clear expectations on refresh behavior. Currently, the Consul Template integration has watch semantics, so a task may be notified when a Vault/Consul value changes. The Consul Template integration also now refreshes the lease of the Vault secret.

Because HCL functions are evaluated statically, a simple contextual function is evaluated only once, at submission time. Operators shouldn't expect jobs to be updated when the Vault/Consul source data changes. They should also expect that jobs may run past the lease expiry of the fetched secrets, potentially causing a service outage.
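
The difference between the two behaviors can be shown with a toy simulation in Python. Everything here is an assumption made for illustration: `FakeVault`, the 60-second lease, and the integer clock are invented, and the "watcher" is a simplification that re-reads after expiry rather than a model of how Consul Template actually renews leases.

```python
from dataclasses import dataclass


@dataclass
class Secret:
    value: str
    lease_expires_at: int  # seconds on a simulated clock


class FakeVault:
    """Toy secret store (assumption); not how Vault itself behaves."""

    def __init__(self) -> None:
        self.version = 1

    def read(self, now: int, ttl: int = 60) -> Secret:
        return Secret(value=f"password-v{self.version}", lease_expires_at=now + ttl)

    def rotate(self) -> None:
        self.version += 1


vault = FakeVault()

# Evaluated once at submission time (t=0): the value is fixed in the job.
static = vault.read(now=0)

# Watch-style consumer: starts with the same value but re-reads later.
watched = vault.read(now=0)

vault.rotate()  # the secret changes at the source
now = 90        # 30 seconds past the 60-second lease

# Only the watch-style consumer reacts to the expired lease.
if now >= watched.lease_expires_at:
    watched = vault.read(now=now)

static_is_stale = now >= static.lease_expires_at
print(static.value, static_is_stale)  # password-v1 True
print(watched.value)                  # password-v2
```

The statically evaluated value is still `password-v1` after its lease has lapsed, which is the outage risk described above.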

bogue1979 commented 3 years ago

Especially for Vault secrets, an additional option to fetch them once during deployment would make it possible to use Nomad without direct Vault integration. I don't know what it looks like at other companies, but in our case most of our secrets are not dynamic enough to require watch semantics, and updating secrets in Vault is done together with a deployment. The documentation should clearly distinguish between the Vault and Consul functions that template files into the allocation directory, which can be coupled with a notification on change, and an HCL2 Vault/Consul function that is evaluated once. Maybe they should even have different names?

To describe our workflow and the reasoning behind https://github.com/hashicorp/nomad/pull/9423: we template our Nomad job files with an external tool, including secrets from Vault. Within Vault, read and write access to secrets is governed by the Vault policies assigned to the given Vault token. The CI/CD pipeline has access to production secrets during templating. For various reasons, our Nomad clusters have no direct access to the central Vault cluster. If we could use a function within HCL2 to read secrets from Vault, we could get rid of the external templating tool.

the-maldridge commented 3 years ago

This sounds like a bit of a security nightmare since Nomad doesn't protect the jobspec as sensitive data. I don't see a way to implement this without significant structural changes to the Nomad security model to accommodate making the jobspec secret.

bogue1979 commented 3 years ago

> This sounds like a bit of a security nightmare since Nomad doesn't protect the jobspec as sensitive data. I don't see a way to implement this without significant structural changes to the Nomad security model to accommodate making the jobspec secret.

This is probably true: as long as you have access to the running jobspec, you can get the substituted values. But this is the case with every other templating solution as well. I have not tested the full capabilities of Nomad's ACL system, but list-jobs vs read-job looks promising.
Another option that comes to mind is how Terraform lets outputs be marked as sensitive.
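
As a rough Python illustration of the "sensitive" idea, the sketch below wraps a fetched value so that it is redacted whenever a job is displayed. The `Sensitive` class, the `display` helper, and the `(sensitive)` placeholder are all hypothetical; nothing like this exists in Nomad, and a real design would also have to cover storage and API access.

```python
class Sensitive:
    """Hypothetical wrapper that hides its value whenever it is displayed."""

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # Only code that explicitly asks for the value gets it.
        return self._value

    def __repr__(self) -> str:
        return "(sensitive)"


def display(job_env: dict) -> dict:
    """Return a copy of the environment that is safe to show to job readers."""
    return {
        key: "(sensitive)" if isinstance(value, Sensitive) else value
        for key, value in job_env.items()
    }


job_env = {"DB_USER": "app", "DB_PASSWORD": Sensitive("s3cr3t")}
shown = display(job_env)
print(shown)  # {'DB_USER': 'app', 'DB_PASSWORD': '(sensitive)'}
```

The task would still call `reveal()` to get the real value, while anyone reading the job would only see the placeholder.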