Render templates and download artifacts in containers

schmichael commented 2 years ago

aka switch template from being trusted to untrusted.

Proposal

Nomad should fetch artifacts and render templates inside their task's container so that they have the same permissions, capabilities, namespaces, and filesystem layout as their task.

Context

As of Nomad 1.2 the artifact and template stanzas map to the go-getter and consul-template tools respectively. These tools are used as libraries: they are embedded in the Nomad binary, and their code executes with the Nomad agent's permissions and capabilities.

Since go-getter and consul-template were designed as standalone tools and for use in CLI apps, it has been quite difficult to run them securely.

Complications

The primary complication here is that task drivers were not designed with a Prestart phase in which containerization has been set up but the task itself has not yet started.

It is likely the implementation should work more like logmon than our Consul Connect integration's Envoy sidecar proxy. The Envoy sidecar proxy is implemented as a task that is generated from other stanzas at the group level. The Envoy task uses the Docker driver and has to allow for extensive customization by job and cluster operators.

Logmon, on the other hand, is built into Nomad and fork/exec'd before starting tasks. It's not a "real" task and therefore is (largely) task driver agnostic. We could reuse the libcontainer containerization features of our exec driver to execute a new consul-template entrypoint safely. The downside is that there's a lot of useful plumbing for tasks (logging, restarts, observability, etc.) that we wouldn't get for free.
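For illustration only, here is a rough Go sketch of what a fork/exec'd, confined template helper could look like at its very simplest. It uses the standard library's chroot and credential options as a stand-in for libcontainer, which it does not use. The function name, the helper binary path, and the uid/gid are all hypothetical, and the code relies on Unix-only fields of syscall.SysProcAttr.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// sandboxedCommand builds (but does not start) a command that would run
// binary chrooted into root as the given unprivileged uid/gid.
// Hypothetical helper for illustration; this is not Nomad code and is far
// less than what libcontainer provides (no namespaces, no cgroups).
func sandboxedCommand(root, binary string, uid, gid uint32, args ...string) *exec.Cmd {
	// binary contains a path separator, so exec.Command does not search PATH;
	// the path is resolved inside the chroot when the child calls execve.
	cmd := exec.Command(binary, args...)
	cmd.Dir = "/"                    // working directory, relative to the new root
	cmd.Env = []string{"PATH=/bin"} // do not inherit the agent's environment
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Chroot:     root,
		Credential: &syscall.Credential{Uid: uid, Gid: gid},
	}
	return cmd
}

func main() {
	// Assumed values: an allocation directory and a helper binary inside it.
	cmd := sandboxedCommand("/tmp/example-alloc", "/bin/template-runner", 65534, 65534)
	// Starting the command requires privileges to chroot and change user,
	// so this sketch only prints the configuration.
	fmt.Println(cmd.SysProcAttr.Chroot, cmd.SysProcAttr.Credential.Uid)
}
```

Starting such a command needs a privileged parent, which the Nomad agent typically is; everything a task normally gets (logging, restarts, observability) would still have to be built separately.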

Consul Token

Right now templates use a per-allocation Vault token for communicating with Vault. For Consul, however, the Nomad agent's own Consul token is used. We would not want to put the agent's Consul token in a container with a now-untrusted consul-template, so we would have to use a per-allocation Consul token.

Another alternative would be to proxy Consul requests through Nomad where the agent's Consul token could be safely injected.
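A minimal sketch of that proxying idea in Go, assuming a plain HTTP reverse proxy sits between the template runner and Consul. The function names and token values are hypothetical; X-Consul-Token is the header Consul's HTTP API reads for ACL tokens. The proxy discards whatever token the untrusted side supplies and injects the agent's token, so the token never enters the container.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newConsulProxy forwards requests to the Consul API at target, replacing any
// client-supplied token with agentToken. Hypothetical helper, not Nomad code.
func newConsulProxy(target *url.URL, agentToken string) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(target)
	director := proxy.Director
	proxy.Director = func(r *http.Request) {
		director(r)
		// Strip every place a caller could have put a token of its own.
		r.Header.Del("Authorization")
		q := r.URL.Query()
		q.Del("token")
		r.URL.RawQuery = q.Encode()
		// Inject the agent's token on the trusted side of the proxy.
		r.Header.Set("X-Consul-Token", agentToken)
	}
	return proxy
}

// tokenSeenByUpstream sends one request through the proxy to a fake Consul
// server and returns the token that server received.
func tokenSeenByUpstream(agentToken, clientToken string) (string, error) {
	seen := make(chan string, 1)
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Get("X-Consul-Token")
		io.WriteString(w, "ok")
	}))
	defer upstream.Close()

	target, err := url.Parse(upstream.URL)
	if err != nil {
		return "", err
	}
	front := httptest.NewServer(newConsulProxy(target, agentToken))
	defer front.Close()

	req, err := http.NewRequest(http.MethodGet, front.URL+"/v1/kv/example", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("X-Consul-Token", clientToken)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	return <-seen, nil
}

func main() {
	seen, err := tokenSeenByUpstream("agent-secret", "task-supplied")
	if err != nil {
		panic(err)
	}
	fmt.Println("upstream saw agent token:", seen == "agent-secret")
}
```

A real proxy would also need to restrict which Consul endpoints a template may reach; otherwise the container would gain the agent's full Consul privileges by proxy.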

Non-Linux Operating Systems

If we go the logmon route, the problem is that we currently only support containerization on Linux. We would either have to ship unsafe templating for other operating systems, or implement containerization for them.

Container Implementation

chroot/pivot_root are the most important (and hopefully available on non-Linux OSes) to prevent arbitrary filesystem access.
Executing as an unprivileged user should remove another class of security issues and be relatively easy to achieve across OSes.
Resource constraints (cgroups on Linux) would be nice, but not a strict requirement. Since artifacts and templates are controlled by the job operator and not the task itself, we have looser needs here. Generally speaking, Nomad is difficult to make safe from DoS by job operators. Nomad does intend to protect operators and other allocations from malicious or hijacked tasks though, which I think we can accomplish for templates without cgroups.
Namespaces are similar to cgroups in that they don't seem like a strict requirement for migrating to containerized/untrusted templates. One exciting opportunity would be to run templates without networking and rely on unix sockets connected to the Nomad agent to mediate communication with Nomad/Consul/Vault. Unsure if this would break other uses of templates, but it would also ease configuring TLS for the untrusted consul-template as it could communicate over plaintext to Nomad.
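The unix-socket idea in the last item can be sketched in Go as follows, assuming the agent exposes an HTTP API on a socket file that would be bind-mounted into the sandbox. The socket path, the handler, and the URL are hypothetical; the point is only that an ordinary HTTP client can talk over a unix socket in plaintext with no network access at all.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"path/filepath"
)

// serveOnUnixSocket starts an HTTP server listening on a unix socket at path.
// This stands in for the Nomad agent side of the connection.
func serveOnUnixSocket(path string, h http.Handler) (*http.Server, error) {
	ln, err := net.Listen("unix", path)
	if err != nil {
		return nil, err
	}
	srv := &http.Server{Handler: h}
	go srv.Serve(ln) // returns once srv.Close is called
	return srv, nil
}

// unixSocketClient returns an HTTP client that ignores the host in the URL
// and always dials the unix socket at path.
func unixSocketClient(path string) *http.Client {
	return &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", path)
		},
	}}
}

// roundTrip starts a fake agent on a temporary socket and fetches one response.
func roundTrip() (string, error) {
	dir, err := os.MkdirTemp("", "tmpl-sock")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "api.sock")

	srv, err := serveOnUnixSocket(sock, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "hello from agent")
	}))
	if err != nil {
		return "", err
	}
	defer srv.Close()

	// The host name is a placeholder; only the socket path matters.
	resp, err := unixSocketClient(sock).Get("http://agent.invalid/v1/example")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := roundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Because the transport never touches TCP, the sandboxed process could run in an empty network namespace, and TLS configuration would stay entirely on the agent side.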

schmichael commented 2 years ago

Just noticed this is technically a dupe of #2510, but we'll continue to use this issue since stalebot closed the original years ago.

tgross commented 2 years ago

Another challenge here is limiting memory overhead. We know that the logmon process is using up an unfortunate amount of memory and having multiple template runners would be additional Nomad-owned RAM unusable by workloads. (This is less of a problem for artifacts because artifact fetching is one-and-done whereas templates remain running.) Some thoughts I've had on that:

Can we safely run a single CT-wrapper process per node and hand it templates to render to the allocation directories, instead of one per task?
I think the text segment will be shared if we don't put runners into different memory cgroups?
Can we reduce the memory overhead of go-plugin? It wasn't originally designed for the way Nomad uses it, but maybe there are some wins here that would benefit logmon as well.

tgross commented 8 months ago

Some follow-up on this. We've sandboxed go-getter on Linux in a subprocess via Landlock, which solves a lot of the problems this issue was intended to resolve without having to deal with the chicken-and-egg problem of having the artifact fetch run before the container it downloads exists.

But template runners remain a challenge, particularly because they need to be long-lived and can potentially have unbounded resource usage via large numbers of dependencies. This was the source of the issue I was investigating in https://github.com/hashicorp/nomad/pull/20134. It might be interesting to consider whether we could redesign template blocks to be more like the connect block: automatically create a sidecar task that requires resource allocation. This would be tricky to do without a universal exec driver though.
