picatz / terraform-google-nomad

📗 Terraform Module for Nomad clusters with Consul on GCP
https://registry.terraform.io/modules/picatz/nomad/google
MIT License

Deploy Consul with Nomad #18

Closed picatz closed 3 years ago

picatz commented 3 years ago

Aims to fix #15

picatz commented 3 years ago

Before merging this, I wanted to document some metadata service abuse opportunities in the current cluster configuration, along with a manual work-around.

Sadly, much of the initial VM configuration, including secret tokens, is dynamically provisioned using the metadata service. This makes things really easy, but it leaves some use cases vulnerable to internal abuse. This is a similar security concern to the one GKE has, and a common problem for cloud infrastructure in general.

From GCP's metadata documentation:

Caution: Any process that can query the metadata URL has access to all values in the metadata server. This includes any custom metadata values that you write to the server. Google recommends that you exercise caution when writing sensitive values to the metadata server or when running third-party processes.
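
To make the risk concrete: any process on a VM that can reach the metadata endpoint can dump every instance attribute with a single unauthenticated request, since the Metadata-Flavor header is the only requirement:

$ curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/attributes/?recursive=true"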

Simulating a Compromised Container

To help better illustrate what that might look like, I can simulate a compromised container workload by using a Nomad ACL token with exec permissions to get a shell inside a container on a client.

First, I'll export a privileged management token. I also have a proxy listening on localhost:4646 that is performing mTLS termination through an SSH bastion to make the Nomad server available over HTTP on my laptop:

$ export NOMAD_TOKEN="4fa711e6-37af-ea3c-f140-126fddba9592"
$ export NOMAD_ADDR="http://127.0.0.1:4646"
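
As an aside, since the proxy setup isn't part of this module: one way to build it is an SSH local port-forward through the bastion, with socat performing the mTLS client handshake. The hostnames, certificate paths, and certificate common name below are assumptions:

$ ssh -f -N -L 4647:nomad-server-0:4646 user@bastion
$ socat TCP-LISTEN:4646,bind=127.0.0.1,fork,reuseaddr OPENSSL:127.0.0.1:4647,cert=nomad-cli-cert.pem,key=nomad-cli-key.pem,cafile=nomad-ca.pem,commonname=server.global.nomad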

Then I deployed a Consul-connect enabled job:

$ nomad run jobs/count-dashboard.hcl
...
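
For reference, the allocation ID used in the next step can be found from the job's status output (countdash is an assumption for the job name inside jobs/count-dashboard.hcl):

$ nomad job status countdash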

And then started a shell in one of the deployed containers on a Nomad client, in this case the one running the "web" task:

$ nomad alloc exec -task="web" 57c5dee6 /bin/sh
👇🏽 In the container
$ curl -H "Metadata-Flavor: Google" http://metadata/computeMetadata/v1/instance/attributes/startup-script
 ...
CONSUL TOKENS
NOMAD TOKENS
ANYTHING ELSE INJECTED AS A STARTUP SCRIPT
...
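
Since the startup script is plain text, sifting credentials out of it is trivial. The exact variable names depend on the module's startup-script template, so this pattern is only illustrative:

$ curl -s -H "Metadata-Flavor: Google" http://metadata/computeMetadata/v1/instance/attributes/startup-script | grep -iE "token|secret|key"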

This shell can also connect to any of the server nodes, and further escalate privileges from there. Pulling this off requires one of two things: a Nomad operator with exec access or a management ACL token (which an unprivileged operator wouldn't have), or a container workload that has been compromised outside of Nomad's control.
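
As one example of escalation, a Consul token recovered from the startup script can be used to read (or write) the entire KV store directly from a server node. The address, port, and token here are placeholders:

$ export CONSUL_HTTP_TOKEN="<token from the startup script>"
$ curl -s -H "X-Consul-Token: $CONSUL_HTTP_TOKEN" "http://server-0:8500/v1/kv/?recurse"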

Removing Metadata After Initialization

A manual step administrators can take to harden their deployments is to delete the metadata server contents once a cluster has been deployed and successfully initialized. For example, to remove all of the metadata from client-0:

$ gcloud compute instances remove-metadata client-0 --all
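
The same can be repeated for every node, and the result verified by describing an instance afterwards. The instance names and count here assume the module's defaults:

$ for instance in server-0 server-1 server-2 client-0 client-1 client-2; do gcloud compute instances remove-metadata "$instance" --all; done
$ gcloud compute instances describe client-0 --format="value(metadata.items)"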

Removing Secrets from Metadata

Instead of removing the secrets from metadata, it would be better if they weren't there in the first place. Operators shouldn't need to worry about that attack vector when using this module. I know I don't want to worry about it when deploying my clusters. 😸

Using Vault in the Future

To actually handle all of these secret management concerns, it would probably be a good idea to introduce a Vault deployment to this module in the near future.
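
For example, Vault's Nomad secrets engine can mint short-lived Nomad ACL tokens on demand, so no long-lived token would ever need to be written to instance metadata. A rough sketch of what that could look like (the mount path, addresses, and role names are assumptions):

$ vault secrets enable nomad
$ vault write nomad/config/access address=https://nomad.example.com:4646 token=$NOMAD_BOOTSTRAP_TOKEN
$ vault write nomad/role/operator policies=operator
$ vault read nomad/creds/operator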