hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

helm: Document how to make calls to Consul API when TLS is enabled #674

Open lkysow opened 4 years ago

lkysow commented 4 years ago

See https://discuss.hashicorp.com/t/consul-verified-tls-from-pods-in-kubernetes-cluster/9208/11 for instructions. We should document this in our docs.

issacg commented 4 years ago

Is there any particular reason why the auto-encrypt ca certificate can't just be added to a secret, the same way that is done for consul-ca-cert?

This took me several days of ripping my hair out figuring out that there were 2 legitimate and parallel CAs to begin with, and then even once I accepted that this was by intention and not a bug, hours more to find this issue...

It seems to me that needing to make an init container is a bit overkill just to talk to your local consul agent.

ishustava commented 4 years ago

Hey @issacg,

Yes, I totally understand the confusion and sorry you had to go through that! Thanks so much for this feedback.

The reason we don't create a secret is that when the CA gets rotated or if the provider is changed, that secret will be out-of-date. This would then imply that we would need to run an additional process that checks for CA updates in Consul and then updates the secret in case that CA has changed. It seemed simpler to just fetch the CA in an init container beforehand.

Hope this makes sense.

issacg commented 4 years ago

@ishustava maybe I'm missing something, but how does that make it simpler? If the auto-encrypt CA changes, rotates, or is otherwise invalidated, then the certificate that I've fetched in the initcontainer will no longer be valid for whatever process (in the "main" container) is talking to the consul agent, and things will stop working anyway.

If such a scenario is reasonable to expect, then something in my main container either needs to self-destruct the pod to re-run the initcontainer (which does not always make sense in production), or needs a process that auto-updates the certificate inside the main container, either by calling /v1/agent/connect/ca/roots on a consul server (not client), using the consul-ca-cert as the CA for only that call, or by utilizing consul-k8s.
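A minimal sketch of the self-updating option described above, assuming the roots endpoint returns a JSON body with an `ActiveRootID` field and a `Roots` list whose entries carry `ID`, `Active`, and `RootCert`. The function names, the server URL, and the file path are hypothetical, and the URL's host is assumed to match a name in the server certificate.

```python
import json
import ssl
import urllib.request


def extract_active_root_pem(payload):
    """Return the PEM of the active root from a CA-roots response body."""
    roots = payload.get("Roots") or []
    active_id = payload.get("ActiveRootID")
    # Prefer the root explicitly named by ActiveRootID.
    for root in roots:
        if active_id and root.get("ID") == active_id:
            return root["RootCert"]
    # Fall back to the first root flagged as active.
    for root in roots:
        if root.get("Active"):
            return root["RootCert"]
    raise ValueError("no active root certificate in response")


def fetch_active_root_pem(server_url, server_ca_path, token=None):
    """Call the roots endpoint on a server, trusting only the given CA file.

    server_ca_path is assumed to hold the contents of consul-ca-cert.
    """
    ctx = ssl.create_default_context(cafile=server_ca_path)
    req = urllib.request.Request(
        server_url.rstrip("/") + "/v1/agent/connect/ca/roots"
    )
    if token:
        req.add_header("X-Consul-Token", token)
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return extract_active_root_pem(json.load(resp))
```

A process in the main container could call `fetch_active_root_pem` periodically and rewrite the CA file it uses for agent calls whenever the result changes.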

This just seems like it's punting all of the juggling to the end-users :/

Can I ask why the auto-encrypt CA would be expected to change in the first place, as your reply above implies that to be central to why this is the recommended workflow.

Thanks!

ishustava commented 4 years ago

Hey @issacg

Let me elaborate a bit on my answer. Bear with me as I'm going to go into some implementation details to explain this decision better.

The first question is: if we were to create a secret containing the CA certificate, when in the lifecycle of the Helm release should it be created? It can't be part of any existing job. The TLS init job runs as a pre-install hook, and at that time the servers aren't up for us to call the connect/ca/roots API. Other post-install jobs are not relevant to this task since they are tied to specific features of the Helm chart; for example, the server-acl-init job only runs if ACLs are enabled. One possible implementation would be to run another post-install job that waits for the servers to come up and, as soon as they are up, gets the CA cert and creates a secret. This would also mean that other components that are created at the same time and need the clients' CA to talk to them (for example, the connect injector webhook) would also need some sort of init container that waits for the CA secret to be created, similar to what we currently do when ACLs are enabled. I could see this working better if you can guarantee that your applications that talk to Consul always start after Consul is deployed and running, which would eliminate the need for an init container that just waits for the CA secret to be there.

Second, there is the issue with CA rotations that I've mentioned. You are correct that if the CA rotates, you are still responsible for restarting the components in the Helm chart. However, the current implementation guarantees that a pod always picks up the latest CA whenever it is restarted. If your CA provider supports cross-signing, you won't see downtime (meaning your old CA won't immediately become invalid) and you will have time to restart your components. Here are the docs that talk about CA rotation in a bit more detail. If we added creating a secret via a post-install/upgrade job, for example, then we would have to require that operators rotating their CAs run a Helm upgrade. Currently, you can simply run kubectl rollout restart, and this will update the CA.
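To make the rotation point concrete, one way to decide when a restart is due is to compare the CA a pod loaded at start-up with the CA Consul currently serves. The following is a minimal sketch, assuming both are available as single-certificate PEM strings; the function names are hypothetical and nothing here is part of the Helm chart.

```python
import hashlib
import ssl


def ca_fingerprint(pem):
    """SHA-256 fingerprint of a single PEM certificate.

    Hashing the DER bytes (not the PEM text) makes the comparison
    insensitive to whitespace and line-wrapping differences.
    """
    der = ssl.PEM_cert_to_DER_cert(pem.strip())
    return hashlib.sha256(der).hexdigest()


def needs_restart(mounted_pem, current_pem):
    """True if the CA the pod started with differs from the current CA."""
    return ca_fingerprint(mounted_pem) != ca_fingerprint(current_pem)
```

If `needs_restart` returns true, an operator (or some automation) would then trigger the kubectl rollout restart mentioned above.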

Third, if your CA provider is Vault, then you probably don't need this at all and can just use vault secret injection and inject this CA directly into your container. I haven't tried it though, so this is purely theoretical 😄

All of that being said, I definitely hear you about it being confusing especially if you have an application that's talking to Consul directly rather than using one of our provided components. As part of this issue, we're going to document the process for doing that, and I hope that will clarify any confusion other users may have in the future. At this point though, managing the kubernetes secret containing the CA feels like too much effort.

You asked: "Can I ask why the auto-encrypt CA would be expected to change in the first place, as your reply above implies that to be central to why this is the recommended workflow."

Consul has an API for rotating the CA certificates for connect (and auto-encrypt), and so we can't assume that it won't change. You may want to rotate your CA for different reasons, for example, if it expired or is compromised somehow. For this reason, we need to make sure that the Helm chart is supporting this case.

issacg commented 4 years ago

@ishustava Sorry for the delayed response, as I've been catching up on Hashiconf Digital. :D

These were excellent and very informative answers, thank you so much 🙏

The crux of it for me was the API for rotating the CA certificates for auto-encrypt. I was unaware that it was tied to the connect CA, so that explains so much of the hesitation on your side.

However, it also means that even my setting up an initcontainer is just as imperfect a solution, because a CA could be rotated and all pods would still need to self-destruct in order to get the new one.

It still seems to me that having an injector script automatically add the initcontainer and populate either a secret (or a file, that's fine too!) would mean fewer moving pieces for a more casual operator, and would make for future-proof, sane defaults should changes be made down the line to consul-k8s or consul that affect the APIs in question. The same already exists for injection of the consul agent ACL token (consul-k8s acl-init), which could likewise be invalidated by an operator, for example, by deleting said token from consul.

Using Vault as a CA would be lovely if the Helm chart offered a way to bootstrap it. I use Vault's PKI for other things, but in my personal (and very opinionated :)) philosophy, the moment I'm using the Helm chart with all of its magic, I want to minimize the magic I do on my end which might interfere. I guess that's the same sentiment that drives me to still think that a Helm-provided initcontainer for auto-encrypt would be preferable to do-it-yourself. If there were an officially blessed tls.vaultPKI section in the Helm chart that allowed the same magic of distributing the certs and keys (maybe minus the k8s secrets, since the keys could be fetched via vault-k8s instead), then I'd probably jump on it.

I hope that makes sense to you. Regardless, thanks again for the detailed reply!

lawliet89 commented 4 years ago

Related to this issue, I have created two feature requests in consul-k8s:

In the meantime, I have a "poor man's alternative" that creates a Consul Template Deployment that constantly watches the Connect CA roots and updates the values to ConfigMaps in one or more namespaces. You can use something like configmap-reload (https://github.com/jimmidyson/configmap-reload) to send signals to your applications on changes.

https://github.com/basisai/consul-autoencrypt-k8s
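For illustration only, the "publish the CA to ConfigMaps in several namespaces" idea can be sketched as below. This is not how the linked project is implemented; the ConfigMap name, the `ca.pem` key, and the function names are all hypothetical, and the CA PEM is assumed to have been obtained separately.

```python
import json


def build_ca_configmaps(name, namespaces, ca_pem, key="ca.pem"):
    """Return one ConfigMap manifest (as a dict) per namespace."""
    return [
        {
            "apiVersion": "v1",
            "kind": "ConfigMap",
            "metadata": {"name": name, "namespace": ns},
            "data": {key: ca_pem},
        }
        for ns in namespaces
    ]


def to_kubectl_input(manifests):
    """Serialize the manifests as a single v1 List in JSON form."""
    bundle = {"apiVersion": "v1", "kind": "List", "items": manifests}
    return json.dumps(bundle, indent=2)
```

The JSON from `to_kubectl_input` could be piped to `kubectl apply -f -` each time the watched CA changes, with something like configmap-reload notifying the applications.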