Closed tamalsaha closed 2 years ago
How exactly would this work? What will Vault use to authorize all of the different names to LE?
@jefferai, for DNS challenges Vault can do all the parts. This is also required for issuing wildcard certificates. The user provides the domain names and the DNS provider (Route53, etc.) credentials. Vault sets up the TXT record needed to pass domain validation.
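For readers unfamiliar with the DNS challenge, here is a minimal sketch of what "setting up the TXT record" amounts to under RFC 8555: the record lives at `_acme-challenge.<domain>` and its value is the unpadded base64url SHA-256 digest of the key authorization. The function names, the token and the thumbprint below are hypothetical placeholders, and the step that actually writes the record through a DNS provider API is left out.

```python
import base64
import hashlib
from typing import Tuple


def b64url(data: bytes) -> str:
    """Base64url without padding, the encoding ACME uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str, account_thumbprint: str) -> Tuple[str, str]:
    """Return (record name, TXT value) for a dns-01 challenge.

    `token` comes from the ACME server and `account_thumbprint` is the
    base64url JWK thumbprint of the account key; both are placeholders here.
    """
    # A wildcard name is validated at its base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{base}", b64url(digest)


# Hypothetical inputs, for illustration only.
name, value = dns01_record("*.example.com", "hypothetical-token", "hypothetical-thumbprint")
print(name, value)
```

Whoever holds the DNS provider credentials is the only party that needs to publish this record, which is why a central service can do it on behalf of many hosts.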
For the HTTP challenge, I am not sure. Domain validation happens by responding on a well-known path. Vault could potentially expose this path, and users would have to manually set up their LB (nginx/haproxy) to expose it under their domains. Users running in Kubernetes would just have to configure their Ingress object to pass the well-known paths to Vault.
We are using the https://github.com/xenolf/lego project as a library in our Voyager. We struggled with LE's rate limiting, but I think we have a good handle on it now.
My general motivation is to make Vault the centerpiece for secret management for Kubernetes (not an official project). We have written an operator, are using Vault as a DB secret manager, and are looking into adding a CSI driver and handling TLS secrets. Kubernetes has a crazy maze of mTLS. I feel life will be easier for ops folks if there was a central place to deal with TLS stuff.
I am not a Vault expert, but from what I understand, no other secret backend has to deal with a similar type of authorisation in order to issue secrets.
As you've already noted @tamalsaha, it can be difficult to deal with rate limits properly & the wide variety of HTTP servers, and even differing APIs and authentication mechanisms involved with ACME can quickly become very complex.
From what I understand, vault tries to keep this kind of code, either listening for unauthenticated connections or calling out to many different types of APIs to a minimum, given the sensitivity of data stored in Vault. Solving HTTP at least would involve exposing Vault on the public internet (for LE/public acme servers)
Beyond that, I think waiting until the ACME draft spec has stabilised would be wise - there are currently still a number of divergences between Let's Encrypt and the ACME spec (these are gradually going away!) which could cause quite a lot of backwards compatibility pain.
My general motivation is to make Vault the center piece for secret management for Kubernetes (not official project) .... I feel life will be easier for ops folks if there was a central place to deal with TLS stuff.
Regarding kubernetes specifically, and a shameless self plug, but that is one of the main drivers/goals for cert-manager, albeit we don't have a well defined charter that notes this anywhere 😄 although we don't yet have a CSI plugin or flex volume plugin.
All well said, @munnerz . We generally take the approach of making Vault a one stop shop for secrets needs within your Intranet, and as such what we often find are people using Vault for easy deployment of certificates to apps and machines communicating with each other and LE for anything on the edge. It's a reasonable model given the different goals of the product.
FWIW we have many endeavors going on to make it easier to use Vault when on Kubernetes. The automatic authentication provided by Vault Agent that was just released is a step towards that. We plan to eventually have a sidecar and/or CSI container mixing Vault and Consul-Template to make it easy to do the rest.
@jefferai , Does this mean that LE based certificate issuing should be outside the scope of Vault ?
Regarding "We plan to eventually have a sidecar and/or CSI container mixing Vault and Consul-Template to make it easy to do the rest":
Is there an issue that we can follow? We have been interested in that too: https://github.com/kubernetes/kubernetes/issues/66362 . If the Vault project wants to do this, we can wait :)
@tamalsaha They're really quite different paradigms. I don't know the ACME spec well enough to know whether it's feasible at all in Vault, but we generally have viewed Vault's PKI capability and LE as complementary, rather than replacements for each other. Vault is significantly more flexible but not really suited for issuing publicly-trusted certs, and vice versa.
Re an init/sidecar/CSI container, there's no issue to follow. All I can tell you right now is that it will happen, but I can't give any concrete timeline. The release of Vault Agent was a step made very specifically in that direction, however (as well as other directions, of course) as it's a key component of any such solution.
To clarify: an init/sidecar container is definitely on our roadmap. CSI is something we'd like to support but as you are probably aware from https://github.com/kubernetes/kubernetes/issues/64984 there are still gaps.
Thanks @jefferai. If there is any way we can contribute, we will be interested.
Hi, I'm having the same issue as the author, we would like to use Vault to distribute Let's Encrypt signed certificates to our services. Using Vault's PKI engine means we would need to distribute the root CA to all our users which would be OK for our servers but is pretty inconvenient and difficult to do securely for our users.
While we could use certbot, this would mean giving the API key for our DNS provider to every service that needs a TLS cert, and as CloudFlare (and many others) do not support granular permissions, we would like to avoid that.
We are currently using https://www.terraform.io/docs/providers/acme/index.html which solves the signing part but not how to distribute the cert to the service nor how to renew them.
It seems to me that making Vault use the ACME DNS challenge would be great for this. It should be easy enough, as most of the work has already been done in lego and the ACME Terraform provider.
I don't think implementing the HTTP challenge would be a great idea: in most deployments Vault is not accessible from the Internet and does not listen on 80/443, and this would require cooperation from the load balancers, which would make the configuration complex.
If this seems good to you, I will start working on an implementation for this.
BTW https://github.com/hashicorp/vault/issues/4362 is related to this and has 46 👍
May I recommend using CertMagic for this?
It's the same well-vetted library used by the Caddy Web Server, and it is the most mature ACME client implementation in Go that is available.
It also supports pluggable storage, so you could have CertMagic store certificates directly in Vault.
It also coordinates certificate management in a cluster, as long as all instances in the cluster use the same storage configuration (i.e. a Vault instance).
You could also side-car Caddy if you prefer an external solution, but I'd strongly recommend bundling it directly into the application as a library whenever possible.
I'm currently developing a secrets plugin for Vault using lego. In my brief overview of certmagic I found that the pluggable storage adds unnecessary complexity to the logic. It seems more targeted at web developers, just wrapping the logic of lego (no offense intended, it looks like a great project, just not suited for this in my opinion).
I'm currently writing tests for the plugin, the basic functionality (register, obtain, renew and revoke) is in. It only supports dns-01 right now, and I'm unsure about supporting other challenges (like http-01 and tls-alpn-01) due to the fact that you'd have to expose the server running Vault.
@p3lim Lego is a raw ACME client library -- it simply facilitates the ACME protocol for you. It has methods like Register() and Obtain(). The difference is that CertMagic is all about keeping certificates renewed in the long run: while your server/process is running, ensure certificates stay renewed and that your TLS config can always serve the current certificates, without downtime.
So for anything long-running (and not a once-and-done command, for instance), use CertMagic.
In my brief overview of certmagic I found that the pluggable storage adds unnecessary complexity to the logic.
Can you explain what you mean? CertMagic's storage interface basically just requires Load, Store, List, Delete, and Lock/Unlock.
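To make the size of that interface concrete, here is a rough Python analogue of a store offering the operations named above (Load, Store, List, Delete, Lock/Unlock). It is an in-memory illustration only, not CertMagic's actual Go interface or signatures; the class and method names are assumptions, and a real backend would write to Vault or another shared store instead of a dict.

```python
import threading
from typing import Dict, List


class MemoryStorage:
    """In-memory stand-in for a pluggable certificate store (illustrative only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the table of named locks

    def store(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def load(self, key: str) -> bytes:
        return self._data[key]  # raises KeyError if the key is absent

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def list(self, prefix: str) -> List[str]:
        return sorted(k for k in self._data if k.startswith(prefix))

    def lock(self, name: str) -> None:
        # One named lock per resource, so concurrent workers do not
        # start the same issuance twice.
        with self._guard:
            named = self._locks.setdefault(name, threading.Lock())
        named.acquire()

    def unlock(self, name: str) -> None:
        self._locks[name].release()


storage = MemoryStorage()
storage.lock("issue:example.com")
storage.store("certs/example.com/cert.pem", b"placeholder")
storage.unlock("issue:example.com")
print(storage.list("certs/"))
```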
Hi everybody, I published my current implementation for this at https://github.com/remilapeyre/vault-acme.
It does most of what was asked for in this thread and we've been using it for some days and it seems to work fine.
Regarding CertMagic, I'm not sure I see what it may add to Lego that we need. One thing I needed from Lego that is not implemented is the ability to update an ACME account but I should be able to contribute that during next week. Lego is also the library used by the Terraform ACME provider and it may make maintenance easier to share the same library in Terraform and Vault
A few notes regarding the current plugin (I will probably forget plenty of things and will complete this later):
I did not implement support for tls-alpn-* challenges, as those were withdrawn from the RFC after some vulnerabilities were found in them.
I'm looking forward to feedback regarding this first implementation; let me know if you try it :)
@remilapeyre Great, I am looking forward to taking a closer look when I have a chance.
A few points of initial feedback:
This means that we may have to implement some cache for this backend to reduce the number of CSRs sent to the provider; during development I did hit those limits a few times.
I did not implement support for tls-alpn-* challenges, as those were withdrawn from the RFC after some vulnerabilities were found in them.
I did not implement support for HTTP challenges, as this would require exposing Vault to the Internet or having some convoluted way to set up an external webserver.
If you have in fact disabled the HTTP and TLS-ALPN challenges, it sounds like the only challenge that is enabled is the DNS challenge (?), which is fine if necessary, but it's also the challenge that requires the most configuration and has the most moving parts. Users should be aware that this DNS integration is required.
Does vault-acme automatically renew certificates too? Or does an outside program/user/lib trigger the reload?
Thanks @mholt, I will update my secret backend based on your feedback.
I'd highly recommend reading this draft of an upcoming document which advises best practices for ACME clients: https://github.com/https-dev/docs/blob/master/acme-ops.md
Here's what will need to be changed to match the recommendations:
Upon obtaining a certificate, immediately write it to persistent storage.
Currently each client requesting a certificate for foo.com will get a new one. This is the usual behavior for Vault secrets backends, and it means I didn't need to save the certificate.
If a valid certificate already exists in storage, use that one instead of obtaining a new one.
Since I didn't save certificates to persistent storage, I could not do this. This is in line with the other secrets engines, which give a new secret on each call so that their usage can be tracked and they can be revoked separately. Since this new secrets engine uses resources from the ACME provider, it makes sense to change the behavior and have caching. I will add a disable_cache attribute on the acme/roles/:role resource that defaults to false, so the cache will be enabled by default and users that want to disable it can do so.
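The caching behavior described here can be sketched as follows. This is an illustration of the idea, not the plugin's code: `CertCache`, `Cert` and the `obtain` callback are hypothetical names, the dict stands in for Vault's persistent storage, and only `disable_cache` comes from the comment above.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Cert:
    domain: str
    not_after: float  # expiry as a Unix timestamp


class CertCache:
    """Reuse a stored, still-valid certificate instead of ordering a new one."""

    def __init__(self, obtain: Callable[[str], Cert], disable_cache: bool = False):
        self._obtain = obtain                # stand-in for the ACME order
        self._disable_cache = disable_cache  # defaults to False: cache is on
        self._store: Dict[str, Cert] = {}    # stand-in for persistent storage

    def get(self, domain: str, now: float) -> Cert:
        cached = self._store.get(domain)
        if not self._disable_cache and cached is not None and cached.not_after > now:
            return cached
        cert = self._obtain(domain)
        # Write to storage immediately after obtaining, as the draft recommends.
        self._store[domain] = cert
        return cert


orders = []


def fake_obtain(domain: str) -> Cert:
    # Records each "order" so the number of ACME requests can be counted.
    orders.append(domain)
    return Cert(domain=domain, not_after=1000.0)


cache = CertCache(fake_obtain)
first = cache.get("foo.com", now=0.0)
second = cache.get("foo.com", now=10.0)
print(len(orders))
```

With the cache enabled, two clients asking for foo.com within the certificate's validity share one order, which is what keeps the backend under the provider's rate limits.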
Renew certificates after ⅔ of usable lifetime.
Currently, the TTL of the lease associated with the certificate is min(TTL of certificate, max TTL of the backend). When the secret is actually renewed depends on the client; for example, Consul Template will renew the secret when it reaches 90% of the lease TTL, which is later than the recommended 2/3. To improve this I will add a cache_duration_ratio that defaults to 0.7 and make the lease TTL min(cache_duration_ratio * TTL of certificate, max TTL of the backend). This should make clients renew the certificate when it reaches around 2/3 of its lifetime.
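As a worked example of that arithmetic, here is a small sketch. The 90-day certificate lifetime and the one-year backend max TTL are assumptions chosen for illustration, and the 90% renewal point is the Consul Template behavior mentioned above; `lease_ttl` is a hypothetical helper name.

```python
DAY = 24 * 3600


def lease_ttl(cert_ttl: float, backend_max_ttl: float,
              cache_duration_ratio: float = 0.7) -> float:
    """Lease TTL = min(cache_duration_ratio * cert TTL, backend max TTL)."""
    return min(cache_duration_ratio * cert_ttl, backend_max_ttl)


cert_ttl = 90 * DAY           # assumed certificate lifetime
backend_max_ttl = 365 * DAY   # assumed backend max TTL, large enough not to bind

ttl = lease_ttl(cert_ttl, backend_max_ttl)
renew_at = 0.9 * ttl          # a client that renews at 90% of the lease TTL

print(f"lease TTL: {ttl / DAY:.1f} days")
print(f"renewal at {renew_at / cert_ttl:.2f} of the certificate lifetime")
```

A client renewing at 90% of a lease that covers 70% of the certificate lifetime ends up renewing at roughly 0.63 of that lifetime, which is close to the recommended 2/3.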
I strongly advise you support the TLS-ALPN challenge. It is the only challenge that works over port 443 (the TLS port), and is required if port 80 is not available or if the HTTP challenge has trouble getting a certificate.
I mistook TLS-ALPN for TLS-SNI as I had not seen the RFC that introduced TLS-ALPN.
Vault won't be able to handle those without substantial changes, as a secrets engine cannot serve the .well-known/acme-challenge/ path or set the content type and the response body. Furthermore, it is not recommended for Vault to listen for external traffic.
I think this could be achieved with a simple sidecar that handles .well-known/acme-challenge/ and looks in Vault to respond to the challenge. I will write it in the next week.
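A minimal sketch of what such a sidecar could look like, assuming the HTTP-01 rules from RFC 8555 (the response body for `/.well-known/acme-challenge/<token>` is the key authorization). The `lookup` callback is a hypothetical stand-in for reading the pending challenge from Vault; this is not the code of any actual sidecar.

```python
from http.server import BaseHTTPRequestHandler
from typing import Callable, Optional, Tuple

PREFIX = "/.well-known/acme-challenge/"


def answer(path: str, lookup: Callable[[str], Optional[str]]) -> Tuple[int, bytes]:
    """Map a request path to (status, body) for an HTTP-01 challenge."""
    if not path.startswith(PREFIX):
        return 404, b""
    token = path[len(PREFIX):]
    if not token or "/" in token:
        return 404, b""
    key_authorization = lookup(token)  # hypothetical: would query Vault
    if key_authorization is None:
        return 404, b""
    return 200, key_authorization.encode("ascii")


def make_handler(lookup: Callable[[str], Optional[str]]):
    """Build a request handler class bound to the given lookup function."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            status, body = answer(self.path, lookup)
            self.send_response(status)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler


# Placeholder data standing in for challenges stored in Vault.
pending = {"hypothetical-token": "hypothetical-token.hypothetical-thumbprint"}
print(answer(PREFIX + "hypothetical-token", pending.get))
```

To run it, the handler would be passed to `http.server.HTTPServer` on port 80 at the edge of the network, so that only the sidecar, and not Vault itself, is reachable from the ACME server.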
Does vault-acme automatically renew certificates too? Or does an outside program/user/lib trigger the reload?
Vault ACME does not renew certificates before a client asks for it when a lease expires. All secrets given by Vault have an associated lease though so they should already know to get a new secret when the current one expires.
Vault ACME does not renew certificates before a client asks for it when a lease expires. All secrets given by Vault have an associated lease though so they should already know to get a new secret when the current one expires.
I see, so this essentially acts as liaison between Vault clients and an ACME service, establishing 1:1 functions for Obtain() and Renew() (etc).
In other words, it is up to the client to take care of certificate management, and this tool only provides the functions to do ACME transactions. (Correct?)
If so, then hopefully, whatever clients use this tool will abide by the best practices. :)
I think this could be achieved with a simple sidecar that handles .well-known/acme-challenge/ and looks in Vault to respond to the challenge. I will write it in the next week.
Any way that Caddy or CertMagic can help here? Their storage implementation is pluggable, so they could dump the certificates directly into Vault.
In other words, it is up to the client to take care of certificate management, and this tool only provides the functions to do ACME transactions. (Correct?)
This is the idea yes.
If so, then hopefully, whatever clients use this tool will abide by the best practices. :)
The Vault operator can still enforce some of those at the server level and I added cache support in https://github.com/remilapeyre/vault-acme/commit/bd9589152a15d2642df8a4573724ccbb97518d3b
Any way that Caddy or CertMagic can help here?
I don't think so. A huge advantage of having the plugin generate the cert is that it is mlock-ed and only speaks to Vault through a secure channel with temporary credentials that are only usable once. We would lose some of the benefits by using Caddy or CertMagic.
@mholt I've finally added support for the HTTP-01 and TLS-ALPN-01 challenges in https://github.com/remilapeyre/vault-acme/commit/6ac7a952d9f7d04354aa4ecc0cc2be963e7da411 so all basic functionality should now be present.
We are not using TLS-ALPN-01 and HTTP-01 in our infra, so they are less tested than DNS-01. We've been using DNS-01 for a few weeks now without issues, except when we hit Let's Encrypt rate limits once before I implemented the cache.
I will look at the code in the coming days to see what can be simplified or better documented, but I think it may be ready for a cursory review.
I also plan to make a few changes in the next days:
cloudflare,route53. This is already the behavior of the Terraform ACME provider so this should not be an issue.
Looking forward to support for more DNS providers. What's needed to get it working?
All DNS providers supported by Lego are supported. I still need to document this; in the meantime you can look at the list at https://www.terraform.io/docs/providers/acme/dns_providers/index.html. So far it's only possible to use environment variables for the credentials.
If you are using a provider not yet configured, adding it to go-acme/lego would be the path forward.
Not sure how the environment variables are passed into vault. The relevant document is here?
https://github.com/remilapeyre/vault-acme/blob/master/website/source/docs/secrets/acme/index.html.md
Vault's environment variables are passed on to the plugin, so it depends on how you are running Vault, but doing export AKAMAI_ACCESS_TOKEN=... before running Vault, using AKAMAI_ACCESS_TOKEN=... vault server -config=vault.hcl or docker run -e AKAMAI_ACCESS_TOKEN=... vault should work.
@remilapeyre nice work!
@remilapeyre Thank you. We use Auto Scaling EXTENSIVELY here and all "people facing" applications use Let's Encrypt for certificates. For many reasons, these application hosts can't use the web-based challenge for Let's Encrypt, so we rely exclusively on the DNS-based challenge.
It's unacceptable to give each auto-scale host permission to edit the entire DNS zone, and even if they all had this permission, we'd quickly hit the rate limits (I think it's 7 certs in 7 days?) due to auto-scale activity.
I was about to embark on building a Let's Encrypt cache service: a small client-side daemon that I'd have to write, linked over MQTT to a small service that I'd also have to write, which would use S3/KMS and/or Secure Parameter Store to cache the certificates and do the DNS challenge.
I'm so glad I checked to see if there was already any work around this subject involving Vault. You've saved me a ton of headache w/r/t re-inventing the wheel!
we are using acme-dns (https://github.com/joohoi/acme-dns) to restrict external dns for handshaking to limited records to individual users in a private environment. Seems to work ok.
@karl-tpio thanks for the feedback. Keep in mind that while there are tests and I should fix all bugs shortly, it has not been reviewed so far. I would love to get your feedback and fix any issue you find though.
Hi. I am looking for the other side:
Having vault implement the acme server protocol. So I can just use Caddy/Traefik/... with vault as an ACME server to issue certificates from a private CA. Does anybody have a hint for me?
I think support for the ACME protocol as a server was previously discussed and deemed out of scope, as it is very different from the way Vault currently works. ACME supposes that you are unauthenticated and uses a DNS or HTTP challenge to make sure you have access to the domain names you claimed, while Vault has authentication built in and only has an API.
I think the best way to use the Vault private CA with Traefik and Caddy might be to use vault agent template to fetch and renew the SSL certificates.
At any rate #8690 would be the issue to track for that
@remilapeyre @weitzj Even if Vault does not become an ACME server, I can suggest a couple of ways Vault can be used with Caddy, at least (this thread is two years old; and things have advanced now that Caddy 2 is released):
There are numerous options here to support all sorts of use cases. I'm sure Vault could be compatible with at least one or two of these.
@remilapeyre's vault-acme looks like what I want, except that the DNS provider credentials should be inside vault...
My main reason for this is that my DNS provider is Cloudflare, which has very wide API permissions - I really don't want a token / API key lying around on several servers with the ability to edit / delete every DNS record on a domain. Keeping it in vault is one option, just getting the certificate out is better....
(I don't care about HTTP-01 or TLS-ALPN-01 support. In cases where that is an option, certbot or cert-manager (on Kubernetes) tends to work fine. The use cases with traffic actually hitting Vault for the hostname also seem quite limited. DNS-01 is much more useful in an application that is more centralised onto Vault.)
@remilapeyre's vault-acme looks like what I want, except that the DNS provider credentials should be inside vault...
Hi @mohag, I'm not sure what issue you are referring to exactly. Vault plugins store their data in Vault's secure storage. You may be referring to the fact that the configuration is done using environment variables. Since the v0.0.6 release that went out yesterday, you can now set them in the provider_configuration map when creating the account, so if you don't give read permissions to acme/account/name, nobody should be able to access them.
The use cases with traffic actually hitting vault for the hostname also seems quite limited
For the HTTP-01 and TLS-ALPN-01 challenges, they are not answered by Vault directly but by a sidecar utility that you can deploy on the edge of your network and that connects back to Vault to answer the requests.
Let me know if I missed something and this was not the information you were looking for.
If you have some issues while using this backend, please open a new bug report at https://github.com/remilapeyre/vault-acme/
@remilapeyre Ah, that sounds like a good solution. (I saw some notes about env vars either here or in the docs and was a bit worried about that. With that resolved, it seems as close to an ideal solution as something that is not built-in can get...)
I'm currently in the process of switching my environment from a bodgy combination of Proxmox and Kubernetes to mostly HashiCorp tools, and then maybe a Kubernetes cluster for Kubernetes workloads (Consul instead of ingress-nginx, an attempt to get Istio working in the environment, and Vault for certificates). At the moment most things run in LXC containers in Proxmox and a few run in VMs in Proxmox, for example a single-node microk8s cluster running ingress-nginx as a gateway to the outside world, plus cert-manager, which both manages an internal CA chain and provides TLS certs from Let's Encrypt to ingress-nginx. The reason for the switch is that the Kubernetes VM currently uses more resources than everything running in LXC combined.
For the HTTP-01 and TLS-ALPN-01 challenges, they are not answered by Vault directly but by a sidecar utility that you can deploy on the edge of your network and that connects back to Vault to answer the requests.
I was wondering if it would be possible to optionally use Consul to answer HTTP-01, but maybe I'm too used to cert-manager handling everything to see why that wouldn't work.
@remilapeyre hello thanks for the work done so far, is it something you're still working on ?
HashiCorp Vault doesn't currently support this functionality and has no plans to support Let's Encrypt ACME integration in the near future.
As discussed above in the thread, vault-acme is a community-maintained, third-party plugin that provides the requested functionality. We suggest individuals looking for this functionality consider evaluating this plugin for their use.
Any interest in revisiting this and bringing this into vault core? I note that the PKI backend now supports ACME clients, so it would not be a stretch to have vault issue certificates via ACME from letsencrypt and other issuers.
@F21 A bit off thread, but I recently contributed "acme proxy" capability to the Serles acme server. We considered going with the https://github.com/dvtirol/serles-acme.git but decided having more vanilla clients was a better model. You basically set up serles as an acme client using certbot with DNS validation, and then point other servers to it, using http-01 based validation.
We would like to be able to issue SSL certificates from Let's Encrypt using Vault and auto-refresh them when certs are about to expire. Do you think Vault can support this as a secret engine?
We will be interested in contributing this feature if acceptable.