Closed F21 closed 1 year ago
Worked out the root cause of the problem. I am using Vault Agent to request the certificate from the acme plugin. The agent authenticates using auto-auth, and the auth mount has a max TTL of 30 minutes. The ACME mount also has a max TTL of 30 minutes.
When both leases expire, there are no more users of the certificate and the plugin revokes it. The solution is to change the auth method to issue periodic tokens and increase the ACME mount's max TTL to 1 year.
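For anyone hitting the same thing, here is a rough sketch of what that fix could look like with the Vault CLI. It assumes the ACME secrets engine is mounted at `lets-encrypt/` (the path used in this issue) and that Vault Agent logs in through an AppRole role named `vault-agent`. The role name and the choice of AppRole are assumptions for illustration only; adjust both for whatever auth method you actually use.

```shell
# Assumption: ACME secrets engine mounted at lets-encrypt/ (path from this issue).
# Raise the mount's max lease TTL to 1 year (8760h).
vault secrets tune -max-lease-ttl=8760h lets-encrypt/

# Assumption: Vault Agent authenticates via an AppRole role called "vault-agent"
# (hypothetical name). Setting token_period makes the role issue periodic
# tokens, which stay valid as long as they are renewed within each period.
vault write auth/approle/role/vault-agent token_period=30m
```

With a periodic token the agent keeps renewing the same token instead of letting it expire, so the certificate should always have at least one live user.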
Just noticed this problem on a test cluster running Vault 1.14.3 (although it seems to have happened with earlier versions of Vault) without debugging turned on, so I will need to gather further information.
I am requesting a wildcard certificate that looks like this: `*.something.mydomain.com`. I noticed that if I search for the domain on crt.sh, new certificates for the domain are requested every 1 to 3 days, leading to rate-limiting from Let's Encrypt.

I currently have `default_lease_ttl` and `max_lease_ttl` for the acme mount set to `30m`. The reason for the short TTLs is that I want to reduce the number of orphan leases in Vault, as the number of leases grows exponentially with long TTLs, which causes performance issues.

Other than that, the account under `lets-encrypt/accounts/something.mydomain.com` is set to use the Cloudflare DNS API challenge. Finally, for the certificate, at `lets-encrypt/roles/something.mydomain.com`, the account is set to `something.mydomain.com` and allowed domains is set to `[something.mydomain.com]`, with `allow_bare_domains = true` and `allow_subdomains = true`.

Another thing I noticed is that the plugin will often crash with `exit 2`, and I often have a bunch of leases that have gone into negative TTLs.

Unfortunately, without logging set to debug, this is currently all the information I can provide. After starting Vault in debug mode, I have the following logs (I removed the entries not related to the plugin). The plugin is currently unable to get a certificate because I've hit the rate limit: