joohoi / acme-dns

Limited DNS server with RESTful HTTP API to handle ACME DNS challenges easily and securely.
MIT License

error message every 10 minutes about managing the server certificate #337

Open fredcy opened 1 year ago

fredcy commented 1 year ago

I'm getting error output like this every 10 minutes from acme-dns. Is this spurious? Can I do anything to prevent the error?

error        maintenance        unable to get configuration to manage certificate; unable to renew        {"identifiers": ["auth.xxxx.com"], "error": "config returned for certificate [auth.xxxx.com] is not nil and points to different cache; got 0xc000027b90, expected 0xc000027c70 (this one)"}

The certificate used by acme-dns for its HTTPS traffic works fine, in that there is no complaint from the client side. In config.cfg I have tls = "letsencrypt".

I'm running acme-dns as a systemd service under the unprivileged acme-dns user. /var/lib/acme-dns/api-certs and everything in it is owned and writable by the acme-dns user.

I tried removing all of /var/lib/acme-dns/api-certs and starting the acme-dns service from scratch. It rebuilds the cert in a new /var/lib/acme-dns/api-certs directory, but the same recurring error messages soon return.

It appears that the error message comes from the github.com/caddyserver/certmagic module used by acme-dns, but I have not been able to work out why it throws that error.

I ran acme-dns as root for a while before improving the setup to run as an unprivileged user. I wonder if that left some remnant that needs to be cleaned up, but I can't find any such thing.

danielztolnai commented 10 months ago

I am receiving the same messages. Did you manage to find out the cause?

I've never run acme-dns as root, so that can be ruled out. I built the executable from the latest source using go 1.18.1 and am running it on a fresh Ubuntu 22.04.3 using the provided systemd service. I also have tls = "letsencrypt" and I'm also using the recommended user setup.

fredcy commented 10 months ago

I ended up not using acme-dns and so I don't have more info.

PKizzle commented 10 months ago

May I ask what you are using as an alternative?

fredcy commented 10 months ago

(Probably not helpful, but...) I was planning to use acme-dns to manage certs on a private development network on a DNS sub-domain. It worked OK, but management decided to just buy a wildcard cert for that subdomain, making acme-dns moot in our case.

PKizzle commented 10 months ago

Ah okay. I thought there might be a different solution to acme-dns but that does not seem to be the case then.

PKizzle commented 10 months ago

For everyone else facing the cache issue: I have found a solution, but I am not sure whether it is the correct patch, as I have added quite a bit of source code to acme-dns. Please try it out and report whether it works for you. It is based on the refactoring branch.

From 003a56d677fe0cf621ea92fc9446cf45a199e277 Mon Sep 17 00:00:00 2001
From: Philipp Kolberg <philipp.kolberg@t-online.de>
Date: Wed, 29 Nov 2023 22:43:49 +0100
Subject: [PATCH] Fix certmagic cache handling

---
 pkg/api/api.go | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/pkg/api/api.go b/pkg/api/api.go
index 9fc96f6..bd75129 100644
--- a/pkg/api/api.go
+++ b/pkg/api/api.go
@@ -4,6 +4,7 @@ import (
    "context"
    "crypto/tls"
    "net/http"
+   "sync"

    "github.com/acme-dns/acme-dns/pkg/acmedns"

@@ -14,10 +15,12 @@ import (
 )

 type AcmednsAPI struct {
-   Config  *acmedns.AcmeDnsConfig
-   DB      acmedns.AcmednsDB
-   Logger  *zap.SugaredLogger
-   errChan chan error
+   Config       *acmedns.AcmeDnsConfig
+   DB           acmedns.AcmednsDB
+   Logger       *zap.SugaredLogger
+   errChan      chan error
+   magicCache   *certmagic.Cache
+   magicCacheMu sync.Mutex
 }

 func Init(config *acmedns.AcmeDnsConfig, db acmedns.AcmednsDB, logger *zap.SugaredLogger, errChan chan error) AcmednsAPI {
@@ -137,12 +140,17 @@ func (a *AcmednsAPI) setupTLS(dnsservers []acmedns.AcmednsNS) *certmagic.Config
    magicConf.Logger = a.Logger.Desugar()
    magicConf.Storage = &storage
    magicConf.DefaultServerName = a.Config.General.Domain
-   magicCache := certmagic.NewCache(certmagic.CacheOptions{
-       GetConfigForCert: func(cert certmagic.Certificate) (*certmagic.Config, error) {
-           return &magicConf, nil
-       },
-       Logger: a.Logger.Desugar(),
-   })
-   magic := certmagic.New(magicCache, magicConf)
+   a.magicCacheMu.Lock()
+   if a.magicCache == nil {
+       a.magicCache = certmagic.NewCache(certmagic.CacheOptions{
+           GetConfigForCert: func(cert certmagic.Certificate) (*certmagic.Config, error) {
+               return a.setupTLS(dnsservers), nil
+           },
+           Logger: a.Logger.Desugar(),
+       })
+   }
+   certCache := a.magicCache
+   a.magicCacheMu.Unlock()
+   magic := certmagic.New(certCache, magicConf)
    return magic
 }
-- 
2.39.3 (Apple Git-145)
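
The idea behind the patch above is to build the cache once and hand the same instance to every later caller. It can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the patch itself: `api`, `certCache`, and `sharedCache` are hypothetical names, no certmagic types are involved, and `sync.Once` is swapped in for the explicit mutex and nil check used in the patch.

```go
package main

import (
	"fmt"
	"sync"
)

// certCache is a hypothetical placeholder for an expensive shared
// resource such as a certificate cache.
type certCache struct{ id int }

// api is a hypothetical holder that must hand out one cache for its lifetime.
type api struct {
	once    sync.Once
	cache   *certCache
	created int // counts constructor runs, for demonstration only
}

// sharedCache creates the cache on first use and returns the same
// pointer on every later call, including concurrent ones.
func (a *api) sharedCache() *certCache {
	a.once.Do(func() {
		a.created++
		a.cache = &certCache{id: a.created}
	})
	return a.cache
}

func main() {
	a := &api{}
	results := make([]*certCache, 8)

	// Request the cache from several goroutines at once.
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = a.sharedCache()
		}(i)
	}
	wg.Wait()

	// Every caller should have received the identical pointer.
	same := true
	for _, c := range results {
		if c != results[0] {
			same = false
		}
	}
	fmt.Println("all callers share one cache:", same, "constructor runs:", a.created)
}
```

Whether a mutex or `sync.Once` is used, the property that matters for this issue is pointer identity: every config must end up tied to the same cache instance.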

maddes-b commented 7 months ago

I have the same issue with the current master @27e8251d11ba0a08c9b576fc04d61c1c7ba9b500. What is striking is that it creates 2 caches, but I do not know where they come from:

Apr 08 20:13:32 vmanager9064 acme-dns[16782]: 1.7126072127843883e+09        info        maintenance        started background certificate maintenance        {"cache": "0xc000026800"}
Apr 08 20:13:32 vmanager9064 acme-dns[16782]: 1.7126072127844315e+09        info        maintenance        started background certificate maintenance        {"cache": "0xc000026880"}
...
Apr 08 20:43:32 vmanager9064 acme-dns[16782]: 1.712609012784654e+09        error        maintenance        unable to get configuration to manage certificate; unable to renew        {"identifiers": ["<snip>"], "error": "config returned for certificate [<snip>] is not nil and points to different cache; got 0xc000026800, expected 0xc000026880 (this one)"}
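
The two cache addresses in the startup lines are the same two pointers named in the error. As a rough illustration of how a message of this shape can arise, here is a simplified model in plain Go. It is not certmagic's actual code: `Cache`, `Config`, `configFor`, `mismatched`, and `matched` are hypothetical stand-ins, and the only behavior modeled is that a config remembers which cache it belongs to and a cache rejects configs that belong to another one.

```go
package main

import (
	"errors"
	"fmt"
)

// Cache and Config are hypothetical stand-ins; they model only the
// back-reference from a config to the cache it was created against.
type Cache struct {
	getConfig func() *Config // plays the role of a config-lookup callback
}

type Config struct {
	cache *Cache // the cache this config is bound to
}

var errDifferentCache = errors.New("config points to different cache")

// configFor asks the callback for a config and rejects it when the
// config is bound to a cache other than the one doing the asking.
func (c *Cache) configFor() (*Config, error) {
	cfg := c.getConfig()
	if cfg != nil && cfg.cache != nil && cfg.cache != c {
		return nil, fmt.Errorf("%w; got %p, expected %p (this one)",
			errDifferentCache, cfg.cache, c)
	}
	return cfg, nil
}

// mismatched builds two caches; the second one's callback returns a
// config that is bound to the first.
func mismatched() *Cache {
	first := &Cache{}
	cfg := &Config{cache: first}
	return &Cache{getConfig: func() *Config { return cfg }}
}

// matched builds a single cache whose callback returns a config bound to it.
func matched() *Cache {
	c := &Cache{}
	cfg := &Config{cache: c}
	c.getConfig = func() *Config { return cfg }
	return c
}

func main() {
	_, err := mismatched().configFor()
	fmt.Println("two caches:", err)

	_, err = matched().configFor()
	fmt.Println("one cache:", err)
}
```

In this model the error disappears as soon as the callback returns a config bound to the cache that is asking, which is why having two caches at startup is a useful clue.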

This issue is independent of having only a single server and account in api-certs/acme directory, or having multiple servers and/or accounts in there. Also just updating to latest certmagic v0.20.0 did not help (GOPATH=/tmp/go-acme-dns go get -u github.com/caddyserver/certmagic@v0.20.0). Last bump of certmagic was https://github.com/joohoi/acme-dns/pull/334.

On master @6ba9360156b8658dbbd652eea100c11cc098b1f8 I do not see any cache messages and do not get renew errors every 10 minutes. @joohoi Is this the reason for the other repo at https://github.com/acme-dns/acme-dns/? Is your personal repo here for development/testing and the other one for the production state?

I found a similar issue in Caddy, https://github.com/caddyserver/caddy/issues/5162 (with PR https://github.com/caddyserver/caddy/pull/5169, merged as https://github.com/caddyserver/caddy/commit/ac96455a9a6ac34eb8ea95339838038e725f5bee), which also relates to how certmagic is used. I do not know whether https://github.com/joohoi/acme-dns/issues/337#issuecomment-1890784616 can be adapted to current master and would fix it.

Update 2024-04-11: I have a solution for the current release. I am currently testing all cases (renewal, revoked, etc.) and adding some more debug log messages.

maddes-b commented 7 months ago

Fix for current master developed and tested. The pull request is https://github.com/joohoi/acme-dns/pull/351

sndrsmnk commented 4 months ago

ty. wonder why this isn't merged yet. manually applied and now i have a new cert again! :+1: