icing / mod_md

Let's Encrypt (ACME) in Apache httpd
https://icing.github.io/mod_md/
Apache License 2.0
333 stars 28 forks source link

certificate not renewing (finished successfully. Ongoing...) #323

Open svde opened 10 months ago

svde commented 10 months ago

On a few of our servers (ubuntu 23.04, apache 2.4.55, mod_md 2.4.19) randon certificates aren't renewing.

example from the acme server status

{
  "name": "***redacted***",
  "domains": [
    "***redacted***"
  ],
  "contacts": [
    "mailto:***redacted***"
  ],
  "transitive": 1,
  "ca": {
    "account": "ACME-.letsencrypt.org-0000",
    "proto": "ACME",
    "url": "https://acme-v02.api.letsencrypt.org/directory",
    "urls": [
      "https://acme-v02.api.letsencrypt.org/directory"
    ],
    "agreement": "accepted"
  },
  "privkey": {
    "type": "RSA",
    "bits": 4096
  },
  "state": 2,
  "renew-mode": 1,
  "renew-window": "33%",
  "warn-window": "10%",
  "must-staple": false,
  "proto": {
    "acme-tls/1": []
  },
  "stapling": true,
  "cert": {
    "rsa": {
      "valid": {
        "from": "Thu, 17 Aug 2023 07:57:18 GMT",
        "until": "Wed, 15 Nov 2023 07:57:17 GMT"
      },
      "serial": "***redacted***",
      "sha256-fingerprint": "***redacted***",
      "ocsp": {
        "status": "good",
        "valid": {
          "from": "Tue, 17 Oct 2023 04:00:00 GMT",
          "until": "Tue, 24 Oct 2023 03:59:58 GMT"
        },
        "renewal": {
          "name": "***redacted***.net",
          "finished": true,
          "notified": false,
          "notified-renewed": false,
          "last-run": "Wed, 18 Oct 2023 23:53:27 GMT",
          "errors": 0,
          "last": {
            "status": 0,
            "detail": "certificate status is GOOD, status valid Tue, 17 Oct 2023 04:00:00 GMT - Tue, 24 Oct 2023 03:59:58 GMT",
            "activity": "status of certid ***redacted***, reading response"
          }
        }
      }
    },
    "valid": {
      "from": "Thu, 17 Aug 2023 07:57:18 GMT",
      "until": "Wed, 15 Nov 2023 07:57:17 GMT"
    }
  },
  "renew-at": "Mon, 16 Oct 2023 15:09:18 GMT",
  "watched": true,
  "renew": true,
  "renewal": {
    "name": "***redacted***.net",
    "finished": true,
    "notified": true,
    "notified-renewed": true,
    "last-run": "Thu, 17 Aug 2023 08:57:12 GMT",
    "valid-from": "Thu, 17 Aug 2023 08:57:19 GMT",
    "errors": 0,
    "last": {
      "status": 0,
      "valid-from": "Thu, 17 Aug 2023 08:57:19 GMT"
    },
    "cert": {}
  }
},

The files in the md directory for this (redacted) fqdn still have timestamps from when the certificate was last renewed. The server status shows "finished successfully. Ongoing...". The only way I've been able to renew them is to remove the directories for the fqdn in the md directory and start over for those fqdn's. Is there a better way to do it?

icing commented 10 months ago

Do you have files in md/staging/<fqdn> when this happens? Is there a job.json file in there logging what it has done?

svde commented 10 months ago

Most recent entries from the staging job.json. My observation is that it didn't do anything.

{ "name": "redacted.net", "finished": true, "notified": true, "notified-renewed": true, "last-run": "Thu, 17 Aug 2023 08:57:12 GMT", "valid-from": "Thu, 17 Aug 2023 08:57:19 GMT", "errors": 0, "last": { "status": 0, "valid-from": "Thu, 17 Aug 2023 08:57:19 GMT" }, "log": { "entries": [ { "when": "Thu, 17 Aug 2023 08:58:41 GMT", "type": "message-renewed" }, { "when": "Thu, 17 Aug 2023 08:57:19 GMT", "type": "finished" }, { "when": "Thu, 17 Aug 2023 08:57:19 GMT", "type": "progress", "detail": "The certificate for the managed domain has been renewed successfully and can be used (valid since Thu, 17 Aug 2023 08:57:19 GMT). A graceful server restart now is recommended." },

svde commented 10 months ago

Removing the (stale?) fqdn directories in md/stagaing seems to solve the issue.

mritzmann commented 3 months ago

I had the same problem today. After deleting /etc/apache2/md/staging/<domain> the certificate was successfully renewed. Before that, the LogLevel was increased to md:debug, but I did not find anything suspicious.

[Mon Apr 29 08:44:32.650718 2024] [mpm_event:notice] [pid 957:tid 140277553302464] AH00493: SIGUSR1 received.  Doing graceful restart
[Mon Apr 29 08:44:32.726486 2024] [md:info] [pid 957:tid 140277553302464] AH10071: mod_md (v2.4.25), initializing...
[Mon Apr 29 08:44:32.726690 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(451): AH10037: server seems reachable via http: and reachable via https:
(..)
[Mon Apr 29 08:44:32.726792 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(719): AH10039: Completed MD[sub.example.com, CA=(null), Proto=ACME, Agreement=accepted, renew-mode=1 renew_window=33%, warn_window=10%
(..)
[Mon Apr 29 08:44:32.726857 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(608): AH10041: Server sub.example.com:0 matches md sub.example.com (config sub.example.com[default, default], match-mode=0) for domain sub.example.com, has now 1 MDs
[Mon Apr 29 08:44:32.726861 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(608): AH10041: Server sub.example.com:0 matches md sub.example.com (config sub.example.com[default, default], match-mode=0) for domain sub.example.com, has now 1 MDs
[Mon Apr 29 08:44:32.726876 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(841): sync MDs, start
[Mon Apr 29 08:44:32.726917 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(904): sync MDs, 7 existing, 0 moved, 0 new.
(..)
[Mon Apr 29 08:44:32.732397 2024] [md:debug] [pid 957:tid 140277553302464] md_ocsp.c(322): md[sub.example.com]: priming OCSP status
[Mon Apr 29 08:44:32.732466 2024] [md:debug] [pid 957:tid 140277553302464] md_ocsp.c(361): md[sub.example.com]: adding ocsp info (responder=http://r3.o.lencr.org)
(..)
[Mon Apr 29 08:44:32.749708 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(555): AH10169: sub.example.com: https server_rec for sub.example.com does not have protocol acme-tls/1 enabled
[Mon Apr 29 08:44:32.750013 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(256): md{sub.example.com}: certificate(0) is ok
[Mon Apr 29 08:44:32.750036 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(941): loading md sub.example.com
[Mon Apr 29 08:44:32.750058 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(943): loaded md sub.example.com
[Mon Apr 29 08:44:32.750067 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(987): sync MDs, finish done
(..)
[Mon Apr 29 08:44:32.750466 2024] [md:debug] [pid 957:tid 140277553302464] md_reg.c(1112): sub.example.com: init done
[Mon Apr 29 08:44:32.752378 2024] [md:debug] [pid 957:tid 140277553302464] mod_md.c(1006): AH10074: 7 out of 7 mds need watching
[Mon Apr 29 08:44:32.752757 2024] [md:debug] [pid 957:tid 140277553302464] mod_md_drive.c(342): AH10067: register md renew watchdog(_md_renew_)
[Mon Apr 29 08:44:32.752962 2024] [md:debug] [pid 957:tid 140277553302464] mod_md_ocsp.c(266): AH10203: register md ocsp watchdog(_md_ocsp_)
[Mon Apr 29 08:44:32.753127 2024] [mpm_event:notice] [pid 957:tid 140277553302464] AH00489: Apache/2.4.59 (Debian) OpenSSL/3.0.11 mod_qos/11.74 configured -- resuming normal operations
[Mon Apr 29 08:44:32.753139 2024] [core:notice] [pid 957:tid 140277553302464] AH00094: Command line: '/usr/sbin/apache2'
[Mon Apr 29 08:44:32.755303 2024] [md:debug] [pid 777586:tid 140277419341504] mod_md_drive.c(202): AH10054: md watchdog start, auto drive 7 mds
[Mon Apr 29 08:44:32.755337 2024] [md:debug] [pid 777586:tid 140277410948800] mod_md_ocsp.c(158): AH10197: md ocsp watchdog start, ocsp stapling 7 certificates
[Mon Apr 29 08:44:32.855601 2024] [md:debug] [pid 777586:tid 140277419341504] mod_md_drive.c(207): AH10055: md watchdog run, auto drive 7 mds
[Mon Apr 29 08:44:32.855634 2024] [md:debug] [pid 777586:tid 140277410948800] mod_md_ocsp.c(164): AH10198: md ocsp watchdog run, ocsp stapling 7 certificates
(..)
[Mon Apr 29 08:44:32.855716 2024] [md:debug] [pid 777586:tid 140277410948800] md_ocsp.c(901): OCSP status updates due: 0
[Mon Apr 29 08:44:32.855726 2024] [md:debug] [pid 777586:tid 140277410948800] mod_md_ocsp.c(178): AH10199: md ocsp watchdog next run in 59 minutes 59 seconds
(..)
[Mon Apr 29 08:44:32.856613 2024] [md:debug] [pid 777586:tid 140277419341504] mod_md_drive.c(229): AH10107: next run in 11 hours 59 minutes 59 seconds

If it occurs again, what would I have to do to debug it further?

opsec commented 2 months ago

Had the same problem on FreeBSD 13.2p11, apache 2.4.59, mod_md 2.4.15. The staging/ cleanup works, but it's ugly.

Where do I have to put md:debug in the apache config to get logs of the process ?

icing commented 2 months ago

Had the same problem on FreeBSD 13.2p11, apache 2.4.59, mod_md 2.4.15. The staging/ cleanup works, but it's ugly.

Where do I have to put md:debug in the apache config to get logs of the process ?

You can add LogLevel md:debug anywhere on the base server level.

svde commented 1 month ago

Our setup involves two webservers behind a load balancer. We use a script called by MDMessageCmd to synchronize (challenges and when done: full apache config) to the other node. In that script the renewed case triggers a check to see if all certificates that are being renewed are done, and only then will it reload the webserver and synchronize the apache configuration (including new certs) to the other host.

My observation is that if there are leftover directories in md/staging, it's mostly from the last one being renewed.