smallstep / certificates

🛡️ A private certificate authority (X.509 & SSH) & ACME server for secure automated certificate management, so you can use TLS everywhere & SSO for SSH.
https://smallstep.com/certificates
Apache License 2.0

[Bug]: No certificates issued anymore #1856

Closed ottobaer closed 1 month ago

ottobaer commented 1 month ago

Steps to Reproduce

Hi,

After replacing dnsmasq with BIND, I can no longer get any certificates.

Your Environment

Expected Behavior

I expect to get certificates via either the tls-alpn-01 or the http-01 challenge.

Actual Behavior

I'm trying to get a certificate with certbot (I also tried lego and Caddy, but I get the same error):

[pandabaer.lan.ursidae.space] failed to initiate challenge: Post "https://pki.lan.ursidae.space:10443/acme/acme/challenge/RDsb1u3j8ewE09bfRmZTzX6JYfvvhMOi/YRan9DzGPuRZ5vQNSnt043hMrz4c5gRh": local error: tls: bad record MAC

On the step-ca side I get this:


May 26 10:29:43 sunbear.lan.ursidae.space step-ca[4382]: time="2024-05-26T10:29:43+02:00" level=info duration=16.508265ms duration-ns=16508265 fields.time="2024-05-26T10:29:43+02:00" method=POST name=ca nonce=QWhoUWViUWRLdElIcDRLbURMNkJIY09XTE9aOWdmUDc path=/acme/acme/authz/RDsb1u3j8ewE09bfRmZTzX6JYfvvhMOi protocol=HTTP/1.1 referer= remote-address=192.168.1.25 request-id=1cff4c8e-54a2-4e70-b6c8-653e0c3f006a response="{\"identifier\":{\"type\":\"dns\",\"value\":\"pandabaer.lan.ursidae.space\"},\"status\":\"pending\",\"challenges\":[{\"type\":\"dns-01\",\"status\":\"pending\",\"token\":\"b76rsopbfenJfiaPF5ecEbEucZYxnIqp\",\"url\":\"https://pki.lan.ursidae.space:10443/acme/acme/challenge/RDsb1u3j8ewE09bfRmZTzX6JYfvvhMOi/VYrG3FFNPaTMS4c6ohZblULShCnsTqNo\"},{\"type\":\"http-01\",\"status\":\"pending\",\"token\":\"b76rsopbfenJfiaPF5ecEbEucZYxnIqp\",\"url\":\"https://pki.lan.ursidae.space:10443/acme/acme/challenge/RDsb1u3j8ewE09bfRmZTzX6JYfvvhMOi/2EkOBiZw7xtYX9vTcWjVmeCzP4IsPBjB\"},{\"type\":\"tls-alpn-01\",\"status\":\"pending\",\"token\":\"b76rsopbfenJfiaPF5ecEbEucZYxnIqp\",\"url\":\"https://pki.lan.ursidae.space:10443/acme/acme/challenge/RDsb1u3j8ewE09bfRmZTzX6JYfvvhMOi/YRan9DzGPuRZ5vQNSnt043hMrz4c5gRh\",\"error\":{\"type\":\"urn:ietf:params:acme:error:connection\",\"detail\":\"The server could not connect to validation target\"}}],\"wildcard\":false,\"expires\":\"2024-05-27T08:29:23Z\"}" size=906 status=200 user-agent="lego-cli/v4.16.1 xenolf-acme/4.16.1 (release; linux; amd64)" user-id=
May 26 10:29:43 sunbear.lan.ursidae.space step-ca[4382]: time="2024-05-26T10:29:43+02:00" level=warning duration=18.269363ms duration-ns=18269363 error="expected POST-as-GET" fields.time="2024-05-26T10:29:43+02:00" method=POST name=ca nonce=eTAzNENhYVNCcVFkeUl1WnUzWnlBQXNVcEROT2hVMlk path=/acme/acme/authz/RDsb1u3j8ewE09bfRmZTzX6JYfvvhMOi protocol=HTTP/1.1 referer= remote-address=192.168.1.25 request-id=0780c212-6d3f-4f1f-889a-fe002506cfc4 response="{\"type\":\"urn:ietf:params:acme:error:malformed\",\"detail\":\"The request message was malformed\"}" size=93 status=400 user-agent="lego-cli/v4.16.1 xenolf-acme/4.16.1 (release; linux; amd64)" user-id=
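The useful detail in these log lines is buried in the escaped JSON of the `response` field. As a rough sketch (not part of step-ca), the following Python pulls any challenge errors out of such a line. It assumes the logfmt-style `key="value"` layout shown above; `SAMPLE_LINE` is a shortened, hypothetical line modeled on the first log entry, and the function names are made up for illustration.

```python
import json
import shlex

# Shortened, hypothetical log line modeled on the authz entry above.
SAMPLE_LINE = (
    r'level=info method=POST path=/acme/acme/authz/example '
    r'response="{\"status\":\"pending\",\"challenges\":['
    r'{\"type\":\"http-01\",\"status\":\"pending\"},'
    r'{\"type\":\"tls-alpn-01\",\"status\":\"pending\",'
    r'\"error\":{\"type\":\"urn:ietf:params:acme:error:connection\",'
    r'\"detail\":\"The server could not connect to validation target\"}}]}" '
    r'size=906 status=200 user-id='
)


def parse_logfmt(line):
    """Split a logfmt-style line into a dict of key/value pairs."""
    fields = {}
    # shlex handles the double-quoted values and the \" escapes inside them.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens without '=', such as a syslog prefix
            fields[key] = value
    return fields


def challenge_errors(line):
    """Return (challenge type, error type, detail) for each failed challenge."""
    body = json.loads(parse_logfmt(line).get("response", "{}"))
    # An authz response nests challenges in a list; a challenge
    # response is a single object, so treat it as a list of one.
    challenges = body.get("challenges", [body])
    return [
        (c.get("type"), c["error"]["type"], c["error"]["detail"])
        for c in challenges
        if "error" in c
    ]


for challenge_type, error_type, detail in challenge_errors(SAMPLE_LINE):
    print(f"{challenge_type}: {error_type} ({detail})")
```

One caveat of this approach: `shlex.split` raises an error on lines that contain an unbalanced single quote, so a dedicated logfmt parser would be more robust for real log processing.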

Interestingly, I also get the same response when I try to get a certificate on the same machine step-ca runs on:

step ca certificate --provisioner acme sunbear.lan.ursidae.space a.crt a.key --ca-url https://pki.lan.ursidae.space:10443/acme/acme/directory --root /etc/caddy/certs/root_ca.crt 
✔ Provisioner: acme (ACME)
Using Standalone Mode HTTP challenge to validate sunbear.lan.ursidae.space . Error!

error validating ACME Challenge at https://pki.lan.ursidae.space:10443/acme/acme/challenge/yqKImXsh79qtWgI3rZJiFnnlpCcYDUmu/9hTWH0bWKJnKwJfYHteKSEPxH1SjMFxh: client POST https://pki.lan.ursidae.space:10443/acme/acme/new-order failed: Post "https://pki.lan.ursidae.space:10443/acme/acme/challenge/yqKImXsh79qtWgI3rZJiFnnlpCcYDUmu/9hTWH0bWKJnKwJfYHteKSEPxH1SjMFxh": stream error: stream ID 17; INTERNAL_ERROR; received from peer

In the logs I get this:

May 26 10:46:25 sunbear.lan.ursidae.space step-ca[4382]: time="2024-05-26T10:46:25+02:00" level=info duration=20.02632722s duration-ns=20026327220 fields.time="2024-05-26T10:46:05+02:00" method=POST name=ca nonce=NjVRMjV0TFlVNVRrazNkTlRtTUc4a1JFZ0RwWDRQUzg path=/acme/acme/challenge/yqKImXsh79qtWgI3rZJiFnnlpCcYDUmu/9hTWH0bWKJnKwJfYHteKSEPxH1SjMFxh protocol=HTTP/2.0 referer= remote-address=192.168.1.2 request-id=7ffe45e8-3e9b-4ad2-a1d3-ef946907ea0a response="{\"type\":\"http-01\",\"status\":\"pending\",\"token\":\"RmTig8o5sekhGW3p31wBAAgHU3sXebCn\",\"url\":\"https://pki.lan.ursidae.space:10443/acme/acme/challenge/yqKImXsh79qtWgI3rZJiFnnlpCcYDUmu/9hTWH0bWKJnKwJfYHteKSEPxH1SjMFxh\",\"error\":{\"type\":\"urn:ietf:params:acme:error:connection\",\"detail\":\"The server could not connect to validation target\"}}" size=329 status=200 user-agent="Smallstep CLI/0.26.1 (linux/arm64)" user-id=

This is definitely not a connection problem: the request was made on the same host, and there are no firewalls.

I can connect to the ACME server with a browser over HTTPS without a problem (from the same PC on which I tried to get the certbot certificate above).

I also tried getting a certificate on another host with a JWT token, this works without a problem.

Additional Context

Here is the DNS information; it looks OK to me.

Here it is for the host requesting the certificate:

forward pandabaer.lan.ursidae.space


; <<>> DiG 9.18.27 <<>> pandabaer.lan.ursidae.space
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17475
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 09925ab558533157010000006652f7fce742106bf041e0a2 (good)
;; QUESTION SECTION:
;pandabaer.lan.ursidae.space.   IN      A

;; ANSWER SECTION:
pandabaer.lan.ursidae.space. 1200 IN    A       192.168.1.25

;; Query time: 0 msec
;; SERVER: 192.168.1.2#53(192.168.1.2) (UDP)
;; WHEN: Sun May 26 10:51:08 CEST 2024
;; MSG SIZE  rcvd: 100

reverse 192.168.1.25


; <<>> DiG 9.18.27 <<>> -x 192.168.1.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34539
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 1634039cb73a255a010000006652f816952391d1bfeb8ebf (good)
;; QUESTION SECTION:
;2.1.168.192.in-addr.arpa.      IN      PTR

;; ANSWER SECTION:
2.1.168.192.in-addr.arpa. 1200  IN      PTR     sunbear.lan.ursidae.space.

;; Query time: 0 msec
;; SERVER: 192.168.1.2#53(192.168.1.2) (UDP)
;; WHEN: Sun May 26 10:51:34 CEST 2024
;; MSG SIZE  rcvd: 120

Here it is for the host step-ca runs on:

forward pki.lan.ursidae.space

; <<>> DiG 9.18.27 <<>> pki.lan.ursidae.space
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50928
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: ffba18ce4970085a010000006652f859d7c015453c7041be (good)
;; QUESTION SECTION:
;pki.lan.ursidae.space.         IN      A

;; ANSWER SECTION:
pki.lan.ursidae.space.  86400   IN      CNAME   sunbear.lan.ursidae.space.
sunbear.lan.ursidae.space. 1200 IN      A       192.168.1.2

;; Query time: 0 msec
;; SERVER: 192.168.1.2#53(192.168.1.2) (UDP)
;; WHEN: Sun May 26 10:52:41 CEST 2024
;; MSG SIZE  rcvd: 116

reverse 192.168.1.2


; <<>> DiG 9.18.27 <<>> -x 192.168.1.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33060
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 425ebda4d519d378010000006652f871cc652545e67aaeb6 (good)
;; QUESTION SECTION:
;2.1.168.192.in-addr.arpa.      IN      PTR

;; ANSWER SECTION:
2.1.168.192.in-addr.arpa. 1200  IN      PTR     sunbear.lan.ursidae.space.

;; Query time: 0 msec
;; SERVER: 192.168.1.2#53(192.168.1.2) (UDP)
;; WHEN: Sun May 26 10:53:05 CEST 2024
;; MSG SIZE  rcvd: 120

ottobaer commented 1 month ago

I tried going back a few versions now.

The oldest one I tried so far is

Smallstep CA/0.24.3-rc.5 (linux/arm64)
Release Date: 2023-07-27T22:11:35Z

Still the same problem.

ottobaer commented 1 month ago

I found out that I forgot to change the IP of the DNS server I had manually added with '--resolver'.

The errors you get when you do that are rather interesting, though.
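One way to catch this kind of misconfiguration early is to check that the address passed to `--resolver` still answers queries. The sketch below is an illustration using only the Python standard library, not anything step-ca does itself: `build_query` and `resolver_answers` are hypothetical helper names, and it sends a single A-record query over UDP port 53.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(resolver_ip, hostname, timeout=2.0, query_id=0x1234):
    """Return True if the resolver replies with at least one answer record."""
    query = build_query(hostname, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (resolver_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts and unreachable hosts alike.
            return False
    if len(reply) < 12:
        return False
    reply_id, flags, _, answer_count = struct.unpack("!HHHH", reply[:8])
    rcode = flags & 0x000F  # 0 means NOERROR
    return reply_id == query_id and rcode == 0 and answer_count > 0
```

With the addresses from this thread, `resolver_answers("192.168.1.2", "pandabaer.lan.ursidae.space")` should return `True` while the BIND server is reachable, and `False` for a stale resolver address.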

hslatman commented 1 month ago

Hey @ottobaer, good to hear you resolved this 🙂 I was about to ask you to check whether the --resolver option works for you, but that's not necessary anymore 😅

You're correct that the errors can be opaque. As with any networked software, there can be several causes, one of which is a failure to resolve domain names correctly. We currently don't introspect the error for details, nor do we perform additional diagnostics when an error occurs. There's an open issue for improving the situation for DNS specifically here: https://github.com/smallstep/certificates/issues/1680. Beyond that, a bigger project is to overhaul our logging, but there's no concrete timeline for that yet.
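To illustrate what introspecting such an error could look like, here is a minimal Python sketch. It is not step-ca's code (step-ca is written in Go), and the function and category names are invented for the example. It only shows how a failed connection attempt can be mapped to a likely cause instead of a generic "could not connect" message.

```python
import socket
import ssl


def classify_connection_error(exc):
    """Map an exception from a connection attempt to a coarse cause."""
    # Order matters: gaierror, SSLError and the timeout types are all
    # subclasses of OSError, so the specific checks come first.
    if isinstance(exc, socket.gaierror):
        return "dns"  # name resolution failed
    if isinstance(exc, ssl.SSLError):
        return "tls"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    if isinstance(exc, ConnectionRefusedError):
        return "refused"
    if isinstance(exc, OSError):
        return "network"
    return "unknown"


def probe(host, port, timeout=3.0):
    """Try a TCP connection and report 'ok' or the classified failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_connection_error(exc)
```

A validation error that carried a category like `dns` would point directly at the resolver, rather than at the validation target itself.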