smallstep / certificates

🛡️ A private certificate authority (X.509 & SSH) & ACME server for secure automated certificate management, so you can use TLS everywhere & SSO for SSH.
https://smallstep.com/certificates
Apache License 2.0
6.6k stars 431 forks source link

Log DNS errors #1680

Open tacerus opened 8 months ago

tacerus commented 8 months ago

Hello!

Issue details

Currently, the only information about a DNS problem during a DNS-01 challenge seems to be the very generic RFC8555 error message returned via HTTP:

\"error\":{\"type\":\"urn:ietf:params:acme:error:dns\",\"detail\":\"There was a problem with a DNS query during identifier validation\"}}" size=333 status=200 user-agent="dehydrated/0.7.1 curl/8.0.1" user-id=

For debugging, I built my own step-ca binary with a minimal patch introducing some "print debugging":

--- step/acme/challenge.go  2024-01-16 22:57:25.186916318 +0100
+++ step-debug/acme/challenge.go    2024-01-16 22:58:18.507528606 +0100
@@ -307,7 +307,9 @@
    domain := strings.TrimPrefix(ch.Value, "*.")

    vc := MustClientFromContext(ctx)
+  fmt.Println("DNS lookup happening now")
    txtRecords, err := vc.LookupTxt("_acme-challenge." + domain)
+  fmt.Println(err)
    if err != nil {
        return storeError(ctx, db, ch, false, WrapError(ErrorDNSType, err,
            "error looking up TXT records for domain %s", domain))

This yielded a helpful error message, which was seemingly always there, hidden behind err.

Unfortunately I could not quite figure out what the storeError function is doing with err, but seemingly not printing it to the application output along with the HTTP responses.

Why is this needed?

To understand at one glance what the issue is (in my case, it was a malformatted --resolver option - with the generic "There was a problem" error message I wasted hours troubleshooting DNS, but after adding the print statement revealing the text behind err, the issue was immediately clear to me and solved within seconds).

hslatman commented 7 months ago

Hey @tacerus, thank you for reporting this. I'll take a look at this soon. In general we'd like the CA to be a bit more chatty about things, and this seems to be one of those cases. The error should already be stored with the ACME error, but that's not returned to the client by default, nor is it logged like that. I'll see if I can do at least the latter. Returning it to the client is an option too, but in general needs some more scrutiny to not leak sensitive details.