Closed: mbardelmeijer closed this issue 1 year ago
Thanks for the report -- how can I reproduce the behavior?
We have unfortunately not been able to reproduce it locally; I've only observed multiple cases in production via logging over a longer period. Hence the thought that it might be locking-related, but I'm unsure how best to continue debugging this.
Does it only happen with ZeroSSL? If it also happens with Let's Encrypt, I'd be very interested; but it's also possible it could be a server bug (they have very different stacks).
We have only observed it with ZeroSSL at the moment. We use LetsEncrypt as our primary CA and ZeroSSL as the secondary, so occurrences only happen if we run into a rate limit with LetsEncrypt for that particular domain.
I'll keep an eye on it to see if we see it in our logs for LetsEncrypt as well!
We've seen this in production logs for LetsEncrypt requests as well. It's not the "Invalid signature" error, but "JWS verification error".
I'm still not sure what the cause is; here's the relevant error segment:
May 24 13:40:06 XX xx[3159879]: {"level":"error","ts":1684935606.4201875,"logger":"obtain","caller":"certmagic@v0.17.2/config.go:567","msg":"could not get certificate from issuer","component":"acme","identifier":"XXXX","issuer":"acme-v02.api.letsencrypt.org-directory","error":"HTTP 400 urn:ietf:params:acme:error:malformed - JWS verification error","stacktrace":"github.com/caddyserver/certmagic.(*Config).obtainCert.func2\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/config.go:567\ngithub.com/caddyserver/certmagic.doWithRetry\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/async.go:104\ngithub.com/caddyserver/certmagic.(*Config).obtainCert\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/config.go:611\ngithub.com/caddyserver/certmagic.(*Config).ObtainCertAsync\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/config.go:462\ngithub.com/caddyserver/certmagic.(*Config).obtainOnDemandCertificate\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/handshake.go:447\ngithub.com/caddyserver/certmagic.(*Config).getCertDuringHandshake\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/handshake.go:314\ngithub.com/caddyserver/certmagic.(*Config).GetCertificate\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/handshake.go:77\ncrypto/tls.(*Config).getCertificate\n\t/usr/local/go/src/crypto/tls/common.go:1079\ncrypto/tls.(*serverHandshakeStateTLS13).pickCertificate\n\t/usr/local/go/src/crypto/tls/handshake_server_tls13.go:376\ncrypto/tls.(*serverHandshakeStateTLS13).handshake\n\t/usr/local/go/src/crypto/tls/handshake_server_tls13.go:58\ncrypto/tls.(*Conn).serverHandshake\n\t/usr/local/go/src/crypto/tls/handshake_server.go:53\ncrypto/tls.(*Conn).handshakeContext\n\t/usr/local/go/src/crypto/tls/conn.go:1491\ncrypto/tls.(*Conn).HandshakeContext\n\t/usr/local/go/src/crypto/tls/conn.go:1434\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1877"}
Might be relevant: https://github.com/caddyserver/caddy/issues/3903 https://caddy.community/t/invalid-signature-on-jws-request-http-400-zerossl/14634
Do these errors only occur for renewals?
Do the certificates being obtained have anything in common (with regard to their subjects, private keys, or anything else, like original issuance date / config)?
We use the S3 storage adapter for certmagic, which does support locks by placing files in the bucket, but that locking isn't atomic.
I would not be surprised if this is part of the problem. Does the problem still happen if you use a more correct storage backend like a file system or SQL DB (or even redis)?
I wonder if CertMagic is using the wrong private key for some reason? :thinking:
Due to inactivity -- and because, on reflection, if this is a bug it's probably in CertMagic rather than acmez (you're right that acmez makes the request that triggers the error, but CertMagic may be giving it bad input) -- I'll close this issue for now. Let me know if we need to reopen!
Sorry for the delayed response there. I've been working on replacing the storage driver with a Redis locking layer but it's a larger undertaking than expected, hence the delay.
We do indeed use CertMagic. We're in the process of updating to 0.18.2 from 0.17.2; if the issue still persists after that, I'll try to aggregate as much info as possible and open a new issue on the certmagic repo :)
Thanks, that'll be helpful! I would love to narrow that down. :)
What version of the package are you using?
acmez v1.1.0 (bundled via certmagic v0.17.2)
What are you trying to do?
We're trying to obtain an SSL certificate on demand with certmagic via the ZeroSSL ACME issuer.
What steps did you take?
We've installed certmagic as documented and are running successfully for thousands of domains. A small percentage of our SSL requests trigger the error
Obtain: [XXXX.com] creating new order: attempt 1: https://acme.zerossl.com/v2/DV90/newOrder: HTTP 400 urn:ietf:params:acme:error:malformed - Invalid signature on JWS request
What did you expect to happen, and what actually happened instead?
We expect that the SSL certificate would be issued.
How do you think this should be fixed?
I'm unsure what the best fix would be. It might be a race condition, with multiple orders running for the same domain at once.
Please link to any related issues, pull requests, and/or discussion
We use the S3 storage adapter for certmagic, which does support locks by placing files in the bucket, but that locking isn't atomic.
Log output