mholt / acmez

Premier ACME client library for Go
https://pkg.go.dev/github.com/mholt/acmez/v2
Apache License 2.0

ACME 'invalid signature on JWS request' #18

Closed mbardelmeijer closed 1 year ago

mbardelmeijer commented 1 year ago

What version of the package are you using?

acmez v1.1.0 (bundled via certmagic v0.17.2)

What are you trying to do?

We're trying to obtain an SSL certificate on demand with certmagic via the ZeroSSL ACME issuer.

What steps did you take?

We've installed certmagic as documented and are running successfully for thousands of domains. A small percentage of our SSL requests trigger this error: `Obtain: [XXXX.com] creating new order: attempt 1: https://acme.zerossl.com/v2/DV90/newOrder: HTTP 400 urn:ietf:params:acme:error:malformed - Invalid signature on JWS request`
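
For context, our setup is roughly the following. This is a simplified sketch rather than our exact code: the email address and the DecisionFunc are placeholders, and it assumes the certmagic v0.17.x API.

```go
package main

import (
	"crypto/tls"
	"log"
	"net/http"

	"github.com/caddyserver/certmagic"
)

func main() {
	magic := certmagic.NewDefault()

	// On-demand TLS: certificates are obtained during the TLS handshake.
	// Our real DecisionFunc consults our own database; this placeholder allows everything.
	magic.OnDemand = &certmagic.OnDemandConfig{
		DecisionFunc: func(name string) error { return nil },
	}

	issuer := certmagic.NewACMEIssuer(magic, certmagic.ACMEIssuer{
		CA:     certmagic.ZeroSSLProductionCA, // https://acme.zerossl.com/v2/DV90
		Email:  "ops@example.com",             // placeholder
		Agreed: true,
	})
	magic.Issuers = []certmagic.Issuer{issuer}

	ln, err := tls.Listen("tcp", ":443", magic.TLSConfig())
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(http.Serve(ln, http.DefaultServeMux))
}
```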

What did you expect to happen, and what actually happened instead?

We expect that the SSL certificate would be issued.

How do you think this should be fixed?

I'm unsure what the best fix would be. It might be a race condition, with multiple orders per domain.

Please link to any related issues, pull requests, and/or discussion

We use the S3 storage adapter for certmagic, which does support locking by placing lock files in the bucket, but that isn't atomic.
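
To illustrate the concern, here's a standalone sketch (not the actual adapter code; the objectStore interface is a stand-in for the S3 client) of why a check-then-create lock on S3 isn't atomic:

```go
package s3lock

import (
	"context"
	"fmt"
	"time"
)

// objectStore stands in for the S3 client used by the storage adapter
// (hypothetical interface, for illustration only).
type objectStore interface {
	Exists(ctx context.Context, key string) (bool, error)
	Put(ctx context.Context, key string, data []byte) error
}

// acquireLock sketches a naive check-then-create lock: two instances can both
// see "no lock object", both Put, and both believe they hold the lock, because
// a plain S3 PutObject offers no compare-and-set between the check and the write.
func acquireLock(ctx context.Context, store objectStore, name string) error {
	lockKey := "locks/" + name + ".lock"
	held, err := store.Exists(ctx, lockKey)
	if err != nil {
		return err
	}
	if held {
		return fmt.Errorf("lock already held: %s", lockKey)
	}
	// <-- another process can pass the same Exists check right here
	return store.Put(ctx, lockKey, []byte(time.Now().UTC().Format(time.RFC3339)))
}
```

If two instances both "win" a lock like this, they could end up driving the same ACME account concurrently.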

Log output

{"level":"error","ts":1680599702.936518,"logger":"obtain","caller":"certmagic@v0.17.2/async.go:117","msg":"will retry","component":"acme","error":"[XXXX.com] Obtain: [XXXX.com] creating new order: attempt 1: https://acme.zerossl.com/v2/DV90/newOrder: HTTP 400 urn:ietf:params:acme:error:malformed - Invalid signature on
JWS request (ca=https://acme.zerossl.com/v2/DV90)","attempt":2,"retrying_in":120,"elapsed":66.086730116,"max_duration":2592000,"stacktrace":"github.com/caddyserver/certmagic.doWithRetry\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/async.go:117\ngithub.com/caddyserver/certmagic.(*Config).obtainCert\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/config.go:611\ngithub.com/caddyserver/certmagic.(*Config).ObtainCertAsync\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/config.go:462\ngithub.com/caddyserver/certmagic.(*Config).obtainOnDemandCertificate\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/handshake.go:447\ngithub.com/caddyserver/certmagic.(*Config).getCertDuringHandshake\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/handshake.go:314\ngithub.com/caddyserver/certmagic.(*Config).GetCertificate\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/handshake.go:77\ncrypto/tls.(*Config).getCertificate\n\t/usr/local/go/src/crypto/tls/common.go:1079\ncrypto/tls.(*serverHandshakeState).processClientHello\n\t/usr/local/go/src/crypto/tls/handshake_server.go:229\ncrypto/tls.(*serverHandshakeState).handshake\n\t/usr/local/go/src/crypto/tls/handshake_server.go:67\ncrypto/tls.(*Conn).serverHandshake\n\t/usr/local/go/src/crypto/tls/handshake_server.go:61\ncrypto/tls.(*Conn).handshakeContext\n\t/usr/local/go/src/crypto/tls/conn.go:1491\ncrypto/tls.(*Conn).HandshakeContext\n\t/usr/local/go/src/crypto/tls/conn.go:1434\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1877"}
mholt commented 1 year ago

Thanks for the report -- how can I reproduce the behavior?

mbardelmeijer commented 1 year ago

Unfortunately we haven't been able to reproduce it locally; I've only observed multiple cases in production via logging over a longer period. Hence the thought that it might be locking-related, but I'm unsure how best to continue debugging this.

mholt commented 1 year ago

Does it only happen with ZeroSSL? If it also happens with Let's Encrypt, I'd be very interested; but it's also possible it could be a server bug (they have very different stacks).

mbardelmeijer commented 1 year ago

We have only observed it with ZeroSSL so far. We use Let's Encrypt as our primary CA and ZeroSSL as the secondary, so these occurrences only happen when we run into a Let's Encrypt rate limit for that particular domain.

I'll keep an eye on it to see whether it shows up in our logs for Let's Encrypt as well!
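
For completeness, the issuer ordering looks roughly like this, reusing the magic config from the sketch above (same caveats; as far as I understand, certmagic tries issuers in order and falls back to the next one on failure):

```go
le := certmagic.NewACMEIssuer(magic, certmagic.ACMEIssuer{
	CA:     certmagic.LetsEncryptProductionCA,
	Email:  "ops@example.com", // placeholder
	Agreed: true,
})
zerossl := certmagic.NewACMEIssuer(magic, certmagic.ACMEIssuer{
	CA:     certmagic.ZeroSSLProductionCA,
	Email:  "ops@example.com", // placeholder
	Agreed: true,
})
// Let's Encrypt first; ZeroSSL is only used when Let's Encrypt fails,
// e.g. on a rate limit for that domain.
magic.Issuers = []certmagic.Issuer{le, zerossl}
```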

mbardelmeijer commented 1 year ago

We've seen this in production logs for Let's Encrypt requests as well. It's not the 'Invalid signature on JWS request' error, but a 'JWS verification error'.

I'm still not sure what the cause is; here's the relevant error segment:

May 24 13:40:06 XX xx[3159879]: {"level":"error","ts":1684935606.4201875,"logger":"obtain","caller":"certmagic@v0.17.2/config.go:567","msg":"could not get certificate from issuer","component":"acme","identifier":"XXXX","issuer":"acme-v02.api.letsencrypt.org-directory","error":"HTTP 400 urn:ietf:params:acme:error:malformed - JWS verification error","stacktrace":"github.com/caddyserver/certmagic.(*Config).obtainCert.func2\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/config.go:567\ngithub.com/caddyserver/certmagic.doWithRetry\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/async.go:104\ngithub.com/caddyserver/certmagic.(*Config).obtainCert\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/config.go:611\ngithub.com/caddyserver/certmagic.(*Config).ObtainCertAsync\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/config.go:462\ngithub.com/caddyserver/certmagic.(*Config).obtainOnDemandCertificate\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/handshake.go:447\ngithub.com/caddyserver/certmagic.(*Config).getCertDuringHandshake\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/handshake.go:314\ngithub.com/caddyserver/certmagic.(*Config).GetCertificate\n\t/go/pkg/mod/github.com/caddyserver/certmagic@v0.17.2/handshake.go:77\ncrypto/tls.(*Config).getCertificate\n\t/usr/local/go/src/crypto/tls/common.go:1079\ncrypto/tls.(*serverHandshakeStateTLS13).pickCertificate\n\t/usr/local/go/src/crypto/tls/handshake_server_tls13.go:376\ncrypto/tls.(*serverHandshakeStateTLS13).handshake\n\t/usr/local/go/src/crypto/tls/handshake_server_tls13.go:58\ncrypto/tls.(*Conn).serverHandshake\n\t/usr/local/go/src/crypto/tls/handshake_server.go:53\ncrypto/tls.(*Conn).handshakeContext\n\t/usr/local/go/src/crypto/tls/conn.go:1491\ncrypto/tls.(*Conn).HandshakeContext\n\t/usr/local/go/src/crypto/tls/conn.go:1434\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1877"}

Might be relevant: https://github.com/caddyserver/caddy/issues/3903 and https://caddy.community/t/invalid-signature-on-jws-request-http-400-zerossl/14634

mholt commented 1 year ago

Do these errors only occur for renewals?

Do the certificates being obtained have anything in common (with regard to their subjects, private keys, or anything else, like original issuance date / config)?

> We use the S3 storage adapter for certmagic, which does support locking by placing lock files in the bucket, but that isn't atomic.

I would not be surprised if this is part of the problem. Does the problem still happen if you use a more correct storage backend like a file system or SQL DB (or even Redis)?
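
For example, pointing CertMagic at its built-in file-system storage is roughly a one-line change (sketch; adjust the path for your environment):

```go
// use certmagic's built-in file-system storage instead of the S3 adapter
certmagic.Default.Storage = &certmagic.FileStorage{Path: "/var/lib/certmagic"}
```

That would at least tell us whether the storage/locking layer is involved.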

mholt commented 1 year ago

I wonder if CertMagic is using the wrong private key for some reason? :thinking:
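
To illustrate (a standalone sketch, not acmez's actual code): every ACME request carries a JWS whose signature the CA verifies against the public key registered for the account, so if the key loaded from storage isn't the one the CA has on file (e.g. two instances raced and overwrote the account), the CA rejects the request with exactly this kind of error.

```go
package jwssketch

import (
	"crypto/ecdsa"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
)

// signACMERequest sketches ES256 signing of an ACME JWS: the signature covers
// base64url(protected) + "." + base64url(payload) and is made with the account's
// private key. The CA verifies it with the account's registered public key, so a
// different key on disk means "invalid signature on JWS request".
func signACMERequest(accountKey *ecdsa.PrivateKey, protected, payload []byte) ([]byte, error) {
	b64 := base64.RawURLEncoding
	signingInput := b64.EncodeToString(protected) + "." + b64.EncodeToString(payload)
	digest := sha256.Sum256([]byte(signingInput))

	r, s, err := ecdsa.Sign(rand.Reader, accountKey, digest[:])
	if err != nil {
		return nil, err
	}
	// JWS ES256 signatures are the raw R||S values, each left-padded to 32 bytes.
	sig := make([]byte, 64)
	r.FillBytes(sig[:32])
	s.FillBytes(sig[32:])
	return sig, nil
}
```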

Due to inactivity -- and because, on reflection, if this is a bug at all it's probably a bug in CertMagic rather than acmez (you're right that acmez makes the request that triggers this error, but CertMagic may be giving it bad input) -- I'll close this issue for now. Let me know if we need to reopen!

mbardelmeijer commented 1 year ago

Sorry for the delayed response. I've been working on replacing the storage driver with a Redis locking layer, but it's a larger undertaking than expected, hence the delay.

We do indeed use CertMagic. We're in the process of updating to 0.18.2 from 0.17.2; if the issue still persists after that, I'll try to aggregate as much info as possible and open a new issue on the certmagic repo :)

mholt commented 1 year ago

Thanks, that'll be helpful! I would love to narrow that down. :)