unixcharles / acme-client

A Ruby client for the letsencrypt's ACME protocol.
MIT License
494 stars 117 forks source link

First request fails if the directory is not fetched #235

Closed ggalmazor closed 7 months ago

ggalmazor commented 7 months ago

We have observed this issue with the update from 2.0.15 to 2.0.16

This code reproduces the issue and provides insight into the requirement to have the directory fetched for any other query to succeed:

require 'acme-client'

private_key = OpenSSL::PKey::RSA.new(<<~KEY)
  <REDACTED>
KEY
kid = "https://acme-staging-v02.api.letsencrypt.org/acme/acct/XYZ"
directory = "https://acme-staging-v02.api.letsencrypt.org/directory"

client1 = Acme::Client.new(private_key:, kid:, directory:)
# The first attempt to fetch the account fails with:
# JWS header parameter 'url' incorrect. Expected "https://acme-staging-v02.api.letsencrypt.org/acme/acct/XYZ" got "https://acme-staging-v02.api.letsencrypt.org/directory"
#
# You can reproduce by removing the inlined `rescue`
# However, it fetches the directory under the hood, which makes the second attempt work
client1.account rescue Acme::Client::Error::Malformed
client1.account

client2 = Acme::Client.new(private_key:, kid:, directory:)
# Manually fetching the directory makes the first attempt to fetch the account work
client2.directory
client2.account

This code targets the .account query, but we've seen the same behavior with multiple other calls.

We haven't been able to pinpoint the exact cause in the library's code, but it sounds like the .account call is triggering a call to fetch the directory, and the Faraday env for the original query is being reused for it, which messes up the JWK signature.

We are unsure how this behavior relates to the changes introduced in 2.0.16. However, the fact that in 2.0.15, fetching the directory would use a fresh Acme::Client::HTTPClient might be related and support the theory of the messed up Faraday env and wrong JWK signatures.

zarqman commented 7 months ago

I believe we're seeing the same errors here too.

I won't both to repeat much since @ggalmazor has already done a great job digging in above (thanks, @ggalmazor!). For added clarity, here are the exceptions being seen here:

Reverting to 2.0.15 resolved the issue (even on in-progress renewals that picked up from where they errored before, so it seems to be an in-memory thing, and not something with the original data).

unixcharles commented 7 months ago

Sorry about the inconvenience. I'm also unsure what could cause the problem.

I will have time to investigate later this week. Thanks for reporting.

unixcharles commented 7 months ago

Should be fixed in the 2.0.17 release. Let me know if you are still experiencing any issues.

ggalmazor commented 7 months ago

I just confirmed that the issue doesn't reproduce with 2.0.17 ✅

Thanks, @unixcharles!