letsencrypt / boulder

An ACME-based certificate authority, written in Go.
Mozilla Public License 2.0

Improve retrieval efficiency for pending and valid authzs #5764

Open jsha opened 2 years ago

jsha commented 2 years ago

When you create a new order, Boulder first checks the authz2 table to see if there are any pending or valid authzs for the names you asked for, so it can reuse them in the order. Right now this relies on an index (registrationID,identifierType,identifierValue,status,expires). But when the index grows quite large, particularly for accounts with lots of identifiers or lots of authzs, even an index lookup can get expensive.

We can do better: we can have a mapping in some datastore (DB or Redis) from (regID, identifierValue, identifierType) to at most one authz ID. That authz ID would be the "best" available at any given time: a valid authz if one is available, otherwise a pending one, or nothing if there is neither a pending nor a valid authz.

We would update this value when a pending authz is created and when an authz transitions to invalid, valid, or deactivated. We would also need to handle authz expiry. We could do this reactively at query time: if we find an entry here, look up the authz ID, and find that the authz is expired, we delete the entry. Alternatively, we could choose not to delete on expiry. The intuition is that any time we find an expired authz, the very next thing we're likely to do is create a new authz, which will immediately overwrite the existing entry.

We would expire entries out of this datastore with TTLs or partitioning.
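
To make the proposal concrete, here is a minimal in-memory sketch of such a "best authz" mapping, including the reactive delete-on-expiry variant. It is an illustration only, not Boulder code: the names `BestAuthzIndex`, `Key`, `Entry`, `Record`, `Lookup`, and the helper `at` are all hypothetical, a plain Go map stands in for the DB or Redis, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// Status covers the authz states named in the issue.
type Status string

const (
	StatusPending     Status = "pending"
	StatusValid       Status = "valid"
	StatusInvalid     Status = "invalid"
	StatusDeactivated Status = "deactivated"
)

// Key is the hypothetical lookup key (regID, identifierType, identifierValue).
type Key struct {
	RegID      int64
	IdentType  string
	IdentValue string
}

// Entry is the single "best" authz currently known for a Key.
type Entry struct {
	AuthzID int64
	Status  Status
	Expires time.Time
}

// BestAuthzIndex maps each Key to at most one Entry.
type BestAuthzIndex struct {
	entries map[Key]Entry
}

func NewBestAuthzIndex() *BestAuthzIndex {
	return &BestAuthzIndex{entries: make(map[Key]Entry)}
}

// Record applies one authz creation or status transition to the index.
func (idx *BestAuthzIndex) Record(k Key, authzID int64, status Status, expires, now time.Time) {
	cur, found := idx.entries[k]
	switch status {
	case StatusValid:
		// A valid authz is always the best available.
		idx.entries[k] = Entry{AuthzID: authzID, Status: StatusValid, Expires: expires}
	case StatusPending:
		// A pending authz never displaces an unexpired valid one.
		if found && cur.Status == StatusValid && cur.Expires.After(now) {
			return
		}
		idx.entries[k] = Entry{AuthzID: authzID, Status: StatusPending, Expires: expires}
	case StatusInvalid, StatusDeactivated:
		// Only drop the entry if it points at the authz that just went away.
		if found && cur.AuthzID == authzID {
			delete(idx.entries, k)
		}
	}
}

// Lookup returns the best entry, deleting it reactively if it has expired.
func (idx *BestAuthzIndex) Lookup(k Key, now time.Time) (Entry, bool) {
	e, found := idx.entries[k]
	if !found {
		return Entry{}, false
	}
	if !e.Expires.After(now) {
		delete(idx.entries, k)
		return Entry{}, false
	}
	return e, true
}

// at builds a timestamp from Unix seconds to keep the example deterministic.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	idx := NewBestAuthzIndex()
	k := Key{RegID: 1, IdentType: "dns", IdentValue: "example.com"}

	idx.Record(k, 10, StatusPending, at(1000), at(0))
	idx.Record(k, 10, StatusValid, at(5000), at(10))
	idx.Record(k, 11, StatusPending, at(2000), at(20)) // ignored: valid authz 10 is still live

	e, ok := idx.Lookup(k, at(30))
	fmt.Println(e.AuthzID, e.Status, ok)

	_, ok = idx.Lookup(k, at(6000)) // past expiry: entry is removed
	fmt.Println(ok)
}
```

The "don't delete on expiry" alternative would amount to dropping the `delete` call in `Lookup` and relying on the next `Record` of a pending authz to overwrite the stale entry.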

aarongable commented 2 years ago

While we obviously want to keep the identifierType around in the main database, I think this quick-lookup "authz cache" could leave it behind: DNSNames cannot look like IPAddresses because their TLD can't start with a digit. So simply storing the identifier should still give sufficient uniqueness.
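
As a rough illustration of that point, the sketch below keys the cache on (regID, identifierValue) only and recovers the type from the value when needed. The `CacheKey` type, the `identifierType` helper, and the `"ip"`/`"dns"` labels are hypothetical names chosen for this example; `net.ParseIP` is the standard library parser and returns nil for anything that is not a textual IPv4 or IPv6 address.

```go
package main

import (
	"fmt"
	"net"
)

// CacheKey is a hypothetical cache key that omits the identifier type.
type CacheKey struct {
	RegID int64
	Value string
}

// identifierType infers the type from the value alone. This assumes, as the
// comment above argues, that no DNS name parses as an IP address.
func identifierType(value string) string {
	if net.ParseIP(value) != nil {
		return "ip"
	}
	return "dns"
}

func main() {
	for _, v := range []string{"example.com", "192.0.2.1", "2001:db8::1"} {
		k := CacheKey{RegID: 1, Value: v}
		fmt.Println(k.Value, identifierType(k.Value))
	}
}
```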