letsencrypt / boulder

An ACME-based certificate authority, written in Go.
Mozilla Public License 2.0
5.22k stars 608 forks source link

CLI tool to update registration contact #6864

Open jprenken opened 1 year ago

jprenken commented 1 year ago

CAs occasionally get GDPR/CCPA/similar requests to delete personal data, including email addresses. For handling these requests, it would be nice to have a CLI tool to modify a registration's contact field, instead of manually UPDATEing the database.

The tool would ideally allow specifying one or more regIDs, a current contact field, or both. For the second case especially, it will need various guardrails to ensure that only the intended registrations will be modified.

Another note: the registrations table is not indexed on contact; a full table scan can take 30 minutes or more.

jsha commented 1 year ago

This is a good idea. Here's my plan: rather than defining a new Boulder CLI tool, I'd like to make boulder admin an alias for boulder admin-revoker. Currently all the sub-sub-commands of admin-revoker are revocation related, but since this is not revocation related it makes sense to migrate to calling the tool boulder admin. We'd define a new sub-sub-command delete-email-address.

To clarify the use case: If you specify only registration IDs, you'd like all contacts associated with those registration IDs removed? If you specify registration IDs plus an email address, you'd like only those email addresses associated with those registration IDs removed? What would you like to happen if you specify only an email address? Full table scan followed by removal from the registration IDs found? You mention guardrails to ensure that only intended registrations will be modified. But if we know a set of intended registrations, wouldn't you specify both those registrations and the email address?

aarongable commented 1 year ago

The new subcommands should document that they don't support deduplicating foo+bar@gmail.com and "f.o.o@gmail.com` etc. Supporting all possible equivalent variations of an email seems like more than is necessary here.

jprenken commented 1 year ago

To clarify the use case: If you specify only registration IDs, you'd like all contacts associated with those registration IDs removed?

Yes; I think that will be a rare use case.

If you specify registration IDs plus an email address, you'd like only those email addresses associated with those registration IDs removed?

Yes: Only the email address specified should be removed, and only from the registration IDs specified. This probably deserves a WHERE id IN (x, y, z) AND contact LIKE [what we located and want to replace] so that we avoid race conditions: don't pave over contact changes that have snuck in during our runtime.

What would you like to happen if you specify only an email address? Full table scan followed by removal from the registration IDs found?

Exactly, and as in the previous case, only the email address specified should be removed. This will be the most common use case. To keep addressing race condition (and primary DB load, and potentially locking) risks, I think this should SELECT for regIDs and then UPDATE as two separate operations.

You mention guardrails to ensure that only intended registrations will be modified. But if we know a set of intended registrations, wouldn't you specify both those registrations and the email address?

The additional guardrails I can think of offhand are maybe a default (changeable by arg?) LIMIT clause, careful input sanitization as always, and log output to ensure a bad change can be reverted. It really helps that SQL = isn't going to unexpectedly glob or pattern-match.

jsha commented 1 year ago

One other thing to specify: the email matching should be done case-insensitively, since the domain part is specified to be case insensitive, and the local part is in practice case insensitive for many providers.

aarongable commented 4 months ago

Looking through bugs related to the admin tool and account emails: was the work on this bug completed by https://github.com/letsencrypt/boulder/pull/6919? if not, is the remaining work on this bug sufficiently tracked by https://github.com/letsencrypt/boulder/issues/7355?