Open jprenken opened 1 year ago
This is a good idea. Here's my plan: rather than defining a new Boulder CLI tool, I'd like to make boulder admin
an alias for boulder admin-revoker
. Currently all the sub-sub-commands of admin-revoker are revocation related, but since this is not revocation related it makes sense to migrate to calling the tool boulder admin
. We'd define a new sub-sub-command delete-email-address
.
To clarify the use case: If you specify only registration IDs, you'd like all contacts associated with those registration IDs removed? If you specify registration IDs plus an email address, you'd like only those email addresses associated with those registration IDs removed? What would you like to happen if you specify only an email address? Full table scan followed by removal from the registration IDs found? You mention guardrails to ensure that only intended registrations will be modified. But if we know a set of intended registrations, wouldn't you specify both those registrations and the email address?
The new subcommands should document that they don't support deduplicating foo+bar@gmail.com
and "f.o.o@gmail.com` etc. Supporting all possible equivalent variations of an email seems like more than is necessary here.
To clarify the use case: If you specify only registration IDs, you'd like all contacts associated with those registration IDs removed?
Yes; I think that will be a rare use case.
If you specify registration IDs plus an email address, you'd like only those email addresses associated with those registration IDs removed?
Yes: Only the email address specified should be removed, and only from the registration IDs specified. This probably deserves a WHERE id IN (x, y, z) AND contact LIKE [what we located and want to replace]
so that we avoid race conditions: don't pave over contact changes that have snuck in during our runtime.
What would you like to happen if you specify only an email address? Full table scan followed by removal from the registration IDs found?
Exactly, and as in the previous case, only the email address specified should be removed. This will be the most common use case. To keep addressing race condition (and primary DB load, and potentially locking) risks, I think this should SELECT
for regIDs and then UPDATE
as two separate operations.
You mention guardrails to ensure that only intended registrations will be modified. But if we know a set of intended registrations, wouldn't you specify both those registrations and the email address?
The additional guardrails I can think of offhand are maybe a default (changeable by arg?) LIMIT
clause, careful input sanitization as always, and log output to ensure a bad change can be reverted. It really helps that SQL =
isn't going to unexpectedly glob or pattern-match.
One other thing to specify: the email matching should be done case-insensitively, since the domain part is specified to be case insensitive, and the local part is in practice case insensitive for many providers.
Looking through bugs related to the admin tool and account emails: was the work on this bug completed by https://github.com/letsencrypt/boulder/pull/6919? if not, is the remaining work on this bug sufficiently tracked by https://github.com/letsencrypt/boulder/issues/7355?
CAs occasionally get GDPR/CCPA/similar requests to delete personal data, including email addresses. For handling these requests, it would be nice to have a CLI tool to modify a registration's contact field, instead of manually
UPDATE
ing the database.The tool would ideally allow specifying one or more regIDs, a current contact field, or both. For the second case especially, it will need various guardrails to ensure that only the intended registrations will be modified.
Another note: the
registrations
table is not indexed oncontact
; a full table scan can take 30 minutes or more.