hasgeek / lastuser

Lastuser has been merged into Funnel. This repository is archived.
https://hasgeek.com/
BSD 2-Clause "Simplified" License

Email addresses are case sensitive #221

Open: jace opened this issue 7 years ago

jace commented 7 years ago

In #215 we enforced a lowercase index for email addresses. This is pragmatic: it is extremely unlikely that a mail domain will have distinct accounts that differ only in case, and most providers prohibit this.

However, email addresses are case sensitive per RFC 5321:

The local-part of a mailbox MUST BE treated as case sensitive. Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. In particular, for some hosts, the user "smith" is different from the user "Smith". However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged. Mailbox domains follow normal DNS rules and are hence not case sensitive.

A lower-cased email address may not actually be reachable. We should:

  1. Enforce the lowercase unique index, but
  2. Preserve case in email addresses everywhere (in UserEmail and UserEmailClaim, and in other apps such as Hasjob and Boxoffice).
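A minimal sketch of these two rules together, using an in-memory SQLite database as a stand-in for the real one. The `user_email` table and `add_email` helper are hypothetical, not Lastuser's schema. The unique index is on the expression `lower(email)`, so uniqueness ignores case while the `email` column keeps the address exactly as entered. Note that SQLite's built-in `lower()` only folds ASCII characters.

```python
import sqlite3

# In-memory SQLite database as a stand-in; "user_email" is a hypothetical
# table, not Lastuser's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_email (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
# Expression index: uniqueness is checked on lower(email), while the
# email column itself stores the address exactly as the user typed it.
conn.execute("CREATE UNIQUE INDEX ix_user_email_lower ON user_email (lower(email))")


def add_email(email: str) -> bool:
    """Insert an address; return False if a case-insensitive duplicate exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO user_email (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_email("Smith@Example.com"))  # True
print(add_email("smith@example.com"))  # False: same lower(email)
print(conn.execute("SELECT email FROM user_email").fetchall())  # [('Smith@Example.com',)]
```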

Rather than a SQL index on LOWER(email), we should have a normalised_email column on the UserEmail model and place the unique index on that. Moving normalisation into the app lets us handle special cases such as:

  1. Removing periods in @gmail.com addresses, as Gmail disregards them.
  2. Removing + suffixes, since they deliver to the same mailbox (optional).
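The normalisation rules above could be sketched as a single function that computes the value for a normalised_email column. `normalise_email` and its `strip_plus` flag are hypothetical names, and the Gmail rule is applied only to the literal `gmail.com` domain.

```python
def normalise_email(email: str, strip_plus: bool = True) -> str:
    """Hypothetical normaliser for a normalised_email column.

    Lowercases the address, optionally drops a +suffix from the local part,
    and removes periods from the local part of @gmail.com addresses.
    """
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local:
        raise ValueError(f"Not an email address: {email!r}")
    if strip_plus:
        local = local.split("+", 1)[0]  # "j.smith+news" -> "j.smith"
    if domain == "gmail.com":
        local = local.replace(".", "")  # Gmail disregards periods
    return f"{local}@{domain}"


print(normalise_email("J.Smith+news@Gmail.com"))  # jsmith@gmail.com
print(normalise_email("J.Smith+news@example.com"))  # j.smith@example.com
print(normalise_email("J.Smith+news@example.com", strip_plus=False))  # j.smith+news@example.com
```

Because this form is lossy (case, periods and suffixes are gone), the original email column still has to be kept for actually sending mail.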

Pending issues:

  1. Does Gmail's period-ignoring behaviour apply to all G Suite domains? (Update: it doesn't)
  2. Gravatar requires the MD5 sum of the lowercased email address, and this will be the primary use of the MD5 sum (once #165 is resolved). Which value should the MD5 be calculated from? If we use normalised_email, we also lose the periods in gmail.com addresses, which Gravatar does not expect.
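To illustrate the Gravatar problem: Gravatar hashes the trimmed, lowercased address, so the hash has to come from the case-preserved email rather than from a normalised form that strips Gmail periods. `gravatar_hash` is a hypothetical helper name.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lowercased address, as Gravatar expects."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


original = "Jane.Doe@Gmail.com"
normalised = "janedoe@gmail.com"  # what a period-stripping normalised_email would hold
print(gravatar_hash(original) == gravatar_hash("jane.doe@gmail.com"))  # True
print(gravatar_hash(original) == gravatar_hash(normalised))  # False
```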

jace commented 7 years ago

A possible solution to the second problem (Gravatar, etc.) is to store two normalised emails:

  1. Lowercase normalised version, which has a unique constraint (in UserEmail only). This is the reference for queries. It could be the existing LOWER(email) index instead of a distinct column.

  2. Application normalised version in which:

    1. + suffixes are removed,
    2. @googlemail.com is replaced with @gmail.com (ref), and
    3. periods are stripped from @gmail.com addresses

The application-normalised version is used to discover re-used email addresses, but uniqueness is not enforced on it, since there are legitimate reasons for users to re-use addresses. It may also be used for security checks: for example, if one variant is used as a user's email address and another as an organisation's or team's, we can warn the user that shared access to that mailbox may compromise their own account.
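A minimal sketch of the two normalised forms and the shared-mailbox warning, under the rules listed above. The helpers `lower_normalised`, `app_normalised` and `shared_mailboxes` are hypothetical; only the lowercase form would carry a unique constraint, while the application form is used purely for discovery.

```python
def lower_normalised(email: str) -> str:
    """Reference form for queries; the only form with a unique constraint."""
    return email.strip().lower()


def app_normalised(email: str) -> str:
    """Discovery form: not unique, used to spot re-used mailboxes."""
    local, _, domain = lower_normalised(email).rpartition("@")
    local = local.split("+", 1)[0]  # drop the +suffix
    if domain == "googlemail.com":
        domain = "gmail.com"  # googlemail.com is treated as gmail.com
    if domain == "gmail.com":
        local = local.replace(".", "")  # Gmail ignores periods
    return f"{local}@{domain}"


def shared_mailboxes(user_emails, org_emails):
    """Organisation addresses that reach the same mailbox as one of the user's."""
    user_boxes = {app_normalised(e) for e in user_emails}
    return {e for e in org_emails if app_normalised(e) in user_boxes}


user = ["Jane.Doe@gmail.com"]
org = ["janedoe+org@googlemail.com", "team@example.com"]
# Distinct under the unique lowercase form, so both can be registered...
print(lower_normalised(user[0]) != lower_normalised(org[0]))  # True
# ...but the application form reveals they share a mailbox.
print(shared_mailboxes(user, org))  # {'janedoe+org@googlemail.com'}
```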

jace commented 7 years ago

The logic for application-normalised emails could live in the mxsniff library, although mxsniff currently performs a DNS lookup. Its provider list could be modified to include each provider's primary domain instead of the MX target, along with a custom normalisation function.