zulip / zulip

Zulip server and web application. Open-source team chat that helps teams stay productive and focused.
https://zulip.com
Apache License 2.0
21.59k stars 7.84k forks source link

Truncate or prevent overly-long sender addresses, for SES compatibility. #17558

Closed alexmv closed 3 years ago

alexmv commented 3 years ago

Amazon SES has an (undocumented) limit on the size of address fields, and reject email with too-long From addresses, as follows:

Transaction failed: Address length is more than 320 bytes long: '=?utf-8?b?Zcy3zYLMvc2SzIrNjM2SzL3Ni8yIzYbNi82CzZ3Nhc2NzKfNmcygzLnMoMyuzK0=?= =?utf-8?b?zZPMocyizYfNiM2czKHMncykzJ94zLjMiMyFzJrNjM2dzL7Nis2SzILNis2gzYE=?= =?utf-8?b?zIfNhM2RzZjNksyBzZvNhsyPzIvMu8yezKjMp8yqzJ7Muc2czLNhzLfNisyIzL0=?= =?utf-8?b?zJXNoMyMzJrNkc2EzZ3Mi8yNzYDMjMy9zL7Mi82KzYrMh82DzZ3MjMyNzJLMn8yg?= =?utf-8?b?zLnMqsyjbcy0zZ3NisyUzJXMkMyZzLDMmcyizYXMusyizKDMrcyozY7Mrs2TzYc=?= =?utf-8?b?zK/MosyXzLvNhcyucMy3zIXMk82KzJrNg82YzJvNkcyIzZjNjM2MzL7Mvs2IzZk=?= =?utf-8?b?zKvMq8yezKTNnMyozYnMrcyvzKLMr8yWzJ3MucydzJ3MscyvzJbMoMy5zZRszLg=?= =?utf-8?b?zILMgcyPzIvMjM2AzIbMkMyCzYHMkMyEzI/Nks2YzZDMiMyIzIjMucywzKbMqs2V?= =?utf-8?b?zJ/MssyizLDNnMymzKvMs82UzK/Mu2XMuMyDzZDNhs2DzJvNm8yDzInMiM2bzJM=?= =?utf-8?b?zJXMv8yFzIXMkcyczJw=?= <example@example.com>'

This example header is the result of "zalgo" text, and, when formatted, looks like:

From: ȩ̷̡̢̡͍͙̠̹̠̮̭͓͇͈̝̤̟͂̽͒̊͌͒̽͋̈͆͋͂͜͝ͅẍ̸̨̧̻̞̪̞̹̳̅͌̾͊͒̂͊́̇̈́͑͒́͛͆̏̋̚͘͜͝͠a̷̟̠̹̪̣͊̈̽̌͑̈́̋̍̀̌̽̾̋͊͊̇̓̌̍̒̕̚͠͝͝m̴̢̢̨̢̙̰̙̺̠̭͎̮͓͇̯̗̻̮͊̔̐̕͝ͅͅp̷̨̢̛͈͙̫̫̞̤͉̭̯̯̖̝̹̝̝̱̯̖̠̹͔̅̓͊̓͑̈͌͌̾̾̚͘͘͜l̸̢̹̰̦̪͕̟̲̰̦̫̳͔̯̻̂́̏̋̌̀̆̐̂́̐̄̏͒͐̈̈̈͘͜ẽ̸̛̜̜͐͆̓͛̃̉̈͛̓̿̅̅̑̕ <example@example.com>

This is bad because it results in dropped email notifications, and the users have no feedback as to why they are no longer receiving email from Zulip.

We have a couple options:

  1. Attempt to truncate the name before sending, such that the RFC 2047-encoded address is under 320 bytes. This is potentially complicated, as we need to worry about not splitting multibyte characters mid-character, as well as dealing with combining characters; this might be a possible starting point.
  2. Prevent such long names to begin with, at the UI level. This may be a confusing requirement to explain to the user, since the "length" limit is in the encoded form, which is non-trivial to understand for non-technical audiences. It also would not address names that come in via e.g. LDAP import.

None of this would address with folks whose email address is A <aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@example.com>, however, since there's no way to truncate that.

zulipbot commented 3 years ago

Hello @zulip/server-development, @zulip/server-production members, this issue was labeled with the "area: emails", "area: production" labels, so you may want to check it out!

gnprice commented 3 years ago

I think regardless of whether we try to prevent these long names in the UI, we'll want to truncate them when necessary at send time -- there will be names that bypass the UI via LDAP, or Slack import, or predating that change (or else we have to have a migration to truncate them), etc.

Once we're truncating, it's inevitably going to be a bit crude, and I think anything that causes the mail to go through is still an improvement. (In the extreme case: dropping the name entirely and just keeping the address itself would be a major improvement over not sending the mail.) Truncating it at a codepoint boundary is probably good enough not to be worth trying to do anything fancier.

(In fact, in that example, it probably actually comes out best if we don't try to treat "base character plus all its combining characters" as a unit! Better to have the last letter missing some of its diacritics than to have the letter itself missing.)

timabbott commented 3 years ago

It might actually generally be the case that for this corner case, it'd be better to just drop the name entirely; I don't think it's likely to be common for names to be too long, and it's not important to have the name in the To lines -- so we're more likely to upset someone by truncating their name badly than by just not including it.

@mateuszmandera @hackerkid FYI.

alexmv commented 3 years ago

I think that's a good solution -- nobody's going to notice if we're sending to <example@example.com>. Also very much easier to implement and be confident of. :)

mateuszmandera commented 3 years ago

So the idea would be to re-send the email with the name dropped, if the first sending attempt gets rejected like above?

alexmv commented 3 years ago

I was envisioning that when constructing the email, we would switch to email-only if the full name was longer than 320 characters. But we could also try sending both ways -- I think the difficulty is that there are a large number of reasons that the SMTP send could fail, and no consistency to error codes, and hardcoding an SES error message to check against seems fraught.

Oh, and apparently 320 bytes is the max length allowed per RFC. So making sure we stay within that seems reasonable. :)

alexmv commented 3 years ago

Somewhat relatedly, these sorts of failures show up as mail.send() raising SMTPDataError; we should ideally catch that and re-raise it as EmailNotDeliveredException, instead of propagating up the SMTPDataError.

pletinckxc commented 3 years ago

@zulipbot claim

zulipbot commented 3 years ago

Hello @pletinckxc, it looks like you've currently claimed 1 issue in this repository. We encourage new contributors to focus their efforts on at most 1 issue at a time, so please complete your work on your other claimed issues before trying to claim this issue again.

We look forward to your valuable contributions!