humanmade / aws-ses-wp-mail

An AWS SES wp_mail() drop-in

Encode Unicode site names before sending #46

Closed rmccue closed 3 years ago

rmccue commented 4 years ago

In this drop-in, we use the site's name in the sender field. For non-English sites, this name may contain non-ASCII Unicode characters. However, email/MIME headers are ASCII-based, so these characters need to be encoded.

While some mailers handle this encoding automatically, the documentation for the AWS SendEmail endpoint specifically notes:

The sender name (also known as the friendly name) may contain non-ASCII characters. These characters must be encoded using MIME encoded-word syntax, as described in RFC 2047. MIME encoded-word syntax uses the following form: =?charset?encoding?encoded-text?=.

Right now, sending email with these non-ASCII characters in the sender name results in some messages going out with invalid Unicode characters in the header, which can also trigger anti-spam functionality.

For example, with a site name of Blog 한국어, this will get mangled to Blog m� if not encoded. We need to encode this to Blog =?UTF-8?B?7ZWc6rWt7Ja0?= before sending, which will then be correctly decoded by clients.
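To make the expected encoding concrete, here is a minimal sketch in Python (not the drop-in's PHP, and not the eventual fix) that reproduces the example above. The helper names `encode_word`, `encode_display_name` and `decode_display_name` are made up for illustration. It assumes UTF-8 with base64 ("B") encoding, encodes only the words that contain non-ASCII characters, and ignores the RFC 2047 limit of 75 characters per encoded word, so very long words would need splitting.

```python
import base64
from email.header import decode_header


def encode_word(word: str) -> str:
    """Return the word as an RFC 2047 encoded-word if it has non-ASCII characters."""
    if word.isascii():
        return word  # plain ASCII words can be sent unchanged
    b64 = base64.b64encode(word.encode("utf-8")).decode("ascii")
    # Form: =?charset?encoding?encoded-text?=  (B = base64)
    return f"=?UTF-8?B?{b64}?="


def encode_display_name(name: str) -> str:
    """Encode each space-separated word of a sender name independently.

    Illustrative only: does not split encoded words longer than 75 characters.
    """
    return " ".join(encode_word(word) for word in name.split(" "))


def decode_display_name(encoded: str) -> str:
    """Decode the header value the way a mail client would, for a round-trip check."""
    parts = []
    for chunk, charset in decode_header(encoded):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    encoded = encode_display_name("Blog 한국어")
    print(encoded)
    print(decode_display_name(encoded))
```

The round trip through `decode_display_name` is only there to check that a standards-following client would recover the original site name.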

PHP helpfully provides the mb_encode_mimeheader function as part of the mbstring extension, which will handle this automatically for us, but using it would mean we rely on mbstring. Unsure if that's an issue or not.

roborourke commented 3 years ago

Tested, and created a PR upstream here: https://github.com/humanmade/aws-ses-wp-mail/pull/47. Will make a patch release once it's gone through review.