python / cpython

EmailMessage bad encoding for non-ASCII localpart #122476

Open medmunds opened 1 month ago

medmunds commented 1 month ago

Bug report

Bug description:

The modern email package incorrectly encodes a non-ASCII email address 'local-part' (username) using an RFC 2047 encoded-word, resulting in undeliverable email:

>>> from email.message import EmailMessage
>>> msg = EmailMessage()
>>> msg["To"] = "jörg@example.com"
>>> msg.as_bytes()
b'To: =?utf-8?q?j=C3=B6rg?=@example.com\n\n'

That use is prohibited by RFC 2047 section 5:

An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'.
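To make that rule concrete, here is a minimal sketch of a check for an RFC 2047 encoded-word inside an addr-spec string. The helper name `has_encoded_word` and the regular expression are illustrative assumptions, not part of the email package, and the pattern only approximates the RFC 2047 grammar.

```python
import re

# Approximation of the RFC 2047 encoded-word form:
#   "=?" charset "?" encoding "?" encoded-text "?="
# This pattern is an illustrative assumption, not the stdlib's own parser.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]+\?=")


def has_encoded_word(addr_spec: str) -> bool:
    """Return True if the addr-spec contains something shaped like an encoded-word.

    Hypothetical helper for illustration; it does not validate the address.
    """
    return ENCODED_WORD.search(addr_spec) is not None


# The serialized address from the report trips the check; the raw one does not.
bad = "=?utf-8?q?j=C3=B6rg?=@example.com"
raw = "jörg@example.com"
print(has_encoded_word(bad), has_encoded_word(raw))
```

A check like this could be run over generated headers in a test suite to catch the prohibited encoding before a message is sent.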

This issue is related to #83938, which covers the same problem in an email address domain. In a comment there, @bitdancer says:

What should be happening here is that an error should be raised when that header is set (or possibly when it is accessed/serialized, but when set would be better I think) saying that there is non-ascii in the domain part.

Since email.policy.SMTPUTF8 allows non-ASCII addr-specs, I think the error will have to be raised when the header is serialized with a utf8=False policy (and not when the header is set):

# Expected behavior (using `msg` from example above):
>>> msg.as_bytes()
ValueError('Non-ASCII username requires SMTPUTF8 policy')

# This already works correctly:
>>> from email.policy import SMTPUTF8
>>> msg.as_bytes(policy=SMTPUTF8)
b'To: j\xc3\xb6rg@example.com\r\n\r\n'
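Until the package raises such an error itself, a caller could approximate the expected behavior with a wrapper. The sketch below is a workaround under stated assumptions, not the proposed fix: `as_bytes_checked` and `ADDRESS_HEADERS` are hypothetical names, the message is assumed to use a modern (non-compat32) policy so that address headers expose `.addresses`, and only the local-part is checked (domains are the subject of #83938).

```python
from email.message import EmailMessage
from email.policy import SMTPUTF8

# Hypothetical list of headers to inspect; adjust as needed.
ADDRESS_HEADERS = ("From", "Sender", "Reply-To", "To", "Cc", "Bcc")


def as_bytes_checked(msg, policy=None):
    """Serialize msg, refusing non-ASCII local-parts unless the policy has utf8=True.

    Hypothetical helper; assumes msg was created with a modern email policy.
    """
    policy = msg.policy if policy is None else policy
    if not getattr(policy, "utf8", False):
        for name in ADDRESS_HEADERS:
            for header in msg.get_all(name, []):
                for addr in header.addresses:
                    if not addr.username.isascii():
                        raise ValueError(
                            "Non-ASCII username requires SMTPUTF8 policy"
                        )
    return msg.as_bytes(policy=policy)


msg = EmailMessage()
msg["To"] = "jörg@example.com"

try:
    as_bytes_checked(msg)  # default policy has utf8=False
except ValueError as exc:
    print("refused:", exc)

# With SMTPUTF8 the address is written as raw UTF-8.
print(as_bytes_checked(msg, policy=SMTPUTF8))
```

Checking at serialization time rather than when the header is set keeps the same message usable with both policies, which is the trade-off described above.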

More info:

CPython versions tested on:

3.8, 3.9, 3.10, 3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

Linux, macOS

medmunds commented 1 month ago

Also related: https://github.com/python/cpython/issues/81074#issuecomment-1093823543