python / cpython

The Python programming language
https://www.python.org

email module generates wrong MIME header with quoted-printable encoded extra space with Python 3.12.4 #120930

Closed jun66j5 closed 2 months ago

jun66j5 commented 3 months ago

Bug report

Bug description:

In Python 3.12.4, when the EmailMessage class is given a long subject containing non-ASCII characters, the generated header includes an extra encoded space, =?utf-8?q?_?=. The issue doesn't occur with Python 3.12.3 or 3.11.9.
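For readers unfamiliar with the encoded-word syntax: in the "q" encoding an underscore stands for a space, so the stray word decodes to a single space character. A minimal sketch using only the standard library's decode_header (the variable names are illustrative, not from the report):

```python
from email.header import decode_header

# Decode the stray encoded word on its own.
# decode_header returns a list of (value, charset) pairs.
[(raw, charset)] = decode_header('=?utf-8?q?_?=')

# For an encoded word, the value is bytes and the charset is a string.
decoded = raw.decode(charset)
print(repr(decoded))  # prints ' '
```

This is why the extra word shows up as a space inside the decoded subject.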

Python 3.12.4

Python 3.12.4 (main, Jun 20 2024, 23:12:11) [GCC 13.2.1 20240309] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from email import message_from_string
>>> from email.message import EmailMessage
>>> from email.header import decode_header, make_header
>>> msg = EmailMessage()
>>> msg.set_content(u'Body text.\n', cte='quoted-printable')
>>> subject = 'A_very' + ' long' * 23 + ' súmmäry'
>>> subject
'A_very long long long long long long long long long long long long long long long long long long long long long long long súmmäry'
>>> msg['Subject'] = subject
>>> print(msg.as_string())
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: A_very long long long long long long long long long long long long
 long long long long long long long long long long long =?utf-8?q?s=C3=BAmm?=
 =?utf-8?q?_?==?utf-8?q?=C3=A4ry?=

Body text.

>>> parsed_msg = message_from_string(msg.as_string())
>>> parsed_subject = str(make_header(decode_header(parsed_msg['Subject'])))
>>> parsed_subject
'A_very long long long long long long long long long long long long long long long long long long long long long long long súmm äry'
>>> subject == parsed_subject
False
>>>

Python 3.12.3

Python 3.12.3 (main, May 23 2024, 00:56:56) [GCC 13.2.1 20240309] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from email import message_from_string
>>> from email.message import EmailMessage
>>> from email.header import decode_header, make_header
>>> msg = EmailMessage()
>>> msg.set_content(u'Body text.\n', cte='quoted-printable')
>>> subject = 'A_very' + ' long' * 23 + ' súmmäry'
>>> subject
'A_very long long long long long long long long long long long long long long long long long long long long long long long súmmäry'
>>> msg['Subject'] = subject
>>> print(msg.as_string())
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: A_very long long long long long long long long long long long long
 long long long long long long long long long long long =?utf-8?q?s=C3=BAmm?=
 =?utf-8?q?=C3=A4ry?=

Body text.

>>> parsed_msg = message_from_string(msg.as_string())
>>> parsed_subject = str(make_header(decode_header(parsed_msg['Subject'])))
>>> parsed_subject
'A_very long long long long long long long long long long long long long long long long long long long long long long long súmmäry'
>>> subject == parsed_subject
True
>>>
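The two sessions above can be condensed into one script for comparing interpreter versions. This is a sketch under the same steps as the report; the helper name subject_round_trips is hypothetical and not part of the email package.

```python
from email import message_from_string
from email.header import decode_header, make_header
from email.message import EmailMessage


def subject_round_trips(subject: str) -> bool:
    """Serialize a message with the given subject, parse it back,
    and report whether the decoded subject equals the original."""
    msg = EmailMessage()
    msg.set_content('Body text.\n', cte='quoted-printable')
    msg['Subject'] = subject
    parsed = message_from_string(msg.as_string())
    decoded = str(make_header(decode_header(parsed['Subject'])))
    return decoded == subject


if __name__ == '__main__':
    # Same subject as in the report; the result depends on the Python version.
    subject = 'A_very' + ' long' * 23 + ' súmmäry'
    print(subject_round_trips(subject))
```

Per the sessions above, this is expected to print False on 3.12.4 and True on 3.12.3.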

CPython versions tested on:

3.12

Operating systems tested on:

No response

dtrodrigues commented 2 months ago

Bisected to https://github.com/python/cpython/commit/ffe9ba04778f852a14f2404b5fcf13cb3ba1bf45 / https://github.com/python/cpython/issues/92081