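In Python 3.12.4, setting a long subject that contains non-ASCII characters on an EmailMessage generates an extra encoded space, =?utf-8?q?_?=, in the folded Subject header. When the message is parsed back, the decoded subject contains a spurious space ('súmm äry' instead of 'súmmäry'). The issue doesn't occur with Python 3.12.3 or 3.11.9.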
Python 3.12.4
Python 3.12.4 (main, Jun 20 2024, 23:12:11) [GCC 13.2.1 20240309] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from email import message_from_string
>>> from email.message import EmailMessage
>>> from email.header import decode_header, make_header
>>> msg = EmailMessage()
>>> msg.set_content(u'Body text.\n', cte='quoted-printable')
>>> subject = 'A_very' + ' long' * 23 + ' súmmäry'
>>> subject
'A_very long long long long long long long long long long long long long long long long long long long long long long long súmmäry'
>>> msg['Subject'] = subject
>>> print(msg.as_string())
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: A_very long long long long long long long long long long long long
 long long long long long long long long long long long =?utf-8?q?s=C3=BAmm?=
 =?utf-8?q?_?==?utf-8?q?=C3=A4ry?=

Body text.
>>> parsed_msg = message_from_string(msg.as_string())
>>> parsed_subject = str(make_header(decode_header(parsed_msg['Subject'])))
>>> parsed_subject
'A_very long long long long long long long long long long long long long long long long long long long long long long long súmm äry'
>>> subject == parsed_subject
False
>>>
Python 3.12.3
Python 3.12.3 (main, May 23 2024, 00:56:56) [GCC 13.2.1 20240309] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from email import message_from_string
>>> from email.message import EmailMessage
>>> from email.header import decode_header, make_header
>>> msg = EmailMessage()
>>> msg.set_content(u'Body text.\n', cte='quoted-printable')
>>> subject = 'A_very' + ' long' * 23 + ' súmmäry'
>>> subject
'A_very long long long long long long long long long long long long long long long long long long long long long long long súmmäry'
>>> msg['Subject'] = subject
>>> print(msg.as_string())
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: A_very long long long long long long long long long long long long
 long long long long long long long long long long long =?utf-8?q?s=C3=BAmm?=
 =?utf-8?q?=C3=A4ry?=

Body text.
>>> parsed_msg = message_from_string(msg.as_string())
>>> parsed_subject = str(make_header(decode_header(parsed_msg['Subject'])))
>>> parsed_subject
'A_very long long long long long long long long long long long long long long long long long long long long long long long súmmäry'
>>> subject == parsed_subject
True
>>>
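For comparing interpreters without retyping the session, the two transcripts can be condensed into a small script. This is only a sketch of the same steps: `check_subject` and `LONE_SPACE_WORD` are names made up here for illustration, not part of the `email` package, and the regular expression only looks for the exact `=?utf-8?q?_?=` token shown in the 3.12.4 output.

```python
import re
from email import message_from_string
from email.header import decode_header, make_header
from email.message import EmailMessage

# Hypothetical helper pattern: an encoded word whose entire payload is one
# encoded space ("_" in Q encoding), as seen in the 3.12.4 transcript.
LONE_SPACE_WORD = re.compile(r"=\?utf-8\?q\?_\?=", re.IGNORECASE)


def check_subject(subject):
    """Serialize a message with `subject`, parse it back, and report the result."""
    msg = EmailMessage()
    msg.set_content("Body text.\n", cte="quoted-printable")
    msg["Subject"] = subject

    # Same round trip as in the transcripts: as_string() -> message_from_string()
    raw = msg.as_string()
    parsed = message_from_string(raw)
    decoded = str(make_header(decode_header(parsed["Subject"])))

    return {
        "round_trips": decoded == subject,                      # True if nothing changed
        "lone_space_words": len(LONE_SPACE_WORD.findall(raw)),  # count of =?utf-8?q?_?=
        "decoded": decoded,
    }
```

Calling `check_subject('A_very' + ' long' * 23 + ' súmmäry')` repeats the reproduction above. Going by the transcripts, `round_trips` should come back `False` on 3.12.4 and `True` on 3.12.3.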
Bug report
Bug description:
In Python 3.12.4, using the EmailMessage class with a long subject containing non-ASCII characters generates an extra encoded space, =?utf-8?q?_?=, in the resulting header. The issue doesn't occur with Python 3.12.3 or 3.11.9.
CPython versions tested on:
3.12
Operating systems tested on:
No response
Linked PRs