python / cpython

email parser mis-handling of character encoding in the MIME preamble #107575

Open jasen-b opened 1 year ago

jasen-b commented 1 year ago

A clear and concise description of the bug

In some circumstances, the result of email.message_from_bytes() is not reversible with Message.as_bytes().

For example:

import email

e=email.message_from_bytes(b"""Content-Type: multipart/alternative; boundary="CUT-HERE"\r\n\
Date: Thu, 27 Jul 2023 02:01:09 +0000\r\n\
From: "example" <ticket@example.com>\r\n\
Message-Id: <xxxxx@example.com>\r\n\
Mime-Version: 1.0\r\n\
Subject: breaking python without ink\r\n\
To: <bar@foo.example.org>\r\n\
\r\n\
No-break\xc2\xa0space!\r\n\
""" )

print(e.as_bytes())

Adding a charset parameter to the Content-Type header can cause the breakage to take different forms. That is strange, because charset is only defined for text/* content (not for multipart/*), and the offending content is not MIME content anyway: it is preamble content.

Removing the Content-Type header prevents this bug from manifesting.
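
The contrast above can be checked with a small helper. This is a minimal sketch rather than part of the original report: `try_round_trip` and the shortened sample messages are hypothetical names introduced here, and it assumes the header in question is Content-Type. The helper only reports whether `as_bytes()` raised, since the outcome for the multipart case may depend on the Python version.

```python
import email


def try_round_trip(raw: bytes):
    """Parse raw bytes and serialize the message again.

    Returns (output_bytes, None) on success, or (None, exc) if
    as_bytes() raised UnicodeEncodeError.
    """
    msg = email.message_from_bytes(raw)
    try:
        return msg.as_bytes(), None
    except UnicodeEncodeError as exc:
        return None, exc


BODY = b"No-break\xc2\xa0space!\r\n"

# Shortened variants of the reported message (assumed to be equivalent
# for this purpose): one declares multipart, one has no Content-Type.
WITH_MULTIPART = (
    b'Content-Type: multipart/alternative; boundary="CUT-HERE"\r\n'
    b"Subject: test\r\n"
    b"\r\n" + BODY
)
WITHOUT_CONTENT_TYPE = b"Subject: test\r\n\r\n" + BODY

for name, raw in [
    ("multipart", WITH_MULTIPART),
    ("no content-type", WITHOUT_CONTENT_TYPE),
]:
    out, err = try_round_trip(raw)
    print(name, "->", "ok" if err is None else f"failed: {err!r}")
```

Note that a successful `as_bytes()` call is not necessarily byte-identical to the input: with the default policy the line endings are written as `\n`, so comparing the non-ASCII bytes is more informative than comparing whole messages.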

The following monkey-patch prevents the exception, but the result is still wrong, in different ways depending on the charset setting (which in theory should have no effect):

import email.generator

# Workaround: encode with UTF-8 plus surrogateescape when writing.
def _my_generator_write(self, s):
    self._fp.write(s.encode('utf-8', errors='surrogateescape'))

email.generator.BytesGenerator.write = _my_generator_write
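
A narrower variant of the same workaround is to put the override in a subclass instead of patching `BytesGenerator` globally. This is only a sketch: `Utf8BytesGenerator`, `flatten_with_utf8`, and the shortened sample message are hypothetical names introduced here. It does not make the output correct; it only limits the override to code that asks for it.

```python
import email
import email.generator
import io


class Utf8BytesGenerator(email.generator.BytesGenerator):
    """Hypothetical subclass carrying the same write() override as the patch."""

    def write(self, s):
        # Same body as the monkey-patch: UTF-8 with surrogateescape.
        self._fp.write(s.encode("utf-8", errors="surrogateescape"))


def flatten_with_utf8(msg):
    """Serialize msg with the subclass and return the resulting bytes."""
    buf = io.BytesIO()
    Utf8BytesGenerator(buf, mangle_from_=False).flatten(msg)
    return buf.getvalue()


# Shortened variant of the reported message (an assumption, not the original).
raw = (
    b'Content-Type: multipart/alternative; boundary="CUT-HERE"\r\n'
    b"Subject: test\r\n"
    b"\r\n"
    b"No-break\xc2\xa0space!\r\n"
)
msg = email.message_from_bytes(raw)
print(flatten_with_utf8(msg))
```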

This suggests to me that email.parser is doing something wrong.
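
One way to look at what the parser actually produced is to inspect the parsed message before serializing it. The snippet below is a diagnostic sketch with a shortened sample message (an assumption, not the original report's input); it prints the defects the parser recorded, whether the message is treated as multipart, and what `get_payload()` returns. No expected output is shown because the payload text may differ between Python versions.

```python
import email

# Shortened variant of the reported message (an assumption).
RAW = (
    b'Content-Type: multipart/alternative; boundary="CUT-HERE"\r\n'
    b"Subject: test\r\n"
    b"\r\n"
    b"No-break\xc2\xa0space!\r\n"
)

msg = email.message_from_bytes(RAW)

# Defects recorded by the parser; the default policy collects them
# on the message instead of raising.
defect_names = [type(d).__name__ for d in msg.defects]
print("defects:", defect_names)

# Content-Type says multipart, but the input contains no boundary line,
# so there are no sub-parts to put in a list.
print("is_multipart():", msg.is_multipart())

# The text that followed the headers, as get_payload() hands it back.
payload = msg.get_payload()
print("payload type:", type(payload).__name__)
print("payload repr:", repr(payload))
```

Comparing the payload repr with and without a charset parameter on the Content-Type header may help narrow down whether the difference arises at parse time or when the payload is read back for output.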

Your environment

Debian Linux versions 10 to 12 (AMD64), Python 3.7 to 3.11.2

fsc-eriker commented 7 months ago

You are not allowed to put anything except ASCII in the preamble anyway.