email.generator.BytesGenerator corrupts data by changing line endings

d2270235-76ab-4b9c-9c1e-7baf369431cd commented 11 years ago

BPO	19003
Nosy	@warsaw, @bitdancer, @xZise, @jayvdb
Files	issue19003_email.patch: patch for lib/email

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['type-bug', 'expert-email']
title = 'email.generator.BytesGenerator corrupts data by changing line endings'
updated_at =
user = 'https://bugs.python.org/AlexanderKruppa'
```
bugs.python.org fields:
```python
activity =
actor = 'r.david.murray'
assignee = 'none'
closed = True
closed_date =
closer = 'r.david.murray'
components = ['email']
creation =
creator = 'Alexander.Kruppa'
dependencies = []
files = ['35799']
hgrepos = []
issue_num = 19003
keywords = ['patch']
message_count = 6.0
messages = ['197476', '221827', '221828', '228496', '275857', '275859']
nosy_count = 7.0
nosy_names = ['barry', 'r.david.murray', 'python-dev', 'Alexander.Kruppa', '\xe5\xa4\xa9\xe4\xb8\x80.\xe4\xbd\x95', 'xZise', 'jayvdb']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue19003'
versions = ['Python 3.5', 'Python 3.6']
```

d2270235-76ab-4b9c-9c1e-7baf369431cd commented 11 years ago

This is a follow-up to bpo-16564. In that issue, BytesGenerator was changed to accept a bytes payload; however, processing binary data that way leads to data corruption.

Repost of the update I posted in bpo-16564:

~/build/Python-3.3.2$ ./python --version
Python 3.3.2

When modifying the test case in Lib/test/test_email/test_email.py like this:

--- Lib/test/test_email/test_email.py   2013-05-15 18:32:55.000000000 +0200
+++ Lib/test/test_email/test_email_mine.py  2013-09-10 14:22:08.160089440 +0200
@@ -1461,17 +1461,17 @@
         # Issue 16564: This does not produce an RFC valid message, since to be
         # valid it should have a CTE of binary.  But the below works in
         # Python2, and is documented as working this way.
-        bytesdata = b'\xfa\xfb\xfc\xfd\xfe\xff'
+        bytesdata = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
         msg = MIMEApplication(bytesdata, _encoder=encoders.encode_noop)
         # Treated as a string, this will be invalid code points.
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg.get_payload(decode=True), bytesdata)
         s = BytesIO()
         g = BytesGenerator(s)
         g.flatten(msg)
         wireform = s.getvalue()
         msg2 = email.message_from_bytes(wireform)
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg2.get_payload(decode=True), bytesdata)

then running:

./python ./Tools/scripts/run_tests.py test_email

results in:

======================================================================
FAIL: test_binary_body_with_encode_noop (test_email_mine.TestMIMEApplication)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/localdisk/kruppaal/build/Python-3.3.2/Lib/test/test_email/test_email_mine.py", line 1475, in test_binary_body_with_encode_noop
    self.assertEqual(msg2.get_payload(decode=True), bytesdata)
AssertionError: b'\x0b\n\xfa\xfb\xfc\xfd\xfe\xff' != b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'

The '\x0b' byte is incorrectly translated to '\x0b\n', i.e., a newline character is inserted after it.
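The failing test can be reduced to a standalone round-trip script. This is a minimal sketch that uses only the calls shown in the diff above; the helper name `roundtrip` is made up for illustration. Whether the result equals the input depends on the Python version being run, so no expected output is shown.

```python
import email
from email import encoders
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication
from io import BytesIO


def roundtrip(payload: bytes) -> bytes:
    """Flatten a MIMEApplication part with BytesGenerator and parse it back.

    `roundtrip` is a hypothetical helper, not part of the email package.
    """
    # encode_noop leaves the payload unencoded, as in the modified test case.
    msg = MIMEApplication(payload, _encoder=encoders.encode_noop)
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg)
    parsed = email.message_from_bytes(buf.getvalue())
    return parsed.get_payload(decode=True)


if __name__ == "__main__":
    original = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
    result = roundtrip(original)
    # On an affected interpreter the two values differ.
    print(result == original, result)
```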

Encoding the byte string bytes(range(256)) produces the following output (MIME headers stripped):

0000000: 0001 0203 0405 0607 0809 0a0b 0a0c 0a0a  ................
0000010: 0e0f 1011 1213 1415 1617 1819 1a1b 1c0a  ................
0000020: 1d0a 1e0a 1f20 2122 2324 2526 2728 292a  ..... !"#$%&'()*
0000030: 2b2c 2d2e 2f30 3132 3334 3536 3738 393a  +,-./0123456789:
0000040: 3b3c 3d3e 3f40 4142 4344 4546 4748 494a  ;<=>?@ABCDEFGHIJ
0000050: 4b4c 4d4e 4f50 5152 5354 5556 5758 595a  KLMNOPQRSTUVWXYZ
0000060: 5b5c 5d5e 5f60 6162 6364 6566 6768 696a  [\]^_`abcdefghij
0000070: 6b6c 6d6e 6f70 7172 7374 7576 7778 797a  klmnopqrstuvwxyz
0000080: 7b7c 7d7e 7f80 8182 8384 8586 8788 898a  {|}~............
0000090: 8b8c 8d8e 8f90 9192 9394 9596 9798 999a  ................
00000a0: 9b9c 9d9e 9fa0 a1a2 a3a4 a5a6 a7a8 a9aa  ................
00000b0: abac adae afb0 b1b2 b3b4 b5b6 b7b8 b9ba  ................
00000c0: bbbc bdbe bfc0 c1c2 c3c4 c5c6 c7c8 c9ca  ................
00000d0: cbcc cdce cfd0 d1d2 d3d4 d5d6 d7d8 d9da  ................
00000e0: dbdc ddde dfe0 e1e2 e3e4 e5e6 e7e8 e9ea  ................
00000f0: ebec edee eff0 f1f2 f3f4 f5f6 f7f8 f9fa  ................
0000100: fbfc fdfe ff                             .....

That is, a '\n' is inserted after '\x0b', '\x1c', '\x1d', and '\x1e', and '\x0d' is replaced by '\n\n'.

I suspect this is due to the use of self._write_lines(msg._payload) in BytesGenerator._handle_text(), since _write_lines() mangles line endings.
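To illustrate the suspected mechanism: `str.splitlines()` treats several control characters other than CR and LF as line boundaries, so any writer that splits on them and re-joins with a newline will insert bytes. The sketch below is an assumption about how such mangling could arise, not the code from Lib/email; both function names are hypothetical.

```python
def splitlines_boundaries(limit=256):
    """Return the characters below `limit` that str.splitlines() splits on."""
    return [c for c in map(chr, range(limit))
            if len(('x' + c + 'y').splitlines()) == 2]


def rewrite_with_splitlines(text, nl='\n'):
    """Hypothetical stand-in for a line writer built on str.splitlines().

    Each line is stripped of CR/LF and followed by `nl`; the last line only
    gets `nl` if it originally ended in CR or LF.
    """
    lines = text.splitlines(True)
    out = []
    for i, line in enumerate(lines):
        stripped = line.rstrip('\r\n')
        out.append(stripped)
        if i < len(lines) - 1 or stripped != line:
            out.append(nl)
    return ''.join(out)


if __name__ == "__main__":
    print([hex(ord(c)) for c in splitlines_boundaries()])
    print(repr(rewrite_with_splitlines('\x0bZ')))
```

With this stand-in, a vertical tab in the input comes back followed by a newline, which matches the shape of the corruption in the failing assertion.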

ghost commented 10 years ago

Confirmed in Python 3.4.1.

ghost commented 10 years ago

This patch adds special handling for MIMEApplication and may fix this issue. It can be verified with test_email.

fe457fde-f237-4989-bea1-4966464a7c38 commented 10 years ago

I can confirm this on 3.4.1, and it is really annoying. But the patch should set '_is_raw_payload' to False if the payload is set via 'set_payload' (the operations in 'set_raw_payload' need to be switched).

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 8 years ago

New changeset c0f5702e0f10 by R David Murray in branch '3.5': bpo-19003: Only replace \r and/or \n line endings in email.generator. https://hg.python.org/cpython/rev/c0f5702e0f10

New changeset ccad4d142934 by R David Murray in branch 'default': Merge: bpo-19003: Only replace \r and/or \n line endings in email.generator. https://hg.python.org/cpython/rev/ccad4d142934
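The idea named in the commit message, replacing only \r and/or \n line endings, can be sketched with a regular expression that matches exactly those three sequences. This is an illustration under that reading of the commit message, not the code from the changeset; `NEWLINE_RE` and `normalize_line_endings` are hypothetical names.

```python
import re

# Matches only CRLF, a lone CR, or a lone LF (hypothetical name).
NEWLINE_RE = re.compile(r'\r\n|\r|\n')


def normalize_line_endings(text, nl='\n'):
    """Replace CRLF, CR, and LF with `nl`; leave all other characters alone."""
    return NEWLINE_RE.sub(nl, text)


if __name__ == "__main__":
    # Other control characters such as \x0b and \x1c pass through unchanged.
    print(repr(normalize_line_endings('\x0bG\x1cH\r\nI\rJ\nK', nl='\r\n')))
```

Because CRLF is listed first in the alternation, it is replaced as a single unit rather than as two separate line endings.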

bitdancer commented 8 years ago

I've fixed this to the extent that it is possible without adding support for the 'binary' CTE. That is, \r, \n, and \r\n are still replaced with the 'correct' line ending characters, which is the correct behavior under the RFCs even for binary data if the CTE is not 'binary'. bpo-18886 covers the enhancement of supporting the binary CTE.
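Since \r and \n in an unencoded payload are still normalized, arbitrary binary data only survives a round trip if it is carried under a transfer encoding that hides those bytes. A minimal sketch, assuming base64 is acceptable for the use case: MIMEApplication applies base64 by default when no `_encoder` is passed. The helper name `roundtrip_base64` is hypothetical.

```python
import email
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication
from io import BytesIO


def roundtrip_base64(payload: bytes) -> bytes:
    """Flatten and re-parse a part that uses the default base64 encoder."""
    # No _encoder argument: MIMEApplication defaults to encoders.encode_base64.
    msg = MIMEApplication(payload)
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg)
    parsed = email.message_from_bytes(buf.getvalue())
    return parsed.get_payload(decode=True)


if __name__ == "__main__":
    data = bytes(range(256))
    print(roundtrip_base64(data) == data)
```

The cost is the usual base64 size overhead of roughly one third, which is the trade-off against waiting for binary CTE support.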

python / cpython

email.generator.BytesGenerator corrupts data by changing line endings #63203