5112f438-a4c5-4f62-8b45-7c4327cd8329 commented 14 years ago

BPO	9298
Nosy	@warsaw, @bitdancer
Files	bugs.python.org_issue9298.pdf: PDF of this report page issue9298-2.patch: email/encoders.py and email/test/test_email.py

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = 'https://github.com/bitdancer' closed_at = created_at = labels = ['type-bug', 'library'] title = 'binary email attachment issue with base64 encoding' updated_at = user = 'https://bugs.python.org/vunruh' ``` bugs.python.org fields: ```python activity = actor = 'r.david.murray' assignee = 'r.david.murray' closed = True closed_date = closer = 'r.david.murray' components = ['Library (Lib)'] creation = creator = 'vunruh' dependencies = [] files = ['18064', '20755'] hgrepos = [] issue_num = 9298 keywords = ['patch'] message_count = 17.0 messages = ['110710', '110743', '110762', '128220', '128254', '128255', '128256', '128257', '128363', '128364', '128404', '128471', '128481', '128515', '130655', '131167', '131170'] nosy_count = 5.0 nosy_names = ['barry', 'r.david.murray', 'vunruh', 'yves@zioup.com', 'python-dev'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue9298' versions = ['Python 3.1', 'Python 3.2', 'Python 3.3'] ```

5112f438-a4c5-4f62-8b45-7c4327cd8329 commented 14 years ago

I'm using Python to email a text version and a PDF version of a report. The standard way of doing things does not work with Vista's Mail program, but works fine with Mail on OSX. So, I don't know if this is a Python or a Vista Mail bug. By standard way, I mean:

    # Add the attached PDF:
    part = MIMEApplication(pdf,"pdf")
    part.add_header('Content-Disposition', 'attachment', filename=pdfFile)
    msg.attach(part)

To fix the problem, I changed C:\Python31\Lib\email\encoders.py to use encodebytes instead of b64encode in order to get mail on Windows Vista to correctly interpret the attachment. This splits the base64 encoding into many lines of some fixed lenth.

I can achieve the same thing adding the attachment by hand with the following code:

    from email.mime.base import MIMEBase
    part = MIMEBase("application","pdf")
    part.add_header('Content-Transfer-Encoding', 'base64') 
    part.set_payload(str(base64.encodebytes(pdf),'ascii'))
    msg.attach(part)

Seems like I shouldn't need to know this much.

I'm new to Python and this is the first bug I have submitted, so if you need additional information, please let me know.

bitdancer commented 14 years ago

Can you try this against 3.1 from svn (or py3k from svn)? A bug was fixed that might be relevant. Alternatively a unit test that demonstrates the problem would be most helpful.

5112f438-a4c5-4f62-8b45-7c4327cd8329 commented 14 years ago

Here's code that attaches the pdf file the two different ways. Both attachments are OK when I read the mail on OSX, but one is corrupt when read with Windows Mail on Vista. I wasn't sure what to do with the actual sending of the mail to the server. You'll have to change the code to use your account or something.

def emailReport(pdf):
    """Email the report as multi-part MIME"""

    from email.mime.multipart import MIMEMultipart  
    msg = MIMEMultipart()
    msg['Subject'] = 'Corrupt PDF'
    msg['From'] = 'Me <me@myProvider.net>'
    msg['To'] = 'You <you@yourProvider.com>'

    # Add the PDF the easy way that fails:
    from email.mime.application import MIMEApplication
    fp = open(pdf, 'rb')
    part = MIMEApplication(fp.read(),"pdf")
    fp.close()
    part.add_header('Content-Disposition', 'attachment',filename='This one fails.pdf')
    msg.attach(part)

    # Add the PDF the hard way using the legacy base64 encoder
    from email.mime.base import MIMEBase
    part = MIMEBase("application","pdf")
    part.add_header('Content-Transfer-Encoding', 'base64')
    part.add_header('Content-Disposition', 'attachment',filename='This one works.pdf')
    import base64
    fp = open(pdf, 'rb')
    part.set_payload(str(base64.encodebytes(fp.read()),'ascii'))
    fp.close()
    msg.attach(part)

    # Send the email
    from smtplib import SMTP
    server = SMTP('smtpauth.provider.net')
    server.login(user,password)
    server.sendmail(msg['From'], recipient, msg.as_string())
    server.quit()

emailReport('bugs.python.org_issue9298.pdf')

bitdancer commented 13 years ago

!/usr/bin/python3.1

import unittest
import email.mime.image

class emailEncoderTestCase(unittest.TestCase):
  def setUp(self):

# point to an image
binaryfile = '/usr/share/openclipart/png/animals/mammals/happy_monkey_benji_park_01.png'

    while len(binaryfile) == 0:
      print('Enter the name of an image: ')
      binaryfile = raw_input()

    fp = open('/usr/share/openclipart/png/animals/mammals/happy_monkey_benji_park_01.png', 'rb')
    self.bindata = fp.read()

  def test_convert(self):

    mimed = email.mime.image.MIMEImage(self.bindata, _subtype='png')

    base64ed = mimed.get_payload()
    # print (base64ed)

    # work out if any line within the string is > 76 chars
    chopped = base64ed.split('\n')
    lengths = [ len(x) for x in chopped ]
    toolong = [ x for x in lengths if x > 76 ]

    msg = 'There is at least one line of ' + str(max(lengths)) + ' chars.'
    self.assertEqual(0, len(toolong), msg)

if __name__ == '__main__':
  unittest.main()

c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

Here's a better version (sorry I don't know how to remove msg128255:

c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

!/usr/bin/python3.1

import unittest
import email.mime.image

class emailEncoderTestCase(unittest.TestCase):
  def setUp(self):

# point to an image
binaryfile = ''
#binaryfile = '/usr/share/openclipart/png/animals/mammals/happy_monkey_benji_park_01.png'

    while len(binaryfile) == 0:
      print('Enter the name of an image: ')
      binaryfile = input()

    fp = open(binaryfile, 'rb')
    self.bindata = fp.read()

  def test_convert(self):

    mimed = email.mime.image.MIMEImage(self.bindata, _subtype='png')

    base64ed = mimed.get_payload()
    # print (base64ed)

    # work out if any line within the string is > 76 chars
    chopped = base64ed.split('\n')
    lengths = [ len(x) for x in chopped ]
    toolong = [ x for x in lengths if x > 76 ]

    msg = 'There is at least one line of ' + str(max(lengths)) + ' chars.'
    self.assertEqual(0, len(toolong), msg)

if __name__ == '__main__':
  unittest.main()

c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

Test if email.encoders.encode_base64 returns a single line string, or a string broken up in 76 chars line, as per RFC.

c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

Replaces b64encode by encodebytes.

bitdancer commented 13 years ago

Yves: thanks for the patches. If you feel like redoing the test one as a patch against Lib/email/test/test_email.py, that would be great. I'd suggest having the test just split the lines and do

assertLessEqual(max([len(x) for x in lines]), 76)

or something along those lines.

As for the fix, encoders used to use encodebytes, but it tacks on a trailing newline that, according to the old code (see r57800) is unwanted. So perhaps we need to revert that part of r57800 instead.

c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

I will. Please don't use my patch yet, it breaks something else in the test_email:

./python Lib/test/regrtest.py test_email
[1/1] test_email
test test_email failed -- Traceback (most recent call last):
  File "/export/incoming/python/py3k/Lib/email/test/test_email.py", line 1146, in test_body
    eq(msg.get_payload(), '+vv8/f7/')
AssertionError: '+vv8/f7/\n' != '+vv8/f7/'
- +vv8/f7/
?         -
+ +vv8/f7/

1 test failed: test_email

This is with my code patch, not the test patch. I'll look at it, and post again, could be the extra newline you were talking about.

c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

I've got two issues with this code (Lib/email/test/test_email.py):

1128 def test_body(self): 1129 eq = self.assertEqual 1130 bytes = b'\xfa\xfb\xfc\xfd\xfe\xff' 1131 msg = MIMEApplication(bytes) 1132 eq(msg.get_payload(), '+vv8/f7/') 1133 eq(msg.get_payload(decode=True), bytes)

1) Even though it works, I find the use of a defined type as the name of a variable confusing (line 1130, bytes).

2) The test on line 1132 fails if the base64 payload has an extra newline at the end, but newlines are not an issue in base64 and are actually expected. In fact the test at line 1133 shows that once decoded, the bytes are reverted to their original form.

Is there a way to find who is the author of this test, and what was the intent? Would the following test be acceptable (still testing a valid base64 encoding):

eq(msg.get_payload().strip(), '+vv8/f7/')

Thanks.

c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

encoders.py: Fixes the issue of base64'ed being > 76 chars

test_email.py:

-test that base64'ed binary is split into 76 chars lines

-WARRNING: Changes the test for MIMEApplication.test_body: -it changes the name of the variable 'bytes' to 'bytesdata' -it strip()s the base64ed payload before it compares it to the expected payload. With the change above (using encodebytes instead of b64encode in encoders.py), this test, as is, fails, because there is an extra newline at the end. Extra newlines are part of base64 and should not be an issue, as a matter of fact, you obtain the original bytes when you decode, regardless of having extra newlines. It would be good to know the intent of the original author of this test. Was the intent to ensure there were no newline? If so, why? Or was the intent to simply confirm the base64 encoding conform to the standard? If the latter, my change should not be an issue.

All test ("make test") passed with this patch.

bitdancer commented 13 years ago

Unfortunately we don't have enough history information to determine who wrote the original _bencode function, although very likely it was Barry. As for the test, that seems to have been written during the python3 translation to make sure that the behavior implemented by _bencode was preserved. Python2 has no such test: if you remove the newline check from _bencode, the test suite passes.

Checking with RFC 2045, we find this:

The encoded output stream must be represented in lines of no more than 76 characters each. All line breaks or other characters not found in Table 1 must be ignored by decoding software. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances.

"All line breaks..." seems pretty unambiguous: an extra trailing newline should be ignored by any compliant email agent. That does not eliminate the possibility that a non-compliant email agent would tack on an extra newline if there is one after the base64 encoded text, but it seems very very unlikely. I am therefore inclined to fix the test, as you suggest.

I hope that Barry can remember why _bencode was introduced in the first place, since clearly there was *some* reason.

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 13 years ago

New changeset 062d09d7bf94 by R David Murray in branch '3.1': bpo-9298: restore proper folding of base64 encoded bodies. http://hg.python.org/cpython/rev/062d09d7bf94

New changeset c34320d9095e by R David Murray in branch '3.2': Merge bpo-9298 fix. http://hg.python.org/cpython/rev/c34320d9095e

New changeset de2cd04e5101 by R David Murray in branch 'default': Merge bpo-9298 fix. http://hg.python.org/cpython/rev/de2cd04e5101

bitdancer commented 13 years ago

I talked with Barry, who could find no relevant discussions in his email logs. We decided that _bencode was misguided in the first place (this is reinforced by the bug I fixed a year ago where email was stripping the final newline off a body *after* decoding it). So I've committed Yves' fix. Thanks, Yves.

python / cpython

binary email attachment issue with base64 encoding #53544

!/usr/bin/python3.1

!/usr/bin/python3.1