python / cpython

The Python programming language
https://www.python.org
Other
62.47k stars 29.99k forks source link

binary email attachment issue with base64 encoding #53544

Closed 5112f438-a4c5-4f62-8b45-7c4327cd8329 closed 13 years ago

5112f438-a4c5-4f62-8b45-7c4327cd8329 commented 14 years ago
BPO 9298
Nosy @warsaw, @bitdancer
Files
  • bugs.python.org_issue9298.pdf: PDF of this report page
  • issue9298-2.patch: email/encoders.py and email/test/test_email.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/bitdancer' closed_at = created_at = labels = ['type-bug', 'library'] title = 'binary email attachment issue with base64 encoding' updated_at = user = 'https://bugs.python.org/vunruh' ``` bugs.python.org fields: ```python activity = actor = 'r.david.murray' assignee = 'r.david.murray' closed = True closed_date = closer = 'r.david.murray' components = ['Library (Lib)'] creation = creator = 'vunruh' dependencies = [] files = ['18064', '20755'] hgrepos = [] issue_num = 9298 keywords = ['patch'] message_count = 17.0 messages = ['110710', '110743', '110762', '128220', '128254', '128255', '128256', '128257', '128363', '128364', '128404', '128471', '128481', '128515', '130655', '131167', '131170'] nosy_count = 5.0 nosy_names = ['barry', 'r.david.murray', 'vunruh', 'yves@zioup.com', 'python-dev'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue9298' versions = ['Python 3.1', 'Python 3.2', 'Python 3.3'] ```

    5112f438-a4c5-4f62-8b45-7c4327cd8329 commented 14 years ago

    I'm using Python to email a text version and a PDF version of a report. The standard way of doing things does not work with Vista's Mail program, but works fine with Mail on OSX. So, I don't know if this is a Python or a Vista Mail bug. By standard way, I mean:

        # Add the attached PDF:
        part = MIMEApplication(pdf,"pdf")
        part.add_header('Content-Disposition', 'attachment', filename=pdfFile)
        msg.attach(part)

    To fix the problem, I changed C:\Python31\Lib\email\encoders.py to use encodebytes instead of b64encode in order to get mail on Windows Vista to correctly interpret the attachment. This splits the base64 encoding into many lines of some fixed lenth.

    I can achieve the same thing adding the attachment by hand with the following code:

        from email.mime.base import MIMEBase
        part = MIMEBase("application","pdf")
        part.add_header('Content-Transfer-Encoding', 'base64') 
        part.set_payload(str(base64.encodebytes(pdf),'ascii'))
        msg.attach(part)

    Seems like I shouldn't need to know this much.

    I'm new to Python and this is the first bug I have submitted, so if you need additional information, please let me know.

    bitdancer commented 14 years ago

    Can you try this against 3.1 from svn (or py3k from svn)? A bug was fixed that might be relevant. Alternatively a unit test that demonstrates the problem would be most helpful.

    5112f438-a4c5-4f62-8b45-7c4327cd8329 commented 14 years ago

    Here's code that attaches the pdf file the two different ways. Both attachments are OK when I read the mail on OSX, but one is corrupt when read with Windows Mail on Vista. I wasn't sure what to do with the actual sending of the mail to the server. You'll have to change the code to use your account or something.

    def emailReport(pdf):
        """Email the report as multi-part MIME"""
    
        from email.mime.multipart import MIMEMultipart  
        msg = MIMEMultipart()
        msg['Subject'] = 'Corrupt PDF'
        msg['From'] = 'Me <me@myProvider.net>'
        msg['To'] = 'You <you@yourProvider.com>'
    
        # Add the PDF the easy way that fails:
        from email.mime.application import MIMEApplication
        fp = open(pdf, 'rb')
        part = MIMEApplication(fp.read(),"pdf")
        fp.close()
        part.add_header('Content-Disposition', 'attachment',filename='This one fails.pdf')
        msg.attach(part)
    
        # Add the PDF the hard way using the legacy base64 encoder
        from email.mime.base import MIMEBase
        part = MIMEBase("application","pdf")
        part.add_header('Content-Transfer-Encoding', 'base64')
        part.add_header('Content-Disposition', 'attachment',filename='This one works.pdf')
        import base64
        fp = open(pdf, 'rb')
        part.set_payload(str(base64.encodebytes(fp.read()),'ascii'))
        fp.close()
        msg.attach(part)
    
        # Send the email
        from smtplib import SMTP
        server = SMTP('smtpauth.provider.net')
        server.login(user,password)
        server.sendmail(msg['From'], recipient, msg.as_string())
        server.quit()
    
    emailReport('bugs.python.org_issue9298.pdf')
    bitdancer commented 13 years ago

    See also duplicate bpo-11156.

    c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

    Solution: In /usr/lib/python3.1/email/encoders.py, use encodebytes instead of b64encode:

    --- encoders.py 2011-02-08 09:37:21.025030051 -0700
    +++ encoders.py.yves    2011-02-08 09:38:04.945608365 -0700
    @@ -12,7 +12,7 @@
         ]
    
    -from base64 import b64encode as _bencode
    +from base64 import encodebytes as _bencode
     from quopri import encodestring as _encode
    c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

    !/usr/bin/python3.1

    import unittest
    import email.mime.image
    
    class emailEncoderTestCase(unittest.TestCase):
      def setUp(self):
    # point to an image
    binaryfile = '/usr/share/openclipart/png/animals/mammals/happy_monkey_benji_park_01.png'
        while len(binaryfile) == 0:
          print('Enter the name of an image: ')
          binaryfile = raw_input()
    
        fp = open('/usr/share/openclipart/png/animals/mammals/happy_monkey_benji_park_01.png', 'rb')
        self.bindata = fp.read()
    
      def test_convert(self):
    
        mimed = email.mime.image.MIMEImage(self.bindata, _subtype='png')
    
        base64ed = mimed.get_payload()
        # print (base64ed)
    
        # work out if any line within the string is > 76 chars
        chopped = base64ed.split('\n')
        lengths = [ len(x) for x in chopped ]
        toolong = [ x for x in lengths if x > 76 ]
    
        msg = 'There is at least one line of ' + str(max(lengths)) + ' chars.'
        self.assertEqual(0, len(toolong), msg)
    
    if __name__ == '__main__':
      unittest.main()
    c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

    Here's a better version (sorry I don't know how to remove msg128255:

    c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

    !/usr/bin/python3.1

    import unittest
    import email.mime.image
    
    class emailEncoderTestCase(unittest.TestCase):
      def setUp(self):
    # point to an image
    binaryfile = ''
    #binaryfile = '/usr/share/openclipart/png/animals/mammals/happy_monkey_benji_park_01.png'
        while len(binaryfile) == 0:
          print('Enter the name of an image: ')
          binaryfile = input()
    
        fp = open(binaryfile, 'rb')
        self.bindata = fp.read()
    
      def test_convert(self):
    
        mimed = email.mime.image.MIMEImage(self.bindata, _subtype='png')
    
        base64ed = mimed.get_payload()
        # print (base64ed)
    
        # work out if any line within the string is > 76 chars
        chopped = base64ed.split('\n')
        lengths = [ len(x) for x in chopped ]
        toolong = [ x for x in lengths if x > 76 ]
    
        msg = 'There is at least one line of ' + str(max(lengths)) + ' chars.'
        self.assertEqual(0, len(toolong), msg)
    
    if __name__ == '__main__':
      unittest.main()
    c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

    Test if email.encoders.encode_base64 returns a single line string, or a string broken up in 76 chars line, as per RFC.

    c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

    Replaces b64encode by encodebytes.

    bitdancer commented 13 years ago

    Yves: thanks for the patches. If you feel like redoing the test one as a patch against Lib/email/test/test_email.py, that would be great. I'd suggest having the test just split the lines and do

    assertLessEqual(max([len(x) for x in lines]), 76)

    or something along those lines.

    As for the fix, encoders used to use encodebytes, but it tacks on a trailing newline that, according to the old code (see r57800) is unwanted. So perhaps we need to revert that part of r57800 instead.

    c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

    I will. Please don't use my patch yet, it breaks something else in the test_email:

    ./python Lib/test/regrtest.py test_email
    [1/1] test_email
    test test_email failed -- Traceback (most recent call last):
      File "/export/incoming/python/py3k/Lib/email/test/test_email.py", line 1146, in test_body
        eq(msg.get_payload(), '+vv8/f7/')
    AssertionError: '+vv8/f7/\n' != '+vv8/f7/'
    - +vv8/f7/
    ?         -
    + +vv8/f7/

    1 test failed: test_email

    This is with my code patch, not the test patch. I'll look at it, and post again, could be the extra newline you were talking about.

    c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

    I've got two issues with this code (Lib/email/test/test_email.py):

    1128 def test_body(self): 1129 eq = self.assertEqual 1130 bytes = b'\xfa\xfb\xfc\xfd\xfe\xff' 1131 msg = MIMEApplication(bytes) 1132 eq(msg.get_payload(), '+vv8/f7/') 1133 eq(msg.get_payload(decode=True), bytes)

    1) Even though it works, I find the use of a defined type as the name of a variable confusing (line 1130, bytes).

    2) The test on line 1132 fails if the base64 payload has an extra newline at the end, but newlines are not an issue in base64 and are actually expected. In fact the test at line 1133 shows that once decoded, the bytes are reverted to their original form.

    Is there a way to find who is the author of this test, and what was the intent? Would the following test be acceptable (still testing a valid base64 encoding):

    eq(msg.get_payload().strip(), '+vv8/f7/')

    Thanks.

    c7d07a43-d799-45c8-bc8c-bbcc9898e89c commented 13 years ago

    encoders.py: Fixes the issue of base64'ed being > 76 chars

    test_email.py:

    -test that base64'ed binary is split into 76 chars lines

    -WARRNING: Changes the test for MIMEApplication.test_body: -it changes the name of the variable 'bytes' to 'bytesdata' -it strip()s the base64ed payload before it compares it to the expected payload. With the change above (using encodebytes instead of b64encode in encoders.py), this test, as is, fails, because there is an extra newline at the end. Extra newlines are part of base64 and should not be an issue, as a matter of fact, you obtain the original bytes when you decode, regardless of having extra newlines. It would be good to know the intent of the original author of this test. Was the intent to ensure there were no newline? If so, why? Or was the intent to simply confirm the base64 encoding conform to the standard? If the latter, my change should not be an issue.

    All test ("make test") passed with this patch.

    bitdancer commented 13 years ago

    Unfortunately we don't have enough history information to determine who wrote the original _bencode function, although very likely it was Barry. As for the test, that seems to have been written during the python3 translation to make sure that the behavior implemented by _bencode was preserved. Python2 has no such test: if you remove the newline check from _bencode, the test suite passes.

    Checking with RFC 2045, we find this:

    The encoded output stream must be represented in lines of no more than 76 characters each. All line breaks or other characters not found in Table 1 must be ignored by decoding software. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances.

    "All line breaks..." seems pretty unambiguous: an extra trailing newline should be ignored by any compliant email agent. That does not eliminate the possibility that a non-compliant email agent would tack on an extra newline if there is one after the base64 encoded text, but it seems very very unlikely. I am therefore inclined to fix the test, as you suggest.

    I hope that Barry can remember why _bencode was introduced in the first place, since clearly there was *some* reason.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 13 years ago

    New changeset 062d09d7bf94 by R David Murray in branch '3.1': bpo-9298: restore proper folding of base64 encoded bodies. http://hg.python.org/cpython/rev/062d09d7bf94

    New changeset c34320d9095e by R David Murray in branch '3.2': Merge bpo-9298 fix. http://hg.python.org/cpython/rev/c34320d9095e

    New changeset de2cd04e5101 by R David Murray in branch 'default': Merge bpo-9298 fix. http://hg.python.org/cpython/rev/de2cd04e5101

    bitdancer commented 13 years ago

    I talked with Barry, who could find no relevant discussions in his email logs. We decided that _bencode was misguided in the first place (this is reinforced by the bug I fixed a year ago where email was stripping the final newline off a body *after* decoding it). So I've committed Yves' fix. Thanks, Yves.