python / cpython

The Python programming language
https://www.python.org

decode_header() fails on multiline headers #46910

Closed d36ec4cc-df84-4d4f-9efc-7b1fe70aca9e closed 12 years ago

d36ec4cc-df84-4d4f-9efc-7b1fe70aca9e commented 16 years ago
BPO 2658
Nosy @warsaw, @bitdancer
Superseder
  • bpo-1079: decode_header does not follow RFC 2047
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['type-bug', 'library', 'expert-email']
    title = 'decode_header() fails on multiline headers'
    updated_at =
    user = 'https://bugs.python.org/cschnee'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'r.david.murray'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'r.david.murray'
    components = ['Library (Lib)', 'email']
    creation =
    creator = 'cschnee'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 2658
    keywords = []
    message_count = 3.0
    messages = ['65630', '162220', '162222']
    nosy_count = 4.0
    nosy_names = ['barry', 'cschnee', 'r.david.murray', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '1079'
    type = 'behavior'
    url = 'https://bugs.python.org/issue2658'
    versions = ['Python 3.3']
    ```

    d36ec4cc-df84-4d4f-9efc-7b1fe70aca9e commented 16 years ago

    email.Header.decode_header() does not correctly handle multiline header lines. Revision 54371 of header.py (1) changed the behaviour: multiline headers were previously parsed correctly, but 54371 introduced a new regex part that renders such headers invalid, so they are no longer parsed as expected. Given the following header line (it doesn't matter whether it is parsed from a mail or read from a string), which IMHO is a valid RFC 2047 header line:

    from email.Header import decode_header
    decode_header('=?windows-1252?Q?=22M=FCller_T=22?=\r\n <T.Mueller@xxx.com>')

    this will result in:

    header.py (54371):
    [('=?windows-1252?Q?=22M=FCller_T=22?=\r\n <T.Mueller@xxx.com>', None)]

    resp. with header.py (54370):
    [('"M\xfcller T"', 'windows-1252'), (' <T.Mueller@xxx.com>', None)]

    Actually, both seem to be parsed wrongly, but with 54370 the result looks
    saner (IMO the leading space should be removed).
    Once the CRLF sequence is removed from the header, it works fine and
    everything looks as expected:
    >>> decode_header('=?windows-1252?Q?=22M=FCller_T=22?= <T.Mueller@xxx.com>')
    [('"M\xfcller T"', 'windows-1252'), ('<T.Mueller@xxx.com>', None)]
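The workaround described above (removing the CRLF before decoding) could be sketched roughly as follows. This is only an illustration, not code from the email package: `unfold` and `_FOLD` are hypothetical names, and the import uses the Python 3 module name `email.header` rather than the Python 2 `email.Header` spelling used in this report.

```python
import re
from email.header import decode_header  # Python 3 name for email.Header

# Hypothetical helper, not part of the email package: drop every line break
# (CRLF or bare LF) that is followed by a space or tab, i.e. undo header folding.
_FOLD = re.compile(r'\r?\n(?=[ \t])')

def unfold(value):
    """Return the header value with folding line breaks removed."""
    return _FOLD.sub('', value)

raw = '=?windows-1252?Q?=22M=FCller_T=22?=\r\n <T.Mueller@xxx.com>'
unfolded = unfold(raw)          # the folding whitespace itself is kept
parts = decode_header(unfolded)

for text, charset in parts:
    print(repr(text), charset)
```

On versions that already contain the fix for bpo-1079 the unfolding step should not be needed; the sketch only mirrors the manual CRLF removal shown in this comment.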

    This problem might or might not be related to

    (1) http://svn.python.org/view?rev=54371&view=rev

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 12 years ago

    New changeset 0808cb8c60fd by R David Murray in branch 'default': bpo-2658: Add test for issue fixed by fix for bpo-1079. http://hg.python.org/cpython/rev/0808cb8c60fd

    bitdancer commented 12 years ago

    This is fixed by the fix for bpo-1079. I've added the test to the test suite.