python / cpython

The Python programming language
https://www.python.org

email.header.decode_header makes mistakes #52379

Closed a7220865-50c5-40ed-8425-d0c4c0e25af6 closed 14 years ago

a7220865-50c5-40ed-8425-d0c4c0e25af6 commented 14 years ago
BPO 8132
Nosy @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['invalid', 'type-bug', 'library']
title = 'email.header.decode_header makes mistakes'
updated_at =
user = 'https://bugs.python.org/grmtz'
```

bugs.python.org fields:
```python
activity =
actor = 'r.david.murray'
assignee = 'none'
closed = True
closed_date =
closer = 'r.david.murray'
components = ['Library (Lib)']
creation =
creator = 'grmtz'
dependencies = []
files = []
hgrepos = []
issue_num = 8132
keywords = []
message_count = 2.0
messages = ['101004', '101039']
nosy_count = 2.0
nosy_names = ['r.david.murray', 'grmtz']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue8132'
versions = ['Python 2.6']
```

a7220865-50c5-40ed-8425-d0c4c0e25af6 commented 14 years ago

Examples:

s = '=?UTF-8?B?QWNjdXPDqSBkZSByw6ljZXB0aW9uIChhZmZpY2jDqSkgLSA=?=Arobase !'
decode_header(s) ---> [('=?UTF-8?B?QWNjdXPDqSBkZSByw6ljZXB0aW9uIChhZmZpY2jDqSkgLSA=?=Arobase !', None)]

which seems bad... but

ss = '=?UTF-8?B?QWNjdXPDqSBkZSByw6ljZXB0aW9uIChhZmZpY2jDqSkgLSA=?= Arobase !'
decode_header(ss) ---> [('Accus\xc3\xa9 de r\xc3\xa9ception (affich\xc3\xa9) - ', 'utf-8'), ('Arobase !', None)]

which seems good...

bitdancer commented 14 years ago

Per the RFC, this is the correct behavior. An encoded word *must* begin and end either at the field boundary or with whitespace. So ...?=Arobase, with no whitespace between the = and Arobase, makes your first example into an invalid encoded word, and thus it is returned as if it were plain ASCII.
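To make the boundary rule concrete, here is a minimal sketch of a strict matcher. It is not the code `email.header` uses; the pattern `STRICT_ENCODED_WORD` and the helper `find_strict_encoded_words` are hypothetical names introduced for illustration. The assumption is that "delimited by whitespace or the field boundary" can be approximated with the lookarounds `(?<!\S)` and `(?!\S)`.

```python
import base64
import re

# Hypothetical strict pattern (not taken from the email package): the encoded
# word must be preceded and followed by whitespace or the field boundary.
STRICT_ENCODED_WORD = re.compile(
    r"(?<!\S)=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=(?!\S)"
)


def find_strict_encoded_words(value):
    """Return (charset, encoding, payload) for each well-delimited encoded word."""
    return [m.groups() for m in STRICT_ENCODED_WORD.finditer(value)]


payload = "QWNjdXPDqSBkZSByw6ljZXB0aW9uIChhZmZpY2jDqSkgLSA="
s = "=?UTF-8?B?" + payload + "?=Arobase !"    # no whitespace after "?="
ss = "=?UTF-8?B?" + payload + "?= Arobase !"  # whitespace after "?="

no_space = find_strict_encoded_words(s)     # nothing qualifies as an encoded word
with_space = find_strict_encoded_words(ss)  # one encoded word is found

# Decode the base64 payload of the valid encoded word using its charset.
charset, _encoding, b64_payload = with_space[0]
decoded = base64.b64decode(b64_payload).decode(charset)
```

Under this strict reading, `s` contains no encoded word at all, so a decoder has nothing to decode and can only hand the string back unchanged, which matches the first result in the report.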

One could argue that email could be smarter and interpret this string as an encoded word anyway, following the Postel principle (be generous in what you accept), but it currently does not do so, and not doing so is not a bug.
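For anyone who needs the generous interpretation today, a rough sketch of a lenient decoder follows. This is an assumption-laden illustration, not what the email package does: `LENIENT_ENCODED_WORD` and `lenient_decode` are hypothetical names, only the `B` (base64) encoding is handled, and surrounding text is passed through untouched rather than following the RFC's whitespace-folding rules.

```python
import base64
import re

# Hypothetical lenient pattern: same shape as an encoded word, but with no
# requirement that it be delimited by whitespace. Base64 ("B") only.
LENIENT_ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?[bB]\?([^?\s]*)\?=")


def lenient_decode(value):
    """Decode every base64 encoded word found, even if it abuts plain text."""
    parts = []
    pos = 0
    for m in LENIENT_ENCODED_WORD.finditer(value):
        if m.start() > pos:
            parts.append(value[pos:m.start()])  # plain text before the match
        charset, payload = m.groups()
        parts.append(base64.b64decode(payload).decode(charset))
        pos = m.end()
    if pos < len(value):
        parts.append(value[pos:])  # trailing plain text
    return "".join(parts)


s = "=?UTF-8?B?QWNjdXPDqSBkZSByw6ljZXB0aW9uIChhZmZpY2jDqSkgLSA=?=Arobase !"
result = lenient_decode(s)
```

The trade-off is that a lenient matcher can also rewrite text that merely looks like an encoded word, which is one reason a library might prefer the strict behaviour by default.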

email6 will handle such non-RFC compliant examples better, if all goes well.