python / cpython

quoted printable parse the sequence '= ' incorrectly #44182

Closed d709d55e-d6dd-43bb-88fd-f0f5fd492150 closed 17 years ago

d709d55e-d6dd-43bb-88fd-f0f5fd492150 commented 17 years ago
BPO 1588217
Nosy @birkenfeld

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = 'https://github.com/birkenfeld'
closed_at =
created_at =
labels = ['library']
title = "quoted printable parse the sequence '= ' incorrectly"
updated_at =
user = 'https://bugs.python.org/tungwaiyip'
```

bugs.python.org fields:
```python
activity =
actor = 'georg.brandl'
assignee = 'georg.brandl'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'tungwaiyip'
dependencies = []
files = []
hgrepos = []
issue_num = 1588217
keywords = []
message_count = 3.0
messages = ['30416', '30417', '30418']
nosy_count = 2.0
nosy_names = ['georg.brandl', 'tungwaiyip']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue1588217'
versions = ['Python 2.4']
```

d709d55e-d6dd-43bb-88fd-f0f5fd492150 commented 17 years ago

>>> import quopri
>>> s = 'I say= a secret message\r\nThank you'

>>> quopri.a2b_qp
<built-in function a2b_qp>
>>> quopri.decodestring(s)  # use the C version binascii.a2b_qp() to decode
'I sayThank you'

>>> quopri.a2b_qp=None
>>> quopri.decodestring(s)  # use the Python version quopri.decode() to decode
'I say= a secret message\nThank you'

Note that the sequence '= ' is invalid according to RFC 2045 section 6.7:

-------------------------------------------------------
An "=" followed by a character that is neither a hexadecimal digit (including "abcdef") nor the CR character of a CRLF pair is illegal ... A reasonable approach by a robust implementation might be to include the "=" character and the following character in the decoded data without any transformation
-------------------------------------------------------

The pure-Python parser quopri.decode() uses this lenient interpretation, which is how it produces the second string. Most email clients use a similar lenient interpretation.
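To make the lenient rule concrete, here is a minimal sketch of a decoder that passes an illegal "=" through unchanged, as the RFC passage suggests. The name `lenient_qp_decode` is hypothetical and this is not the code in quopri.py or binascii.c. It also assumes that lowercase hex digits and a bare LF after "=" are accepted, and it does not strip trailing whitespace.

```python
import string

# Byte values of 0-9, a-f, A-F. Accepting lowercase is an assumption
# (the RFC only asks robust decoders to tolerate it).
HEX_DIGITS = frozenset(string.hexdigits.encode("ascii"))


def lenient_qp_decode(data: bytes) -> bytes:
    """Hypothetical lenient quoted-printable decoder (illustration only)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        byte = data[i]
        if byte != ord("="):
            out.append(byte)
            i += 1
        elif data[i + 1:i + 3] == b"\r\n":
            i += 3  # soft line break: drop '=' and the CRLF
        elif data[i + 1:i + 2] == b"\n":
            i += 2  # assumption: tolerate a bare LF as a soft line break
        elif i + 2 < n and data[i + 1] in HEX_DIGITS and data[i + 2] in HEX_DIGITS:
            out.append(int(data[i + 1:i + 3].decode("ascii"), 16))
            i += 3  # '=XX' escape decoded to a single byte
        else:
            # Illegal sequence such as '= ': keep the '=' as-is; the
            # following character is copied on the next iteration.
            out.append(byte)
            i += 1
    return bytes(out)


print(lenient_qp_decode(b"I say= a secret message\r\nThank you"))
```

With this rule the text after '= ' stays visible in the decoded output instead of being dropped.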

The C parser binascii.a2b_qp(), which is used in preference to the Python version, produces a surprising result in which the string 'a secret message' is omitted.

This may create an opportunity for spammers to insert a secret message after '= ' so that it is not visible to a Python-based spam filter but would be displayed in a non-Python-based email client.

d709d55e-d6dd-43bb-88fd-f0f5fd492150 commented 17 years ago

The problem may come from binascii_a2b_qp() in binascii.c. It treats the '= ' or '=\t' sequence as a soft line break. Such an interpretation appears to have no basis. It could be a misinterpretation of RFC 2045:

-------------------------------------------------------------------
In particular, an "=" at the end of an encoded line, indicating a soft line break (see rule #5) may follow one or more TAB (HT) or SPACE characters.
-------------------------------------------------------------------

This passage reminds readers they might find TAB or SPACE before an "=", but not after it. "= " is plain illegal as far as I know.
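The distinction can be checked mechanically. The sketch below uses a hypothetical helper, `find_illegal_equals`, which is not part of quopri or binascii. It reports the offset of every "=" that is not followed by two hex digits or a line break, so whitespace before "=" passes and whitespace after it is flagged. It assumes lowercase hex digits and a bare LF are tolerated, and it also flags an "=" at the very end of the data.

```python
import re

# '=' that is NOT followed by two hex digits, CRLF, or a bare LF.
_ILLEGAL_EQUALS = re.compile(rb"=(?![0-9A-Fa-f]{2}|\r?\n)")


def find_illegal_equals(data: bytes) -> list:
    """Return byte offsets of '=' characters that start an illegal sequence."""
    return [match.start() for match in _ILLEGAL_EQUALS.finditer(data)]


# Whitespace BEFORE the '=' of a soft line break is allowed.
print(find_illegal_equals(b"trailing space \t=\r\nnext line"))
# Whitespace AFTER '=' is illegal and gets reported.
print(find_illegal_equals(b"I say= a secret message\r\nThank you"))
```

A filter could use a check like this to treat messages containing such sequences with extra suspicion, whichever decoder it relies on.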

birkenfeld commented 17 years ago

Thanks for the report, this is now fixed in rev. 52765, 52766 (2.5).