Open serhiy-storchaka opened 11 years ago
>>> import quopri, email.quoprimime
>>> quopri.decodestring(b'==41')
b'=41'
>>> email.quoprimime.decode('==41')
'=A'
I don't see a rule about double '=' in RFC 1521-1522 or RFCs 2045-2047 and I think quopri is wrong.
Other half of this bug (encoding '=' as '==') was fixed in 9bc52706d283.
There are other inconsistencies. email.quoprimime.decode(), binascii.a2b_qp() and pure Python (by default binascii used) quopri.decodestring() returns different results for following data:
quoprimime binascii quopri
b'=' '' b'' b'=' b'==' '=' b'=' b'==' b'= ' '' b'= ' b'= ' b'= \n' '' b'= \n' b'' b'=\r' '' b'' b'=\r' b'==41' '=A' b'=41' b'=A'
Most of the variations represent different invalid-input recovery choices. I believe binascii's decoding of b'= \n' is incorrect, as is its decoding of b'==41'. quopri's decoding of b'=\r' is arguably incorrect as well, given that python generally supports universal line ends. Otherwise the decodings are all responses to erroneous input for which the behavior is not specified.
That said, we ought to pick one error recovery scheme and implement it in all places, and IMO it shouldn't be exactly any of the ones we've got. Or better yet, use one common implementation. Untangling quopri is on my (too large) List of Things To Do :)
Perl's MIME::QuotedPrint produces same result as pure Python quopri. konwert qp-8bit produces same result as binascii (except '==41' it decodes as '=A').
RFC 2045 says:
"""A reasonable approach by a robust implementation might be to include the "=" character and the following character in the decoded data without any transformation and, if possible, indicate to the user that proper decoding was not possible at this point in the data. """
So what scheme we will picked?
As I said, not exactly any of the above.
I'll get back to this after I finish the new email code (which should happen before the end of the month). I need to take some time to look over the RFCs and real world examples and come up with the most appropriate rules.
Ping.
The double equals "==" case for the “quopri” implementation in Python is now consistent with the others thanks to the fix in bpo-23681 (see also bpo-21511).
According to bpo-20121, the quopri (Python) implementation only supports LF (\n) characters as line breaks, and the binascii (C) implementation also supports CRLF. So I agree that the whitespace-before-newline case "= \n" is a genuine bug (see bpo-16473). But the CR case "=\r" is not supported because neither quopri nor binascii support universal newlines or CR line breaks on their own.
Thus currently the table of discrepancies looks as:
quoprimime binascii quopri
b'=' '' b'' b'=' b'= ' '' b'= ' b'= ' b'= \n' '' b'= \n' b'' b'=\r' '' b'' b'=\r' b'==41' '=A' b'=41' b'=41'
OK, I've finally gotten around to looking at this. It looks like quopri and binascii are not stripping trailing whitespace.
quoprimime binascii quopri preferred
b'=' '' b'' b'=' '=' b'= ' '' b'= ' b'= ' '=' b'= \n' '' b'= \n' b'' quoprimime binascii quopri
b'=' '' b'' b'=' b'= ' '' b'= ' b'= ' b'= \n' '' b'= \n' b'' b'=\r' '' b'' b'=\r' b'==41' '=A' b'=41' b'=41' '=\n' b'=\r' '' b'' b'=\r' '=\r' b'==41' '=A' b'=41' b'=41' '=A' b'= \n f\n' ' f\n' b'= \n f\n' b'= \n f\n' ' f\n'
The RFC recommends that a trailing = be preserved, but that trailing whitespace be ignored. It doesn't speak directly to the ==41 case, but one can infer that the first = in the == pair is most likely to have "come from the source text" and not been encoded, while the =41 was an intentional encoding and so should be decoded.
Now, that said, the actual behavior that our libraries have had for a long time is to treat the "last line" just like all other lines, and strip a trailing =. So I would be inclined to keep that behavior for backward compatibility reasons rather than change it to be more RFC compliant, given that we don't have any actual bug report related to it, and "fixing" it could break things. Given that, the current quoprimime behavior becomes the reference.
However, backward compatibility concerns also arise around starting to strip trailing space in quopri and binascii. Maybe we only make that change in 3.8?
Many thanks David! But sorry, your table confused me. I can't read it. Could you please reformat it?
I should have just deleted the table, actually.
The only important info in it is that per RFC '=', '=\n', and '= \n' all ought to become '='. But I don't think we should make that change, I think we should continue to turn those into ''. So I consider the (current!) bwehavior of quoprimime to be the correct behavior.
I also gave the example of '= \n foo\n', to show that quopri and binascii aren't stripping trailing blanks, as Martin noted in the other issue. They fold lines if they see '=\n', but not if they see '= \n', which is wrong per the (email!) RFC. I'm not clear if it is wrong for non-email uses of quopric, I haven't tried to research that.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at = None created_at =
labels = ['3.7', 'type-bug', 'library', 'expert-email']
title = 'Inconsistency between quopri.decodestring() and email.quoprimime.decode()'
updated_at =
user = 'https://github.com/serhiy-storchaka'
```
bugs.python.org fields:
```python
activity =
actor = 'r.david.murray'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)', 'email']
creation =
creator = 'serhiy.storchaka'
dependencies = []
files = []
hgrepos = []
issue_num = 18022
keywords = []
message_count = 12.0
messages = ['189663', '190874', '190876', '190889', '197665', '197715', '290521', '290525', '290533', '316563', '316622', '316642']
nosy_count = 6.0
nosy_names = ['gvanrossum', 'barry', 'r.david.murray', 'Jeremy.Hylton', 'martin.panter', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue18022'
versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7']
```