python / cpython

The Python programming language
https://www.python.org

email module creates base64 output with incorrect line breaks #74218

Closed 2c8ee9fa-7f11-45f7-b140-44790542eb15 closed 7 years ago

2c8ee9fa-7f11-45f7-b140-44790542eb15 commented 7 years ago
BPO 30032
Nosy @warsaw, @jribbens, @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['3.7', 'type-bug', 'library', 'expert-email']
title = 'email module creates base64 output with incorrect line breaks'
updated_at =
user = 'https://github.com/jribbens'
```

bugs.python.org fields:
```python
activity =
actor = 'r.david.murray'
assignee = 'none'
closed = True
closed_date =
closer = 'r.david.murray'
components = ['Library (Lib)', 'email']
creation =
creator = 'jribbens'
dependencies = []
files = []
hgrepos = []
issue_num = 30032
keywords = []
message_count = 8.0
messages = ['291434', '291442', '291443', '291446', '291448', '291450', '291451', '291452']
nosy_count = 3.0
nosy_names = ['barry', 'jribbens', 'r.david.murray']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue30032'
versions = ['Python 3.6', 'Python 3.7']
```

2c8ee9fa-7f11-45f7-b140-44790542eb15 commented 7 years ago

The email module, when creating base64-encoded text parts, does not process line breaks correctly - RFC 2045 s6.8 says that line breaks must be converted to CRLF before base64-encoding, and the email module is not doing this.

```python
>>> from email.mime.text import MIMEText
>>> import base64
>>> m = MIMEText("hello\nthere", _charset="utf-8")
>>> m.as_string()
'Content-Type: text/plain; charset="utf-8"\nMIME-Version: 1.0\nContent-Transfer-Encoding: base64\n\naGVsbG8KdGhlcmU=\n'
>>> base64.b64decode("aGVsbG8KdGhlcmU=")
b'hello\nthere'
```

You might say that it is the application's job to convert the line endings before calling MIMEText(), but I think all application authors would be surprised by this. Certainly the MailMan authors would be, as they say this is a Python bug not a MailMan bug ;-)
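
For readers who need the legacy API today, a minimal sketch of the application-side workaround described above: normalize line endings to CRLF before handing the text to `MIMEText`. The helper name `to_crlf` is hypothetical and not part of the email package.

```python
import base64
import re
from email.mime.text import MIMEText


def to_crlf(text):
    """Rewrite bare CR, bare LF, and CRLF as CRLF (hypothetical helper)."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


# The legacy API encodes the body exactly as given, so normalize it first.
m = MIMEText(to_crlf("hello\nthere"), _charset="utf-8")

# Decode the base64 payload to inspect which line ending was encoded.
decoded = base64.b64decode(m.get_payload())
print(decoded)
```

This only changes what the application passes in; the library's own behavior is untouched.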

bitdancer commented 7 years ago

This appears to be a problem in the new API as well. I don't think we can change the legacy API because it's been that way forever and applications might be depending on it (that is, the library preserves exactly what it is handed, and an application might break if that changes). In the new API, though, I think we could get away with fixing it to do the transformation on text strings in the default content manager so that the line endings follow the message policy. (That is, if you use default, you get \n, if you use SMTP, you get \r\n). I think we can get away with it because there aren't that many applications using the new API yet.

bitdancer commented 7 years ago

Actually, I think the fix would go in the generator, not in the contentmanager, but it's been long enough since I've worked on the code that I'm not sure.

2c8ee9fa-7f11-45f7-b140-44790542eb15 commented 7 years ago

OK cool, but please note that this is a MIME issue not an SMTP issue - if the message has text that is being base64-encoded then it must use CRLF line breaks regardless of whether SMTP is involved or not.

bitdancer commented 7 years ago

That is true for text/xxxx types, yes. The policy is named after the target wire protocol, and if you are transmitting an email message over SMTP, that implies MIME. What to do if you are not sending it over SMTP, though, is a tougher question. One could argue it either way for the 'default' policy, and I'm open to argument.

2c8ee9fa-7f11-45f7-b140-44790542eb15 commented 7 years ago

So on further investigation, with the new API and policy=SMTP, it does generate correct base64 output. So I guess on the basis that the new version can generate the right output, and it appears to be a deliberate choice that the default policy breaks the RFCs, you can close this issue ;-)

```python
>>> from email.message import EmailMessage
>>> from email.policy import SMTP
>>> import base64
>>> msg = EmailMessage(policy=SMTP)
>>> msg.set_content("hello\nthere", cte="base64")
>>> msg.as_string()
'Content-Type: text/plain; charset="utf-8"\r\nContent-Transfer-Encoding: base64\r\nMIME-Version: 1.0\r\n\r\naGVsbG8NCnRoZXJlDQo=\r\n'
>>> base64.b64decode("aGVsbG8NCnRoZXJlDQo=")
b'hello\r\nthere\r\n'
```

bitdancer commented 7 years ago

Huh. I ran something like that test and thought I saw the reverse. I guess I misread my terminal. Looking at the code, set_content does take care to fix the line ending according to the policy before doing the encoding.
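
A small sketch of that policy dependence, comparing the same `set_content` call under the `default` and `SMTP` policies. The helper name `decoded_body` is made up for illustration, and the result for `default` is an assumption based on reading the behavior described in this thread, not something stated in it.

```python
import base64
from email.message import EmailMessage
from email.policy import SMTP, default


def decoded_body(policy):
    """Build a base64 text part under `policy` and return the decoded bytes."""
    msg = EmailMessage(policy=policy)
    msg.set_content("hello\nthere", cte="base64")
    return base64.b64decode(msg.get_payload())


lf_body = decoded_body(default)  # default policy: linesep is "\n"
crlf_body = decoded_body(SMTP)   # SMTP policy: linesep is "\r\n"
print(lf_body, crlf_body)
```

Printing both values side by side shows which line ending was baked into the base64 data for each policy.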

bitdancer commented 7 years ago

There is, however, an issue that if you pass a message with the default policy to the generator and specify SMTP as the policy, it doesn't *recode* the line endings. I thought there was an open issue for that, but I can't find it.

One solution would be to do as you suggest and make \r\n what we always use when doing base64 encoding. I'm open to that as a possible fix, but it probably needs at least a brief discussion with Barry.
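
The generator behavior mentioned in this comment can be checked with a sketch like the following, which builds a message under the default policy and then serializes it with `policy=SMTP`. It assumes the base64 payload is written out unchanged, as the comment describes; behavior may differ between Python versions.

```python
import base64
from email.message import EmailMessage
from email.policy import SMTP

# No policy argument, so the message uses email.policy.default (linesep "\n").
msg = EmailMessage()
msg.set_content("hello\nthere", cte="base64")

# Serialize with a different policy than the one used to encode the body.
wire = msg.as_string(policy=SMTP)

# Split headers from body at the first blank line and decode the body.
headers, _, body = wire.partition("\r\n\r\n")
decoded = base64.b64decode(body)
print(repr(wire))
print(decoded)
```

If the decoded bytes still contain bare `\n` while the header lines end in `\r\n`, the line endings inside the encoded body were not recoded at serialization time.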