python / cpython

The Python programming language
https://www.python.org

EmailMessage should support Header objects #65294

Open 7f693efa-c293-4e08-80e6-4c8268e300a5 opened 10 years ago

7f693efa-c293-4e08-80e6-4c8268e300a5 commented 10 years ago
BPO 21095
Nosy @warsaw, @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['expert-email']
title = 'EmailMessage should support Header objects'
updated_at =
user = 'https://bugs.python.org/brandon-rhodes'
```

bugs.python.org fields:
```python
activity =
actor = 'BreamoreBoy'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['email']
creation =
creator = 'brandon-rhodes'
dependencies = []
files = []
hgrepos = []
issue_num = 21095
keywords = []
message_count = 3.0
messages = ['215112', '220637', '220646']
nosy_count = 3.0
nosy_names = ['barry', 'r.david.murray', 'brandon-rhodes']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue21095'
versions = ['Python 3.4']
```

7f693efa-c293-4e08-80e6-4c8268e300a5 commented 10 years ago

Currently, the wonderful new EmailMessage class does not honor the encoding specified in any Header object provided to it; assigning one fails outright.

import email.message, email.header
m = email.message.Message()
m['Subject'] = email.header.Header('Böðvarr'.encode('latin-1'), 'latin-1')
print(m.as_string())

Subject: =?iso-8859-1?q?B=F6=F0varr?=

m = email.message.EmailMessage()
m['Subject'] = email.header.Header('Böðvarr'.encode('latin-1'), 'latin-1')
print(m.as_string())
Traceback (most recent call last):
  ...
TypeError: 'Header' object does not support indexing

If EmailMessage recognized and supported Header objects, Python programmers who are constrained by which encodings their customers' email clients recognize could hand-pick the correct encoding themselves. Today they are forced into the choices EmailMessage makes on its own, predominantly ASCII or UTF-8 with binary.

83d2e70e-e599-4a04-b820-3814bbdb9bef commented 10 years ago

@David, can we have your comments, please?

bitdancer commented 10 years ago

I have to look at the implementation to remind myself how hard this would be to implement. The goal was to leave Header a legacy API...if you need that level of control, you use the old API. But I can see the functionality argument, and Header *is* a reasonable API for building such a custom header. It may be a while before I have time to take a look at it, though, so if anyone else wants to take a look, feel free :)

One problem is that while the parser does retain the CTE of each encoded word, if the header is refolded for any reason the CTE is (often? always? I don't remember) ignored, because encoded words may be recombined during folding. And if you are creating the header inside a program, that header is going to get refolded on serialization, unless max_line_length is set to 0/None or the header fits on one line.
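A minimal sketch of the refolding effect described above, assuming the `refold_source` policy setting from `email.policy` is used to force refolding of a parsed header. The sample header is the one from the original report; the exact encoded-word form the folder emits may differ between Python versions, so no output is asserted in the comments.

```python
from email import message_from_string
from email.policy import default

# A short message whose Subject carries a latin-1 encoded word.
RAW = "Subject: =?iso-8859-1?q?B=F6=F0varr?=\n\nbody\n"

# Default policy (refold_source='long'): a short parsed header is
# written back from its source text without being refolded.
kept = message_from_string(RAW, policy=default)

# refold_source='all' sends every header through the folder on
# serialization, so the encoded word is rebuilt from the decoded text.
refolded = message_from_string(RAW, policy=default.clone(refold_source="all"))

kept_line = kept.as_string().splitlines()[0]
refolded_line = refolded.as_string().splitlines()[0]

print(kept_line)
print(refolded_line)
```

Comparing the two printed lines shows whether the original charset survives a refold on the Python version at hand; the decoded value of the header is the same in both cases.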

So it's not obvious to me that this can work at all. What *could* work would be to have a policy setting to use something other than utf-8 for the CTE for encoding headers, but that would be a global setting (applying to all headers that are refolded during serialization).
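The charset setting floated above does not exist, so it is not shown here. As an illustration of what "global" means for a policy setting, the sketch below uses the existing `utf8` flag of `EmailPolicy`, which applies to every header in the message at serialization time. The `Comments` header and its text are made up for the example.

```python
from email.message import EmailMessage
from email.policy import default

msg = EmailMessage()
msg["Subject"] = "Böðvarr"
msg["Comments"] = "Sent by Böðvarr"  # illustrative second header

# Default policy: every non-ASCII header is emitted as encoded words.
ascii_wire = msg.as_string()

# Same message, policy cloned with utf8=True: every header is emitted
# as raw UTF-8 text. The flag cannot be set for one header only.
utf8_wire = msg.as_string(policy=default.clone(utf8=True))

print(ascii_wire)
print(utf8_wire.encode("utf-8"))
```

A charset policy setting, if it were added, would presumably behave the same way: one value for all headers that are folded during serialization.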

Basically, controlling the CTE of encoded words on an individual basis goes directly against the model used by the new Email API: in that model, the "model" of the email message is the *decoded* version of the message, and serialization is responsible for doing whatever CTE encoding is appropriate. The goal is to *hide* the details of the RFCs from the library user. So, if you want control at that level, you have to go back to the old API, which required you to understand what you were doing at the RFC level...
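The contrast between the two models can be sketched as follows, reusing the subject from the original report. This is only an illustration of the behavior described in this thread, not a recommendation of either API.

```python
from email.header import Header
from email.message import EmailMessage, Message

# New API: the header value is the decoded text; the library decides
# how to encode it when the message is serialized.
new_msg = EmailMessage()
new_msg["Subject"] = "Böðvarr"
decoded = str(new_msg["Subject"])
new_wire = new_msg.as_string()

# Legacy API: the caller builds a Header with an explicit charset,
# and that charset is what appears in the serialized message.
old_msg = Message()
old_msg["Subject"] = Header("Böðvarr", "latin-1")
old_wire = old_msg.as_string()

print(new_wire)
print(old_wire)
```

With the legacy `Message`, the latin-1 choice survives into the output; with `EmailMessage`, only the decoded string is under the caller's control.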