Open 7f693efa-c293-4e08-80e6-4c8268e300a5 opened 10 years ago
Currently, the new wonderful EmailMessage class ignores the encoding specified in any Header objects that are provided to it.
>>> import email.message, email.header
>>> m = email.message.Message()
>>> m['Subject'] = email.header.Header('Böðvarr'.encode('latin-1'), 'latin-1')
>>> print(m.as_string())
Subject: =?iso-8859-1?q?B=F6=F0varr?=
>>> m = email.message.EmailMessage()
>>> m['Subject'] = email.header.Header('Böðvarr'.encode('latin-1'), 'latin-1')
>>> print(m.as_string())
Traceback (most recent call last):
...
TypeError: 'Header' object does not support indexing
If EmailMessage recognized and supported Header objects, then Python programmers who are constrained by which encodings their customers' email clients recognize could hand-pick the correct encoding for each header, instead of being limited to the two predominant choices that EmailMessage makes on its own: ASCII, or UTF-8 with binary.
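For comparison, here is a minimal sketch of what EmailMessage does today when given a plain str, assuming the default policy (`email.policy.default`, with `utf8=False`). The variable names are only illustrative. There is no argument through which a charset could be requested, so the non-ASCII subject should come out as a UTF-8 encoded word:

```python
from email.message import EmailMessage

msg = EmailMessage()
# A plain str is the only supported value; there is no place to name a charset.
msg['Subject'] = 'Böðvarr'

# Serialization picks the charset by itself (UTF-8 for non-ASCII text).
serialized = msg.as_string()
print(serialized)
```

Whether the encoded word uses the `q` or the `b` transfer encoding is also decided by the library, not by the caller.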
@David can we have your comments please.
I have to look at the implementation to remind myself how hard this would be to implement. The goal was to leave Header a legacy API...if you need that level of control, you use the old API. But I can see the functionality argument, and Header *is* a reasonable API for building such a custom header. It may be a while before I have time to take a look at it, though, so if anyone else wants to take a look, feel free :)
One problem is that while the parser does retain the cte of each encoded word, if the header is refolded for any reason the cte is (often? always? I don't remember) ignored because encoded words may be recombined during folding. And if you are creating the header inside a program, that header is going to get refolded on serialization, unless max_line_length is set to 0/None or the header fits on one line.
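A small sketch of that refolding behaviour, assuming a message parsed with `email.policy.default`. The sample header is the one from the report, and the names `source`, `preserved`, and `refolded` are made up for illustration. A short parsed header is passed through untouched, but once a refold is forced with `refold_source='all'`, the original charset appears to be dropped in favour of UTF-8:

```python
import email
from email import policy

source = "Subject: =?iso-8859-1?q?B=F6=F0varr?=\n\nbody\n"
msg = email.message_from_string(source, policy=policy.default)

# The model holds the decoded text, not the encoded word.
print(msg['Subject'])

# Default refold_source='long': the header fits on one line, so the
# original encoded word is emitted as it was parsed.
preserved = msg.as_string()
print(preserved)

# Forcing every header to be refolded re-encodes the decoded text.
refold_all = policy.default.clone(refold_source='all')
refolded = msg.as_string(policy=refold_all)
print(refolded)
```

A header created inside a program has no source form to pass through, so it always takes the second path.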
So it's not obvious to me that this can work at all. What *could* work would be to have a policy setting to use something other than utf-8 for the CTE for encoding headers, but that would be a global setting (applying to all headers that are refolded during serialization).
Basically, controlling the CTE of encoded words on an individual basis goes directly against the model used by the new Email API: in that model, the "model" of the email message is the *decoded* version of the message, and serialization is responsible for doing whatever CTE encoding is appropriate. The goal is to *hide* the details of the RFCs from the library user. So, if you want control at that level, you have to go back to the old API, which requires you to understand what you are doing at the RFC level...
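For anyone who does need that level of control, a minimal sketch of the old-API route, assuming the legacy `Message` class with its default compat32 policy. It repeats the working half of the original report and adds a round trip through `email.header.decode_header` to confirm which charset ended up on the wire; the variable names are illustrative only:

```python
from email.header import Header, decode_header
from email.message import Message

msg = Message()  # legacy API, compat32 policy
# The Header object carries the charset chosen by the caller.
msg['Subject'] = Header('Böðvarr', charset='iso-8859-1')

serialized = msg.as_string()
print(serialized)

# Pull the encoded word back out and check the charset that was used.
raw_value = serialized.split(':', 1)[1].strip()
[(data, charset)] = decode_header(raw_value)
print(charset, data.decode(charset))
```

The cost is the one described above: the caller is responsible for choosing a charset that can represent the text and that the receiving clients understand.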
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['expert-email']
title = 'EmailMessage should support Header objects'
updated_at =
user = 'https://bugs.python.org/brandon-rhodes'
```
bugs.python.org fields:
```python
activity =
actor = 'BreamoreBoy'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['email']
creation =
creator = 'brandon-rhodes'
dependencies = []
files = []
hgrepos = []
issue_num = 21095
keywords = []
message_count = 3.0
messages = ['215112', '220637', '220646']
nosy_count = 3.0
nosy_names = ['barry', 'r.david.murray', 'brandon-rhodes']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue21095'
versions = ['Python 3.4']
```