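Here is the code:

```python
import mimetypes
from email.message import EmailMessage

msg = EmailMessage()
file_name = "超e保3000P.csv"
ctype, encoding = mimetypes.guess_type(file_name)
# Fall back to a generic binary type if the type is unknown or the file is compressed.
if ctype is None or encoding is not None:
    ctype = "application/octet-stream"
maintype, subtype = ctype.split("/", 1)
with open(file_name, "rb") as f:
    # The 3-tuple is (charset, language, value): request GBK for the encoded filename.
    msg.add_attachment(f.read(), maintype=maintype, subtype=subtype,
                       filename=("GBK", "", f"{file_name}"))
```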
The filename contains non-ASCII characters, so I used the 3-tuple form of `filename` with the GBK charset, but the output of `msg.as_string()` doesn't change. When I `print(msg.as_string())`, I see the filename encoded as `filename*=utf-8''%E8%B6......`, so the charset is not the one I asked for. And of course, after sending the message, the attached file's name is displayed incorrectly in my mail client and in web mail. But when I use the legacy API and generate the filename with the `Header` class, it works.
By "msg.as_string() doesn't change", I mean that `msg.as_string()` produces the same output whichever of these I use:
- `filename=file_name`
- `filename=("GBK", "", f"{file_name}")`
- `filename=("utf-8", "", f"{file_name}")`
Hello, could you please attach a minimal working file to reproduce it?
I have just uploaded it. Thank you.
I think you are saying that you want the charset in the encoded filename to be GBK rather than utf-8? utf-8 should certainly display correctly in your email client, though, so if it does not, something else is going wrong.
As for the 3-tuple not working to set the charset, I believe what is happening is that a header created by the application gets "refolded" on serialization, and refolding doesn't keep the existing charset; it converts everything to utf-8. This is an intentional part of the design: the library handles the gory details of MIME and uses utf-8 as the charset for application-created content. It is actually an accident of the implementation that the tuple form of the filename is accepted at all; you will note that it is *not* documented in the contentmanager docs.
It wouldn't be crazy to ask for this as a feature, and it could even be treated as a bug that it doesn't work if we want to, but it may not be easy to "fix", because it goes against the design philosophy of the new API.
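The refolding described above can be checked with a minimal sketch that serializes only the attachment part for each of the three `filename` spellings the reporter tried. The placeholder payload `b"a,b\r\n"` and the helper name `attachment_part` are assumptions for illustration; the outer message is skipped because its multipart boundary is random. This was reasoned against the stdlib `email` package's default policy, so folding details may vary between Python versions.

```python
from email.message import EmailMessage

FILE_NAME = "超e保3000P.csv"  # the reporter's filename
DATA = b"a,b\r\n"            # placeholder payload (assumption); any bytes work


def attachment_part(filename):
    """Hypothetical helper: attach DATA under `filename`, return the serialized part."""
    msg = EmailMessage()
    msg.add_attachment(DATA, maintype="application", subtype="octet-stream",
                       filename=filename)
    # msg is now multipart/mixed with a random boundary, so serialize only
    # the attachment part to get output that is comparable between calls.
    return msg.get_payload(0).as_string()


outputs = [
    attachment_part(FILE_NAME),                 # plain str
    attachment_part(("GBK", "", FILE_NAME)),    # 3-tuple asking for GBK
    attachment_part(("utf-8", "", FILE_NAME)),  # 3-tuple asking for utf-8
]

# On serialization the Content-Disposition header is refolded: the requested
# charset is dropped and the filename is re-encoded as RFC 2231 with utf-8.
print(len(set(outputs)) == 1)
print(outputs[1])
```

If the maintainer's explanation holds, all three outputs are identical and none of them mentions GBK.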
Actually, given that the contentmanager does accept a charset parameter for text content, it does seem reasonable to treat this as a bug. But, as I said, fixing it may not be trivial.
With utf-8, the filename doesn't display correctly in my mail client, so I thought GBK might work and tried to encode the Content-Disposition filename with GBK.
Just now I printed the output of the legacy API's `MIMEMultipart.as_string()` and found that it uses utf-8 too. The difference is:
- legacy API: `filename="=?utf-8?b?6LaFZeS/nTMwMDBQLmNzdg==?="`
- EmailMessage: `filename*=utf-8''.%2F%E8%B6%85e%E4%BF%9D3000P.csv`
So the charset is not what causes the display error, but I still don't know what does. Base64?
Why are there two different representations of the same filename? It displays incorrectly when I use the EmailMessage API's representation.
The legacy API appears to be using an RFC-incorrect (but common) encoded-word encoding, while the new API is using the RFC-compliant MIME-parameter encoding (% encoding). Which email client are you using?
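The legacy (encoded-word) form can be reproduced with a minimal sketch. The thread does not show the reporter's legacy code, so this assumes the common `MIMEBase` + `Header(...).encode()` recipe, with a placeholder payload; it yields the same `filename="=?utf-8?b?...?="` parameter the reporter quoted.

```python
from email import encoders
from email.header import Header
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart

FILE_NAME = "超e保3000P.csv"

msg = MIMEMultipart()                          # legacy API, compat32 policy
part = MIMEBase("application", "octet-stream")
part.set_payload(b"a,b\r\n")                   # placeholder payload (assumption)
encoders.encode_base64(part)
# Header(...).encode() produces an RFC 2047 encoded word; add_header() then
# wraps it in a quoted parameter value: the common but non-RFC style.
part.add_header("Content-Disposition", "attachment",
                filename=Header(FILE_NAME, "utf-8").encode())
msg.attach(part)

print(part["Content-Disposition"])
```

The "b" in the encoded word means Base64, which answers the "Base64?" question: the bytes are the same UTF-8 in both forms; only the parameter encoding syntax differs.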
Microsoft Outlook 2016 MSO (16.0.4266.10001) x64
Foxmail 7.2 (build 7.026)
I use these two email clients, and both display the filename incorrectly. I have uploaded a screenshot.
Microsoft Outlook 2016 MSO (16.0.4266.10001) x64
Since Outlook is one of the mailers that generates the non-RFC-compliant headers, it doesn't surprise me all that much that it can't interpret the RFC compliant headers correctly.
I'm not sure there is anything we can do here.
I suppose someone could do a survey of mail clients and document which ones can handle which style of parameter encoding. If it turns out that more handle the "wrong" way than the "right" way, we could consider adopting the de facto standard, although I won't like it much :)
(There is also a possibility there is a bug in our RFC compliance, but this is the first problem report I've seen.)
I think a program's goal is to solve problems, not to satisfy the "standard".
OK, if you insist that the "standard" has top priority, could you please tell me how to change the default behavior of the new API to use the "=?utf-8?b?" parameter style? Is there a function or parameter I can use to achieve this?
If not, I think the best solution is to add a "param style" parameter that lets me choose which style to use.
And if that isn't possible either, sadly, I will have to use the legacy API.
https://litmus.com/blog/infographic-the-2019-email-client-market-share
Here is a survey of email client market share; as you can see, Outlook is in the top 3.
I also just sent a mail to my Gmail account, and when I view it in the web interface, it displays incorrectly!
Sorry, the Gmail web interface displays it correctly.
I actually agree: if most MUAs (by market share) handle the RFC-incorrect parameter encoding style, and a significant portion do not handle the RFC-correct style, then we should support the de facto standard rather than the official standard as the default. I just wish Microsoft would write better software :) If, on the other hand, Microsoft is the only one of the big-market-share players that is broken, I'm not sure I'd want it to be the default. But we could still support it optionally.
So yeah, we could have a policy control that governs which one is actually used.
So this is a feature request, and ideally should be supported by an investigation of what MUAs support what, by market share. And there's another question: does this only affect the filename parameter, or is it all MIME parameters? I would expect it to be the latter, but someone should check at least a few examples of that to be sure.
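A "param style" option (as the reporter asked for) or a policy control would essentially choose between two renderings of the same parameter value. A minimal sketch of those two renderings as a standalone helper; `format_param` and its `style` values are hypothetical and invented for illustration, not part of the `email` package or any existing policy attribute. It takes the parameter name as an argument because, as noted, the question may apply to all MIME parameters and not only `filename`.

```python
import base64
import urllib.parse


def format_param(name, value, style="rfc2231"):
    """Hypothetical helper: render one MIME parameter in either encoding style.

    "rfc2231":      name*=utf-8''<percent-encoded UTF-8>  (RFC-compliant form)
    "encoded-word": name="=?utf-8?b?<Base64 UTF-8>?="     (legacy, RFC 2047 word in quotes)
    """
    if value.isascii():
        # Plain ASCII needs no encoding in either style; just quote it.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{name}="{escaped}"'
    if style == "rfc2231":
        # safe="" so that "/" is percent-encoded as well.
        return f"{name}*=utf-8''{urllib.parse.quote(value, safe='')}"
    if style == "encoded-word":
        b64 = base64.b64encode(value.encode("utf-8")).decode("ascii")
        # No splitting: very long values would exceed RFC 2047's 75-char word limit.
        return f'{name}="=?utf-8?b?{b64}?="'
    raise ValueError(f"unknown style: {style!r}")


print(format_param("filename", "超e保3000P.csv"))
print(format_param("filename", "超e保3000P.csv", style="encoded-word"))
```

The two outputs mirror the EmailMessage and legacy-API forms quoted earlier in the thread; a real implementation would also need to handle folding and long values.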
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['3.8', 'type-feature', '3.7', 'expert-email']
title = 'EmailMessage may need to support RFC-non-compliant MIME parameter encoding (encoded words in quotes) for output.'
updated_at =
user = 'https://bugs.python.org/hwgdbSmith'
```
bugs.python.org fields:
```python
activity =
actor = 'r.david.murray'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['email']
creation =
creator = 'hwgdb Smith'
dependencies = []
files = ['48920', '48924', '48925']
hgrepos = []
issue_num = 39771
keywords = []
message_count = 17.0
messages = ['362780', '362781', '362792', '362804', '362805', '362806', '362808', '362814', '362836', '362857', '362858', '362903', '362921', '362922', '362924', '362927', '362991']
nosy_count = 4.0
nosy_names = ['barry', 'r.david.murray', 'dorosch', 'hwgdb Smith']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue39771'
versions = ['Python 3.7', 'Python 3.8']
```