python / cpython

The Python programming language
https://www.python.org
Other

EmailMessage may need to support RFC-non-compliant MIME parameter encoding (encoded words in quotes) for output. #83952

Open e3724f42-2d24-43a9-b6a3-a63cc57f185f opened 4 years ago

e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago
BPO 39771
Nosy @warsaw, @bitdancer, @dorosch
Files
  • email bug.rar
  • foxmail_screenshot.jpeg
  • outlook_screenshot.jpeg
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['3.8', 'type-feature', '3.7', 'expert-email']
    title = 'EmailMessage may need to support RFC-non-compliant MIME parameter encoding (encoded words in quotes) for output.'
    updated_at =
    user = 'https://bugs.python.org/hwgdbSmith'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'r.david.murray'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['email']
    creation =
    creator = 'hwgdb Smith'
    dependencies = []
    files = ['48920', '48924', '48925']
    hgrepos = []
    issue_num = 39771
    keywords = []
    message_count = 17.0
    messages = ['362780', '362781', '362792', '362804', '362805', '362806', '362808', '362814', '362836', '362857', '362858', '362903', '362921', '362922', '362924', '362927', '362991']
    nosy_count = 4.0
    nosy_names = ['barry', 'r.david.murray', 'dorosch', 'hwgdb Smith']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue39771'
    versions = ['Python 3.7', 'Python 3.8']
    ```

    e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago

    Here is the partial code:

        import mimetypes
        from email.message import EmailMessage

        msg = EmailMessage()
        file_name = "超e保3000P.csv"
        # Fall back to a generic binary type if the type cannot be guessed
        ctype, encoding = mimetypes.guess_type(file_name)
        if ctype is None or encoding is not None:
            ctype = "application/octet-stream"
        maintype, subtype = ctype.split("/", 1)

        with open(file_name, "rb") as f:
            msg.add_attachment(f.read(), maintype=maintype, subtype=subtype, filename=("GBK", "", f"{file_name}"))

    The file name contains non-ASCII characters, so I use the three-tuple form of filename with the GBK encoding, but msg.as_string() doesn't change. When I print(msg.as_string()), I find the filename is 'filename*=utf-8\'\'%E8%B6 ......'. The encoding is not correct. And of course, after sending the message, I saw the attached file's name displayed incorrectly in my mail client and in web mail. But when I use the legacy API and the Header class to generate the filename, it works.

    e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago

    "but msg.as_string() doesn't change. " , I mean using

      filename=file_name  
    or
      filename=("GBK", "", f"{file_name}")
    or
      filename=("utf-8", "", f"{file_name}")

    msg.as_string() doesn't change.
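For readers who want to reproduce the observation, here is a minimal sketch. The payload bytes and the helper name `serialized_attachment` are placeholders invented for illustration, not taken from the attached files. It serializes the same attachment once with a plain string filename and once with the GBK three-tuple, then prints the lines that carry the filename parameter so the two can be compared.

```python
from email.message import EmailMessage

FILE_NAME = "超e保3000P.csv"


def serialized_attachment(filename):
    """Build a message with one attachment and return its serialized form.

    `filename` is passed through unchanged, so it may be a plain str or
    the undocumented (charset, language, value) three-tuple from the thread.
    """
    msg = EmailMessage()
    msg.set_content("See attachment.")
    msg.add_attachment(
        b"a,b\n1,2\n",  # placeholder payload, not the reporter's file
        maintype="application",
        subtype="octet-stream",
        filename=filename,
    )
    return msg.as_string()


plain = serialized_attachment(FILE_NAME)
with_tuple = serialized_attachment(("GBK", "", FILE_NAME))

# Print only the header lines that carry the filename parameter.
for label, text in (("str", plain), ("GBK tuple", with_tuple)):
    print(label, [line for line in text.splitlines() if "filename" in line])
```

With a plain string, the new API emits the percent-encoded `filename*=utf-8''...` form. The thread reports that the tuple form serializes the same way, which the printed lines let you check on your own Python version.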

    c61e5b5d-a42c-408b-b187-a9e19efb2665 commented 4 years ago

    Hello, could you please attach a minimal working file to reproduce it?

    e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago

    I have uploaded just now. Thank you.

    bitdancer commented 4 years ago

    I think you are saying that you want the charset in the encoded filename to be GBK rather than utf-8? utf-8 should certainly display correctly in your email client, though, so if it is not there is something else going wrong.

    As far as the 3 tuple not working to set the charset...I believe what is happening there is that a header created by the application gets "refolded" on serialization, and refolding doesn't keep the existing charset, it converts everything to utf-8. This is an intentional part of the design: the library handles the gory details of MIME and uses utf-8 as the charset for application created content. It is actually an accident of the implementation that the tuple form of the filename is even accepted; you will note that it is *not* documented in the contentmanager docs.

    It wouldn't be crazy to ask for this as a feature, and it could even be treated as a bug that it doesn't work if we want to, but it may not be easy to "fix", because it goes against the design philosophy of the new API.

    bitdancer commented 4 years ago

    Actually, given that the contentmanager does accept a charset parameter for text content, it does seem reasonable to treat this as a bug. But as I said fixing it may not be trivial.

    e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago

    Using utf-8 doesn't display correctly in my mail client, so I thought GBK might work and tried to change the Content-Disposition filename to use GBK. Just now I printed the legacy API's MIMEMultipart.as_string() and found that it uses utf-8 too. The difference is:

    legacy API: filename="=?utf-8?b?6LaFZeS/nTMwMDBQLmNzdg==?="
    EmailMessage: filename*=utf-8''.%2F%E8%B6%85e%E4%BF%9D3000P.csv

    So it is not the charset that causes the display error. But I still don't know why. Is it the Base64?

    e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago

    Why are there two different representations of the same file name? It displays incorrectly when I use the EmailMessage API's filename representation.

    bitdancer commented 4 years ago

    The legacy API appears to be using an RFC-incorrect (but common) encoded-word encoding, while the new API is using the RFC-compliant MIME-parameter encoding (% encoding). Which email client are you using?
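The two encodings named here can be generated directly with standard-library helpers. This is only an illustrative sketch of the two wire formats, not how either API builds its headers internally: `email.header.Header` produces the RFC 2047 encoded-word, and `email.utils.encode_rfc2231` produces the RFC 2231 percent-encoded value.

```python
from email.header import Header
from email.utils import encode_rfc2231

file_name = "超e保3000P.csv"

# RFC 2047 encoded-word placed inside a quoted parameter value.
# This is the common but non-compliant style the legacy API output shows.
encoded_word = Header(file_name, "utf-8").encode()
legacy_param = f'filename="{encoded_word}"'

# RFC 2231 extended parameter: charset, empty language tag, then the
# percent-encoded bytes. This is the style the new API emits.
rfc2231_value = encode_rfc2231(file_name, "utf-8")
new_param = f"filename*={rfc2231_value}"

print(legacy_param)  # filename="=?utf-8?b?6LaFZeS/nTMwMDBQLmNzdg==?="
print(new_param)     # filename*=utf-8''%E8%B6%85e%E4%BF%9D3000P.csv
```

Both strings carry the same UTF-8 bytes. They differ only in the transfer syntax (Base64 inside an encoded-word versus percent-encoding), which is why the charset is not the cause of the display problem.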

    e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago

    Microsoft outlook 20116 MSO(16.0.4266.10001) x64
    Foxmail 7.2 (build 7.026)

    I use these two email clients. Both display it incorrectly. I have uploaded the screenshots.

    e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago

    Microsoft outlook 2016 MSO(16.0.4266.10001) x64

    bitdancer commented 4 years ago

    Since Outlook is one of the mailers that generates the non-RFC-compliant headers, it doesn't surprise me all that much that it can't interpret the RFC compliant headers correctly.

    I'm not sure there is anything we can do here.

    I suppose someone could do a survey of mail clients and document which ones can handle which style of parameter encoding. If it turns out more handle the "wrong" way than handle the "right" way, we could consider adopting the de-facto standard, although I won't like it much :)

    (There is also a possibility there is a bug in our RFC compliance, but this is the first problem report I've seen.)

    e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago

    I think a program's goal is to solve problems, not to satisfy the "standard".

    OK, if you insist that the "standard" has top priority, could you please tell me a way to change the default behavior of the new API to use the "=?utf-8?b?" parameter style? Is there a function or parameter I can use to achieve this?

    If not, I think the best solution is to add a "param style" parameter that lets me choose which style to use.

    And if not, I am sad about this; I will have to use the legacy API.
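For reference, the legacy-API fallback mentioned above might look roughly like the sketch below. It is an assumption about how the reporter's working code was structured (the thread only says the legacy API plus the `Header` class was used); the payload and body text are placeholders.

```python
from email.header import Header
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

file_name = "超e保3000P.csv"
payload = b"a,b\n1,2\n"  # placeholder for the real file contents

msg = MIMEMultipart()
msg.attach(MIMEText("See attachment.", "plain", "utf-8"))

# Pre-encode the file name as an RFC 2047 encoded-word, then pass the
# resulting ASCII string as the filename parameter. The legacy API
# quotes it as-is instead of re-encoding it per RFC 2231.
part = MIMEApplication(payload, _subtype="octet-stream")
part.add_header(
    "Content-Disposition",
    "attachment",
    filename=Header(file_name, "utf-8").encode(),
)
msg.attach(part)

serialized = msg.as_string()
print([line for line in serialized.splitlines() if "filename" in line])
```

This yields the encoded-word-in-quotes form that the thread says Outlook and Foxmail display correctly, at the cost of not being RFC-compliant.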

    e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago

    https://litmus.com/blog/infographic-the-2019-email-client-market-share

    Here is a survey of email client market share. You can see Outlook is in the top 3.

    e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago

    And I just sent a mail to my Gmail account. Viewing it on the web, it displays incorrectly!

    e3724f42-2d24-43a9-b6a3-a63cc57f185f commented 4 years ago

    Sorry, Gmail on the web displays it correctly.

    bitdancer commented 4 years ago

    I actually agree: if most (by market share) MUAs handle the RFC-incorrect parameter encoding style, and a significant portion does not handle the RFC correct style, then we should support the de-facto standard rather than the official standard as the default. I just wish Microsoft would write better software :) If on the other hand it is only Microsoft out of the big market share players that is broken, I'm not sure I'd want it to be the default. But we could still support it optionally.

    So yeah, we could have a policy control that governs which one is actually used.

    So this is a feature request, and ideally should be supported by an investigation of what MUAs support what, by market share. And there's another question: does this only affect the filename parameter, or is it all MIME parameters? I would expect it to be the latter, but someone should check at least a few examples of that to be sure.