python / cpython

The Python programming language
https://www.python.org

email.utils.make_msgid returns ids that break email messages with related content #100293

Open ostefano opened 1 year ago

ostefano commented 1 year ago

Bug report

I have been trying to replicate the examples listed here: https://docs.python.org/3/library/email.examples.html

For some reason the one about "creating an HTML message with an alternative plain text version" assembles an email message that Thunderbird (and other email readers) do not display correctly: images are not rendered and are marked as broken.

The example uses make_msgid() to generate content ids.

Python 3.10.9 (main, Dec  7 2022, 03:14:04) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from email import utils
>>> utils.make_msgid()
'<167119948916.50921.14529814791249370642@1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa>'
>>>

It turns out the string is too long: if I either remove the domain part or purposely shorten it, e.g., make_msgid(domain="0.0.0.ip6.arpa"), then everything works again and the resulting email is displayed correctly in Thunderbird/Outlook.

sobolevn commented 1 year ago

make_msgid by default uses socket.getfqdn() to get the domain part. For my machine it is short enough. So, you have two options:

  1. Change your hostname
  2. Use explicit domain name

I don't think that there's anything we can do from our side.
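For illustration, here is a minimal sketch of the second option. It assumes a placeholder domain, "example.com", and the helper names are made up for this example; only socket.getfqdn() and make_msgid(domain=...) come from the standard library.

```python
import socket
from email.utils import make_msgid


def default_domain():
    """Return the host name that make_msgid() falls back to by default."""
    return socket.getfqdn()


def short_msgid(domain="example.com"):
    """Build an ID with an explicit domain ("example.com" is a placeholder)."""
    return make_msgid(domain=domain)


if __name__ == "__main__":
    fqdn = default_domain()
    # A very long value here is what ends up in the default ID.
    print(len(fqdn), fqdn)
    print(short_msgid())
```

Checking the length of the default domain first is a quick way to tell whether a given machine is affected at all.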

ostefano commented 1 year ago

@sobolevn I am perfectly fine implementing that workaround in my code. The problem is that this issue is not documented at all, and users reading the official documentation here https://docs.python.org/3/library/email.examples.html might try to implement the example and find themselves completely stumped.

I think we should at least add what you say in your reply to the documentation page linked above. What do you think?

sobolevn commented 1 year ago

Looks like it is documented here: https://docs.python.org/3/library/email.utils.html?highlight=make_msgid#email.utils.make_msgid

I don't think that adding implementation details of make_msgid to the multi-alternatives example is a good idea.

However, making docs better is always a good thing, so - if you have some specific suggestions, please feel free to post them! :)

ostefano commented 1 year ago

What about something like: "Note that modern email clients might not correctly display emails containing resources with a message-id longer than XX characters"?

dtrodrigues commented 1 year ago

While Thunderbird doesn't display messages with a long msgid correctly, Apple Mail does. Which other email clients are not working?

ostefano commented 1 year ago

Outlook 365 (latest on the stable channel)

sobolevn commented 1 year ago

Something like "Note that some email clients might not correctly display emails containing resources with a long Message-ID, which usually happens because of a long domain part" sounds like a reasonable note to add! 👍

ostefano commented 1 year ago

@sobolevn 👍 If you point me to the right documentation file, I'd be happy to create the PR.

sobolevn commented 1 year ago

Here you go! https://github.com/python/cpython/blob/main/Doc/library/email.utils.rst

dtrodrigues commented 1 year ago

FWIW, the Thunderbird bug report is here: https://bugzilla.mozilla.org/show_bug.cgi?id=1612465

The longer domain is causing Python to encode the Content-ID value in order to split it across multiple lines, but Thunderbird doesn't seem to support that part of the spec.
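A rough way to see this locally is to fold the header on its own and check whether the ID still appears verbatim. This is only a sketch: LONG_CID is a made-up value shaped like the one in the report, the helper names are hypothetical, and the exact folded text may differ between Python versions, so the code prints it rather than assuming it.

```python
from email import policy
from email.message import EmailMessage

# Hypothetical ID, shaped like the one in the original report.
LONG_CID = "<167119948916.50921.14529814791249370642@" + "0." * 32 + "ip6.arpa>"


def fold_content_id(cid):
    """Return the Content-ID header as folded under the default policy."""
    msg = EmailMessage()  # uses email.policy.default (78-character lines)
    msg["Content-ID"] = cid
    return msg["Content-ID"].fold(policy=policy.default)


def survives_folding(cid):
    """True if the ID still appears verbatim in the folded header."""
    return cid in fold_content_id(cid)


if __name__ == "__main__":
    print(fold_content_id(LONG_CID))
    print("ID kept verbatim:", survives_folding(LONG_CID))
```

If the second line prints False, the value was split or rewritten during folding, which would match the behaviour described above.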

ostefano commented 1 year ago

@sobolevn done 👍

bitdancer commented 1 year ago

At the risk of muddying the waters, I think this is actually a bug. I don't believe message-id headers are technically allowed to be encoded using encoded words. The spec is pretty clear that addr-specs are not to be RFC 2047 encoded, and a message-id is composed of addr-spec-like things. More directly on point, it is a structured field and its contents are not a phrase. The email package should really probably default to not doing encoding except where it is permitted...instead I went with preventing it on demand (encode_as_ew = False, but the default is True). I believe I did that because X-headers can contain encoded words, and I wanted such encoding of X-headers to be the default. I now think that was an incorrect design decision, as it has resulted in several bug reports like this one, including one, if I recall correctly, that was an X-header.

Now, I could be wrong about encoding of message-id headers. After all, I was much more cognizant of the RFCs when I was writing the code than I am now, years later ;)

If I'm right this raises the question of how you comply with the RFC line length requirements while also not using encoded words. The answer, I think, is that you don't. Long lines are handled correctly by far more mail clients than encoding-where-it-doesn't-belong is.

ostefano commented 1 year ago

@sobolevn @bitdancer what is the consensus here? Shall we merge the PR in the meantime?

blaisep commented 4 months ago

Also there is a related doc PR https://github.com/python/cpython/pull/100856