Open calpaterson opened 1 month ago
Bug report
Bug description:
When parsing back a written email, whitespace seems to be prepended to the header if the header was wrapped upon writing.
This is particularly noticeable for message-ids, which end up different - with either a space or a newline prepended depending on what policy is set to (compat32: newline, default: space).
    import string
    from email import message_from_bytes
    from email.message import EmailMessage
    import email.policy

    orig = EmailMessage()
    orig["Message-ID"] = string.ascii_lowercase * 3
    policy = email.policy.default  # changing to compat32 emits a different error
    parsed = message_from_bytes(orig.as_bytes(policy=policy), policy=policy)
    assert (
        parsed["Message-ID"] == orig["Message-ID"]
    ), f"message ids don't match: '{orig['Message-ID']}' != '{parsed['Message-ID']}'"
I'm not very familiar with RFC2822, but based on the rules it includes for "long" header fields, the written email bytes look right to me, it's just when it's being read back it's not right.
CPython versions tested on:
3.9, 3.12
Operating systems tested on:
Linux
You may be referring to RFC822. But the above behavior is indeed wrong. Maybe you can add .strip(' ') after the parsing process.
Well, there are two problems here. One is the wrapping on serialization. The original design was that when a word was too long to fit within the maximum line length, encoded words would be used to do the wrapping. I'm not sure whether that was a good choice, or why it isn't happening here, unless someone "fixed" that design. So there are two choices: either make it so that a word longer than the maximum line length doesn't cause wrapping, or fix it so that encoded words are used and the line gets wrapped to fit within the maximum line length.
However, that is not the bug you are addressing here, so it should go into another issue if you want to open one. (You could also just ignore it).
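To see the wrapping on serialization for yourself, a small sketch like the one below prints each serialized line with its length next to the policy's `max_line_length`. It reuses the setup from the report. The variable names are only illustrative, and the exact folding you see may differ between Python versions, so no expected output is shown.

```python
import string
from email.message import EmailMessage
import email.policy

policy = email.policy.default

msg = EmailMessage()
# A single 78-character "word" with no place to fold inside it.
msg["Message-ID"] = string.ascii_lowercase * 3

raw = msg.as_bytes(policy=policy)

print("max_line_length:", policy.max_line_length)
for line in raw.splitlines():
    # Show the length of each physical line alongside its content.
    print(len(line), line)
```

If the value is pushed onto a continuation line, the first physical line holds only the header name and colon, and the second starts with the folding whitespace.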
Then, there is the parsing problem. That leading space on the next line is supposed to be treated as if it were the space between the ':' and the body of the header. As I noted on the PR review, the problem is that I failed to include newline and carriage return as part of the whitespace to be stripped from the start of the value.
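The stripping described above can be sketched roughly as follows. This is not the actual CPython patch: `header_value` is a hypothetical helper that works on a raw header string, and it only illustrates why carriage return and newline have to be in the set of characters stripped from the start of the value.

```python
import re


def header_value(raw_header: str) -> str:
    """Return the unfolded value of a raw 'Name: value' header (illustrative only)."""
    _name, _sep, value = raw_header.partition(":")
    # Stripping only ' ' and '\t' here would leave the fold's line break
    # at the front of the value when the value starts on a continuation line.
    value = value.lstrip(" \t\r\n")
    # Unfold the rest: drop any line break that is immediately followed
    # by whitespace, keeping that whitespace.
    value = re.sub(r"\r?\n(?=[ \t])", "", value)
    # Drop the line ending that terminates the header.
    return value.rstrip("\r\n")


long_id = "abcdefghijklmnopqrstuvwxyz" * 3
folded = "Message-ID:\r\n " + long_id + "\r\n"
print(header_value(folded) == long_id)
```

With `\r` and `\n` included in the `lstrip` call, the folded Message-ID comes back without the leading newline or space that the report describes.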
Now I remember. There was a previous bug where long message ids were getting encoded using encoded words, which is not legal per the RFC. We fixed that bug, but didn't deal with the long-word-gets-moved-to-next-line bug at that time.