Open fsc-eriker opened 9 months ago
A crash is when the process stops without finishing or raising an exception. Stopping on uncaught exceptions like this is normal behavior. Whether this exception is a bug or a feature change request depends on what, if anything, the doc specifies. If you want to claim 'bug' please find and report a violated specification. Without reading the email doc, I can imagine that users are expected to catch such input errors and handle them however they want -- ignore the email, send a rejection to the sender if possible, or display with a modified header -- and that there are users depending on catching this exception.
Sorry if my wording is imprecise. I'll be happy to reclassify this as a feature request. The problem, as I see it, is that the current architecture allows for no user-level control over this behavior; there is no way to proceed with a message with this type of defect. My perception was that the mechanism to register defects was implemented precisely so that users of the library can decide for themselves how exactly to handle any specific email message defects.
Subjectively, I see a lot of scenarios where users (often with limited knowledge of email) want to use Python to process some messages, and get a traceback which they have no way to avoid. A simple and common use case is to extract attachments in bulk; you don't really care if the address of the sender was valid, you just want to loop over all messages, and for each find whether it contains an attachment, and if so extract it.
The raise
on line 33 of headerregistry.py
is quite deliberate and unconditional. However, for another recent bug report about what seems to me like a rather similar problem, I got the feedback that the problem should be handled by registering a defect, not by raising an exception, which is how I had originally tackled it in #108133
I'd appreciate additional guidance on how to proceed with this. I can see three possibilities;
raise
something more specific than a general ValueError
so that it can be caught reliably and precisely in the calling code
For all of these, an obvious complication is that they introduce a new convention, and then the rest of the code should be adapted to obey the same convention. To quickly estimate the size of the problem,
$ grep -Fworc raise Lib/email | grep -Fve .py:0 -e __pycache__
Lib/email/contentmanager.py:9
Lib/email/_policybase.py:10
Lib/email/header.py:4
Lib/email/_encoded_words.py:1
Lib/email/_header_value_parser.py:53
Lib/email/policy.py:1
Lib/email/message.py:12
Lib/email/generator.py:1
Lib/email/utils.py:3
Lib/email/charset.py:2
Lib/email/quoprimime.py:1
Lib/email/mime/message.py:1
Lib/email/mime/application.py:1
Lib/email/mime/nonmultipart.py:1
Lib/email/mime/audio.py:1
Lib/email/mime/image.py:1
Lib/email/feedparser.py:3
Lib/email/architecture.rst:0
Lib/email/headerregistry.py:7
Bug report
Bug description:
I came across some messages where the sender had embedded a newline in the
From:
header's display string. This was (ostensibly) a legitimate sender, possibly hoping to get more real estate for their message in the recipient's inbox listing or something, or just making a configuration mistake.Unfortunately, this crashes Python's
email
parser withpolicy=default
. (The legacy parser works fine, simply because it doesn't attempt to unpeel RFC2047 encoding by default.)While the error message is correct, it should probably not raise an exception; perhaps instead register a defect?
I have tested this on 3.11 out of the box and with the current sources from the cpython Github repo; but I would expect it to manifest on all versions of the modern
email
module and all platforms.CPython versions tested on:
3.11
Operating systems tested on:
macOS