Open me-and opened 1 year ago
I believe this is the expected behaviour. Per the documentation, HeaderParser acts like Parser with headersonly=True. Modifying the test script as follows, the printed value is [MultipartInvariantViolationDefect()].
import email.parser
import email.policy

email_str = '''\
Date: 01 Jan 2001 00:01+0000
From: arthur@example.example
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=autocracy

--autocracy
Content-Type: text/plain

By hanging on to outdated imperialist dogma which perpetuates the economic and
social differences in our society.

--autocracy
Content-Type: text/html

<html><body><p>By hanging on to outdated imperialist dogma which perpetuates
the economic and social differences in our society.</p></body></html>

--autocracy--
'''

# Full parse: the body is split into its MIME parts.
full_parser = email.parser.Parser(policy=email.policy.default)
parsed_email_full = full_parser.parsestr(email_str)
print(parsed_email_full.defects)  # Prints [] as reported

# Same parser, but stopping after the headers.
full_parser = email.parser.Parser(policy=email.policy.default)
parsed_email_full = full_parser.parsestr(email_str, headersonly=True)
print(parsed_email_full.defects)  # Prints [MultipartInvariantViolationDefect()]

# HeaderParser behaves like Parser with headersonly=True.
header_parser = email.parser.HeaderParser(policy=email.policy.default)
parsed_email_headers_only = header_parser.parsestr(email_str)
print(parsed_email_headers_only.defects)  # Prints [MultipartInvariantViolationDefect()]
I see the issue, looking into it now.
Per the documentation of Parser.parse():
Optional headersonly is a flag specifying whether to stop parsing after reading the headers or not. The default is False, meaning it parses the entire contents of the file.
From this reading, the issue is valid and the fix in the attached PR is the correct bugfix.
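To make the documented meaning of headersonly concrete, here is a minimal sketch of what "stop parsing after reading the headers" looks like from the caller's side. The SAMPLE message is a made-up stand-in, not the message from the report, and the snippet deliberately says nothing about defects, since that part depends on whether the interpreter includes the fix.

```python
import email.parser
import email.policy

# Hypothetical two-part message, used only for illustration.
SAMPLE = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed; boundary=sep\n"
    "\n"
    "--sep\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--sep\n"
    "Content-Type: text/plain\n"
    "\n"
    "second part\n"
    "--sep--\n"
)

parser = email.parser.Parser(policy=email.policy.default)

# Full parse: the body is split into sub-messages.
full = parser.parsestr(SAMPLE)

# Headers-only parse: the body is kept as one unparsed string.
headers_only = parser.parsestr(SAMPLE, headersonly=True)

print(full.is_multipart(), len(full.get_payload()))
print(headers_only.is_multipart(), type(headers_only.get_payload()).__name__)
print(headers_only.get_content_type())
```

In the headers-only case the Content-Type header still says multipart/mixed, but the payload is a plain string rather than a list of parts, which is the mismatch the defect check reacts to.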
Bug report
A valid multipart email message, when parsed with email.parser.HeaderParser(policy=email.policy.default), will record an email.errors.MultipartInvariantViolationDefect. If the parser isn't going to attempt to parse the message body, it shouldn't report that as a defect.
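Until a fix is available, one possible caller-side workaround is to drop that one defect class after a headers-only parse. This is only a sketch: header_only_defects and SAMPLE are hypothetical names introduced for illustration, and this is not the change made in the linked PR.

```python
import email.errors
import email.parser
import email.policy

# Hypothetical one-part multipart message, used only for illustration.
SAMPLE = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed; boundary=sep\n"
    "\n"
    "--sep\n"
    "Content-Type: text/plain\n"
    "\n"
    "body text\n"
    "--sep--\n"
)


def header_only_defects(text):
    """Parse only the headers and return message-level defects,
    ignoring the one that merely reflects the unparsed body."""
    parser = email.parser.HeaderParser(policy=email.policy.default)
    message = parser.parsestr(text)
    return [
        defect
        for defect in message.defects
        if not isinstance(
            defect, email.errors.MultipartInvariantViolationDefect
        )
    ]


print(header_only_defects(SAMPLE))
```

Filtering by class keeps any other message-level defects visible, and the helper returns the same result whether or not the interpreter still records the multipart defect for headers-only parses.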
Simple test script:
Your environment