microsoft / MHA

Message Header Analyzer Add-in For Outlook
MIT License

Avoid processing the body as headers #942

Closed oh2fih closed 5 months ago

oh2fih commented 5 months ago

Issue

Someone might paste an entire message, body included, into the inputHeaders textarea when using the standalone Message Header Analyzer. The body is then parsed with the header rules, which produces nonsensical rows in the table. Furthermore, with large messages the entire browser can freeze or crash.

Examples

For example, the multipart boundary delimiter (RFC 2046, section 5.1) declared in the Content-Type header (14) appears as part of the last header (X-SES-Outgoing, 17), and everything after it is not a header at all.

Screenshot: multipart body processed as headers

In another example, an HTML message body is interpreted as headers.

Screenshot: HTML body processed as headers

Solution

This pull request adds recognition of the boundary between the header section and the body, as defined in RFC 5322, section 2.1:

A message consists of header fields (collectively called "the header section of the message") followed, optionally, by a body. The header section is a sequence of lines of characters with special syntax as defined in this specification. The body is simply a sequence of characters that follows the header section and is separated from the header section by an empty line (i.e., a line with nothing preceding the CRLF).
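To illustrate the rule quoted above, here is a minimal sketch of cutting pasted input at the first empty line. It is not the code from this pull request: `extractHeaderSection` is a hypothetical helper name, and accepting bare LF as well as CRLF is an assumption made because text pasted into a textarea often loses its carriage returns.

```typescript
// Hypothetical helper, not the actual MHA implementation.
// Returns only the header section of a pasted message: everything before
// the first empty line (RFC 5322, section 2.1).
function extractHeaderSection(input: string): string {
  // An input that starts with an empty line has no header section at all.
  if (/^\r?\n/.test(input)) {
    return "";
  }
  // Assumption: accept both CRLF and bare LF line endings.
  const separator = /\r?\n\r?\n/.exec(input);
  // No empty line found: treat the whole input as headers.
  return separator === null ? input : input.slice(0, separator.index);
}

// Usage: the multipart body after the empty line is dropped.
const pasted =
  "Content-Type: multipart/alternative; boundary=\"abc\"\r\n" +
  "X-SES-Outgoing: 2024.01.01\r\n" +
  "\r\n" +
  "--abc\r\nContent-Type: text/plain\r\n\r\nHello\r\n--abc--\r\n";
const headersOnly = extractHeaderSection(pasted);
```

Cutting the input before any header parsing also means the cost of the later steps depends on the size of the header section rather than on the size of the whole message.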

oh2fih commented 5 months ago

@microsoft-github-policy-service agree

oh2fih commented 5 months ago

The second commit f186727 increases performance. HeaderModel is now able to process 7 MB messages in milliseconds.