zbateson / mail-mime-parser

An email parser written in PHP
https://mail-mime-parser.org/
BSD 2-Clause "Simplified" License

Weird multiline quoted "From" cannot be parsed correctly #161

Open markusramsak opened 3 years ago

markusramsak commented 3 years ago

The "From: " part of this simplified email cannot be parsed correctly. The result should be: Holger_Akademie für Impulsgebung <test@gmail.com>

From: =?UTF-8?Q?"Holger=5FAkademie_f=C3=BCr_Impulsge?=
 =?UTF-8?Q?bung"_<test@gmail.com>?=
To: <to@gmail.com>
Subject: Nach dem Meisterkreis am 28. April 2016
Date: Thu, 28 Apr 2016 22:05:47 +0200
Message-ID: <000e01d1a189$6616b250$32d4d416f0$@gmail.com>

zbateson commented 3 years ago

Hi @markusramsak --

This one too is invalid. It fails on two parts:

  1. Semantic (structural) parts of a header need to be outside the encoded parts; generally, encoded-words can only encode text within those parts. An exception needs to be made for References/Content-Id headers (see #109 for info on that), but note particularly this from RFC 1342:

An encoded-word may replace a "text" token (as defined by RFC 822) in: (1) a Subject or Comments header field, (2) any extension message header field, (3) any user-defined message header field, or (4) any RFC 1341 body part header field (such as Content-Description) for which the field body contains only "text"s.

That means, for example, that this is a comment: (=?UTF-8?Q?blah?=) but this is not: =?UTF-8?Q?(blah)?= .

  2. Email addresses specifically may not be MIME-header encoded anyway. My processing may allow it if it's just part of an address (it may not, I can't remember offhand), but it's not allowed by the RFC: "An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'." See https://tools.ietf.org/html/rfc2047#section-5
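
To make the two points above concrete, here is a minimal Python sketch (standard library only, not mail-mime-parser code) that flags encoded-words whose decoded payload contains address or comment syntax. The function name `encoded_words_carry_structure` and the character set `STRUCTURAL` are assumptions chosen for illustration, and the second header is a hypothetical conformant rewrite of the one in this issue.

```python
from email.header import decode_header

# Characters that carry address/comment structure in RFC 822-style headers.
# This particular set is an illustrative assumption, not taken from the RFC text.
STRUCTURAL = set('"<>@()')


def encoded_words_carry_structure(value):
    """Return True if any encoded-word decodes to text containing structural characters."""
    for chunk, charset in decode_header(value):
        # decode_header reports charset None for text that was not inside an encoded-word.
        if charset is None:
            continue
        text = chunk.decode(charset, errors="replace")
        if STRUCTURAL & set(text):
            return True
    return False


# Header from this issue: the quotes, angle brackets and address sit inside the encoded-words.
from_issue = (
    '=?UTF-8?Q?"Holger=5FAkademie_f=C3=BCr_Impulsge?=\n'
    ' =?UTF-8?Q?bung"_<test@gmail.com>?='
)

# Hypothetical conformant form: only the display name is encoded.
conformant = '=?UTF-8?Q?Holger=5FAkademie_f=C3=BCr_Impulsgebung?= <test@gmail.com>'

print(encoded_words_carry_structure(from_issue))   # True
print(encoded_words_carry_structure(conformant))   # False
```

A check like this only detects the pattern; what a parser should do once it finds one is the open question in this thread.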

Again, I welcome discussion/examples of other handling, etc...

markusramsak commented 3 years ago

I believe you, but this is part of a real email where this case happened. I understand if you don't want to handle these cases, but then I would try to handle them myself. They are rare, but they do happen.

zbateson commented 3 years ago

This would also be fixed by prioritizing the mime-encoded part over the quoted part, like #159, if it makes sense to do so (the impact/usefulness should be investigated first).
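
As a rough illustration of what "prioritizing the mime-encoded part" could mean, the Python sketch below decodes all encoded-words first and only then parses the result as an address. It uses only the Python standard library and is not how mail-mime-parser is implemented; the helper name `decode_then_parse` is hypothetical.

```python
from email.header import decode_header
from email.utils import parseaddr


def decode_then_parse(value):
    """Decode every encoded-word first, then parse the decoded text as an address."""
    decoded = "".join(
        # decode_header yields bytes when encoded-words are present, str otherwise.
        chunk.decode(charset or "ascii", errors="replace") if isinstance(chunk, bytes) else chunk
        for chunk, charset in decode_header(value)
    )
    return parseaddr(decoded)


# The folded "From" value reported in this issue.
raw_from = (
    '=?UTF-8?Q?"Holger=5FAkademie_f=C3=BCr_Impulsge?=\n'
    ' =?UTF-8?Q?bung"_<test@gmail.com>?='
)

name, address = decode_then_parse(raw_from)
print(name)     # Holger_Akademie für Impulsgebung
print(address)  # test@gmail.com
```

The trade-off is that anything inside an encoded-word is then treated as header structure, which is what the RFC text quoted earlier in the thread disallows, so an approach like this is probably best kept as a fallback for headers that fail a strict parse.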