zbateson / mail-mime-parser

An email parser written in PHP
https://mail-mime-parser.org/
BSD 2-Clause "Simplified" License

Lack of RFC 1342 support for IdHeader #109

Closed: WaylandAce closed this issue 4 years ago

WaylandAce commented 4 years ago

https://tools.ietf.org/html/rfc1342

Example header:

References: =?us-ascii?Q?<CACrVqsLQjPe0y=3DE4q0auFowDoY+9Z27R63OA=5F1fn-?= =?us-ascii?Q?mGPG9Zc3Q@example.com>_<a1527a80a42422457ebe?= =?us-ascii?Q?89657a5d0e89@example.com>?=

Expected result: the getIds()->getValue() method should return decoded values. Actual result: the getIds()->getValue() method returns the encoded values.
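For illustration, the decoded values the report expects can be reproduced outside the library. This is only a sketch using Python's standard email.header module, not mail-mime-parser itself; the helper name decode_id_header is hypothetical, and the header value is copied from the example above.

```python
import re
from email.header import decode_header, make_header

# The References value from the report above, copied verbatim.
raw_references = (
    "=?us-ascii?Q?<CACrVqsLQjPe0y=3DE4q0auFowDoY+9Z27R63OA=5F1fn-?= "
    "=?us-ascii?Q?mGPG9Zc3Q@example.com>_<a1527a80a42422457ebe?= "
    "=?us-ascii?Q?89657a5d0e89@example.com>?="
)


def decode_id_header(value):
    """Decode any encoded-words, then pull out each <...> message ID.

    Hypothetical helper for illustration; not part of any library.
    """
    # decode_header splits the value into (bytes, charset) chunks and drops
    # the whitespace between adjacent encoded-words; make_header joins them.
    decoded = str(make_header(decode_header(value)))
    return re.findall(r"<([^<>\s]+)>", decoded)


ids = decode_id_header(raw_references)
for message_id in ids:
    print(message_id)
# CACrVqsLQjPe0y=E4q0auFowDoY+9Z27R63OA_1fn-mGPG9Zc3Q@example.com
# a1527a80a42422457ebe89657a5d0e89@example.com
```

Note how the Q-encoding maps =3D to "=", =5F to "_", and a bare "_" to a space, which is what separates the two IDs after decoding.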

zbateson commented 4 years ago

Eek, is this something you actually encountered? It's weird because the delimiter characters (< and >) are part of the RFC 1342 encoded parts as well. From my understanding, an encoded-word in that position definitely doesn't follow RFC 1342:

An encoded-word may replace a "text" token (as defined by RFC 822) in: (1) a Subject or Comments header field, (2) any extension message header field, (3) any user-defined message header field, or (4) any RFC 1341 body part header field (such as Content-Description) for which the field body contains only "text"s.

And RFC 2822 (section 3.6.4, Identification Fields) defines the header as:

message-id    = "Message-ID:" msg-id CRLF
in-reply-to   = "In-Reply-To:" 1*msg-id CRLF
references    = "References:" 1*msg-id CRLF
msg-id        = [CFWS] "<" id-left "@" id-right ">" [CFWS]
id-left       = dot-atom-text / no-fold-quote / obs-id-left
id-right      = dot-atom-text / no-fold-literal / obs-id-right
no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE

And earlier:

atext         = ALPHA / DIGIT /  ; Any character except controls, SP, and specials. Used for atoms
                "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" /
                "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~"
atom          = [CFWS] 1*atext [CFWS]
dot-atom      = [CFWS] dot-atom-text [CFWS]
dot-atom-text = 1*atext *("." 1*atext)

Now, that being the case -- if you can confirm this exists in the wild (and isn't just something you thought of), I'd be inclined to support it at some point if it's relatively common and if I can rework my code to do so... This one may be particularly annoying though, because normally, even where mime-encoded parts are allowed, the delimiter characters sit outside the encoded parts -- so <=?us-ascii?Q?..... (with the < first), not =?us-ascii?Q?<.
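The grammar argument above can be checked mechanically. The sketch below is a deliberately simplified reading of the quoted RFC 2822 rules: it accepts only the dot-atom-text form on both sides of the "@", treats CFWS as plain whitespace, and ignores comments, no-fold-quote, no-fold-literal, and the obsolete forms. The names ATEXT, DOT_ATOM_TEXT, REFERENCES, and is_plain_references are hypothetical.

```python
import re

# atext per RFC 2822: letters, digits, and the listed punctuation.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# references = 1*msg-id, with msg-id simplified to "<" dot-atom "@" dot-atom ">"
REFERENCES = re.compile(rf"(?:\s*<{DOT_ATOM_TEXT}@{DOT_ATOM_TEXT}>\s*)+")


def is_plain_references(value):
    """Return True if value is one or more simplified msg-ids."""
    return REFERENCES.fullmatch(value) is not None


encoded = (
    "=?us-ascii?Q?<CACrVqsLQjPe0y=3DE4q0auFowDoY+9Z27R63OA=5F1fn-?= "
    "=?us-ascii?Q?mGPG9Zc3Q@example.com>_<a1527a80a42422457ebe?= "
    "=?us-ascii?Q?89657a5d0e89@example.com>?="
)
decoded = (
    "<CACrVqsLQjPe0y=E4q0auFowDoY+9Z27R63OA_1fn-mGPG9Zc3Q@example.com> "
    "<a1527a80a42422457ebe89657a5d0e89@example.com>"
)

print(is_plain_references(encoded))  # False: the value starts with "=?", not "<"
print(is_plain_references(decoded))  # True
```

Note that "=" and "?" are themselves atext characters, so the markers of an encoded-word could legally appear inside an ID; what breaks the grammar here is that the encoded-words wrap the "<" and ">" delimiters and the whitespace between IDs.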

WaylandAce commented 4 years ago

Yes, I got such headers from an existing raw message that was read from Gmail (through the Gmail API). As a workaround, I decoded these headers (and the In-Reply-To header as well) with mb_decode_mimeheader.
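The workaround described here amounts to a pre-processing pass: decode the identification headers before handing the message to the parser. The sketch below shows that shape using Python's standard email package as a stand-in; it is not the PHP code from the report and does not use mb_decode_mimeheader. The function names, the ID_HEADERS tuple, and the sample message (including its message ID) are hypothetical.

```python
from email import message_from_string
from email.header import decode_header, make_header

# Headers whose bodies should be plain msg-ids (hypothetical selection).
ID_HEADERS = ("Message-ID", "In-Reply-To", "References")


def normalize_id_headers(raw_message):
    """Parse a raw message and decode encoded-words found in ID headers."""
    msg = message_from_string(raw_message)
    for name in ID_HEADERS:
        value = msg[name]
        # Only touch headers that are present and look MIME-encoded.
        if value is not None and "=?" in value:
            msg.replace_header(name, str(make_header(decode_header(value))))
    return msg


# Made-up message for illustration only.
raw = (
    "Subject: test\n"
    "In-Reply-To: =?us-ascii?Q?<abc=5Fdef@example.com>?=\n"
    "\n"
    "body\n"
)

msg = normalize_id_headers(raw)
print(msg["In-Reply-To"])  # <abc_def@example.com>
```

Restricting the pass to the ID headers leaves other fields, such as Subject, for the parser to decode in its usual way.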

WaylandAce commented 4 years ago

I made a small example: https://github.com/WaylandAce/phpmailer-rfc-test/blob/master/index.php

PHPMailer combined with a Gmail Message-ID gives this result.

zbateson commented 4 years ago

In that case though, it looks to me like the problem is in how the References header is set up using PHPMailer. The example calls 'addCustomHeader', so PHPMailer isn't doing anything special with the header; it just thinks it should encode the whole string (maybe because it sees the <> characters? Or maybe it just does that by default?). Maybe there's a way to set the header with PHPMailer without forcing it to be encoded, which would mean the References header would then be standards-compliant.

That doesn't convince me that this is common, but I'm happy to keep this open so you or others can add to it and chime in if it does indeed exist like that in the wild and something needs to be done.

WaylandAce commented 4 years ago

https://github.com/PHPMailer/PHPMailer/issues/1876

WaylandAce commented 4 years ago

And what about the first advantage listed for this library?

Handles header decoding/charset/formats for you. No need to worry about the format a header is in

zbateson commented 4 years ago

Thanks for the link, that confirms it for me. Yeah, I'm not against adding it; I just wanted to establish that it exists in the wild that way. The issue you posted also mentions Apple Mail doing it that way and has a good discussion about it generally.

zbateson commented 4 years ago

Fixed in master, will release in a few days unless an issue is reported.

zbateson commented 4 years ago

Released in 1.2.1