zbateson / mail-mime-parser

An email parser written in PHP
https://mail-mime-parser.org/
BSD 2-Clause "Simplified" License

References header decoding problem #186

Closed: jzernovic closed this issue 2 years ago

jzernovic commented 2 years ago

Hi, we handle a lot of email traffic, and so far this library has coped with it very well.

But from time to time we get an email with a References header that is not properly parsed into parts. For example:

References: =?us-ascii?Q?
 <86c6f658-a49a-709a-5089-75c73560128b@local.test>_<d605175a-1cee-9f20-bc24-79346cc7f965@local.test>_<4cb9afdf-6179-d736-a733-4b42f4c3e58a@local.test>_<93C197EB-7FF9-41E6-8034-E709EA8510B7@local.test>_<6e118fee-dcae-7a52-f88b-c6c4e69a2831@local.test>_<FC8F3C9A-313D-4C98-9357-A0D84B2C884B@local.test>_<91aebbab-6de8-2d6f-4fd6-e619226c8f3b@local.test>_<3269716E-85B0-4CB9-A678-36C502F874E3@local.test>_<37904a2b-70fd-6628-b988-650a21101b8f@local.test>_<B557D94A-737D-487B-B929-C20739A82D75@local.test>_<6af1f376-b0f3-482e-d1bd-1e3c8e50d550@local.test>_<2A159EE9-26D4-47A0-BA89-FFF8B0CE979B@local.test>_<7afxgxia=5F2336078@local.test>_<924664B4-3BB4-415F-A288-0A6988C66691@local.test>_<7afxgxia=5F2337966@local.test>_<7afxgxia=5F2344378@local.test>_<a5e707e5-40c2-9883-fd26-8ee754c54eaf@local.test>_<07953468-8d82-1f73-6f61-597b0070c561@local.test>_<AC78E27F-8102-45E7-8324-26767B8F8E01@local.test>_<7afxgxia=5F2404566@local.test>_<71A6D4AB-C1A8-4F4A-99DA-55F6CC56DF8F@local.test>_<504FD9
 B2-E05?= =?us-ascii?Q?6-4971-A722-953664BEFB5F@local.test>?=
 <2786730_7afxgxia@local.test>

When handled by the IdHeader class, we get these parts:

  1. =?us-ascii?Q? <86c6 ... st>_<504FD9 B2-E05?=
  2. =?us-ascii?Q?6-4971-A722-953664BEFB5F@local.test>?=
  3. 2786730_7afxgxia@local.test

If I cheat somewhat and pass the value to the MimeLiteralPart class first, then feed the decoded value to the IdHeader class, I get all the parts (the second from the end contains a space, but that is a problem with the client that sent it that way).

Could MIME decoding in the IdHeader class be relaxed a bit, bringing it closer to how the MimeLiteralPart class handles it?
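To make the "decode first, then split into IDs" idea concrete without relying on the library's internals, here is a minimal sketch in plain PHP. The function names `decodeEncodedWordsLeniently` and `extractMessageIds` are hypothetical and are not part of mail-mime-parser. The sketch assumes an ASCII-compatible charset (no charset conversion is done) and deliberately accepts encoded words that contain spaces, which strict RFC 2047 parsing would reject. The sample value is a shortened stand-in for the header above.

```php
<?php
// Hypothetical helper (not part of mail-mime-parser): leniently decode
// RFC 2047 style encoded words, tolerating whitespace inside them.
// Assumes an ASCII-compatible charset; no charset conversion is done.
function decodeEncodedWordsLeniently(string $value): string
{
    // Unfold the header: drop line breaks that precede folding whitespace.
    $unfolded = preg_replace('/\r?\n(?=[ \t])/', '', $value);
    // Whitespace between two adjacent encoded words is not significant.
    $joined = preg_replace('/\?=\s+=\?/', '?==?', $unfolded);

    return preg_replace_callback(
        '/=\?([^?]+)\?([QqBb])\?(.*?)\?=/s',
        function (array $m): string {
            if (strtoupper($m[2]) === 'B') {
                return (string) base64_decode($m[3]);
            }
            // Q encoding: "_" stands for a space, "=XX" is a hex-encoded byte.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $joined
    );
}

// Hypothetical helper: collect everything between angle brackets.
function extractMessageIds(string $decoded): array
{
    preg_match_all('/<([^<>]+)>/', $decoded, $matches);
    return $matches[1];
}

// Shortened stand-in for the References value from this report.
$raw = "=?us-ascii?Q?\r\n <a@local.test>_<b=5F1@local.test>_<504FD9\r\n"
     . " B2-E05?= =?us-ascii?Q?6-4971@local.test>?=\r\n <c_d@local.test>";

$ids = extractMessageIds(decodeEncodedWordsLeniently($raw));
print_r($ids);
```

With this input the sketch yields four IDs. The third one keeps the space the sending client put into it, and the underscore in the last, unencoded ID is left alone because `_` is only translated inside encoded words.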

zbateson commented 2 years ago

Hi @jzernovic --

What version of the parser are you using? That should have been fixed as of 1.2.1. See #109 and this (passing) test:

https://github.com/zbateson/mail-mime-parser/blob/b969a8a72106dcdaa9ac4b19bb2e6b22d3fc5584/tests/MailMimeParser/Header/IdHeaderTest.php#L80-L101

jzernovic commented 2 years ago

Tested on latest version: 2.1.1

Our composer has: "zbateson/mail-mime-parser": "^2.0"

A piece of code I put together (in a hurry) to test it:

        $source = <<<source
=?us-ascii?Q?
 <86c6f658-a49a-709a-5089-75c73560128b@local.test>_<d605175a-1cee-9f20-bc24-79346cc7f965@local.test>_<4cb9afdf-6179-d736-a733-4b42f4c3e58a@local.test>_<93C197EB-7FF9-41E6-8034-E709EA8510B7@local.test>_<6e118fee-dcae-7a52-f88b-c6c4e69a2831@local.test>_<FC8F3C9A-313D-4C98-9357-A0D84B2C884B@local.test>_<91aebbab-6de8-2d6f-4fd6-e619226c8f3b@local.test>_<3269716E-85B0-4CB9-A678-36C502F874E3@local.test>_<37904a2b-70fd-6628-b988-650a21101b8f@local.test>_<B557D94A-737D-487B-B929-C20739A82D75@local.test>_<6af1f376-b0f3-482e-d1bd-1e3c8e50d550@local.test>_<2A159EE9-26D4-47A0-BA89-FFF8B0CE979B@local.test>_<7afxgxia=5F2336078@local.test>_<924664B4-3BB4-415F-A288-0A6988C66691@local.test>_<7afxgxia=5F2337966@local.test>_<7afxgxia=5F2344378@local.test>_<a5e707e5-40c2-9883-fd26-8ee754c54eaf@local.test>_<07953468-8d82-1f73-6f61-597b0070c561@local.test>_<AC78E27F-8102-45E7-8324-26767B8F8E01@local.test>_<7afxgxia=5F2404566@local.test>_<71A6D4AB-C1A8-4F4A-99DA-55F6CC56DF8F@local.test>_<504FD9
 B2-E05?= =?us-ascii?Q?6-4971-A722-953664BEFB5F@local.test>?=
 <2786730_7afxgxia@local.test>
source;

        // Path 1: let the library parse the raw header as-is.
        $parser = new MailMimeParser();
        $message = $parser->parse('References: ' . $source, false);
        $bad = $message->getHeader('References');

        // Path 2: MIME-decode the raw value first via MimeLiteralPart ...
        $mbWrapper = new MbWrapper();
        $headerPartFactory = new HeaderPartFactory($mbWrapper);
        $mimeDecoded = $headerPartFactory->newMimeLiteralPart($source);
        $mimeLiteralPartFactory = new MimeLiteralPartFactory($mbWrapper);
        $consumerService = new ConsumerService($headerPartFactory, $mimeLiteralPartFactory);

        // ... then build the References header from the decoded value.
        $good = (new HeaderFactory($consumerService, $mimeLiteralPartFactory))
            ->newInstance('References', $mimeDecoded->getValue());

This results in: [image not preserved]

zbateson commented 2 years ago

Hi @jzernovic --

For some reason I built it to only mime-decode the header if all parts were mime-encoded, and not when mixed as in your example.
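The difference between the two behaviours can be sketched roughly as follows. This is an illustration only, not the library's actual code: all function names are hypothetical, the header is assumed to be already split into parts, and no charset conversion is done.

```php
<?php
// Illustration only: hypothetical names, not mail-mime-parser's implementation.

// True if the whole part looks like an encoded word: =?charset?Q-or-B?text?=
function looksLikeEncodedWord(string $part): bool
{
    return preg_match('/^=\?[^?]+\?[QqBb]\?.*\?=$/s', $part) === 1;
}

// Decode one part if it is an encoded word; return plain parts unchanged.
function decodeEncodedPart(string $part): string
{
    if (preg_match('/^=\?[^?]+\?([QqBb])\?(.*)\?=$/s', $part, $m) !== 1) {
        return $part;
    }
    if (strtoupper($m[1]) === 'B') {
        return (string) base64_decode($m[2]);
    }
    // Q encoding: "_" stands for a space, "=XX" is a hex-encoded byte.
    return quoted_printable_decode(str_replace('_', ' ', $m[2]));
}

// Behaviour as described above: decode only when every part is encoded.
function decodeOnlyIfAllEncoded(array $parts): array
{
    foreach ($parts as $part) {
        if (!looksLikeEncodedWord($part)) {
            return $parts; // a single plain part disables decoding entirely
        }
    }
    return array_map('decodeEncodedPart', $parts);
}

// Mixed handling: decode each encoded part, keep plain parts as they are.
function decodeMixedParts(array $parts): array
{
    return array_map('decodeEncodedPart', $parts);
}

// One encoded ID followed by one plain ID, as in the mixed header reported.
$parts = ['=?us-ascii?Q?<a=5F1@local.test>?=', '<b_2@local.test>'];

$allOrNothing = decodeOnlyIfAllEncoded($parts);
$mixed = decodeMixedParts($parts);
print_r($allOrNothing);
print_r($mixed);
```

In this sketch the all-or-nothing variant returns the mixed input untouched, so the encoded ID stays undecoded, while the mixed variant decodes the first part and passes the second one through.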

This is fixed in 2.2.0.

All the best.