zbateson / mail-mime-parser

An email parser written in PHP
https://mail-mime-parser.org/
BSD 2-Clause "Simplified" License

Parsing content with 'mbstring.func_overload' set to 2 #108

Closed h4mpy closed 1 year ago

h4mpy commented 4 years ago

Parsing non-Latin HTML messages gives unexpected results. For example:

// Build a test MIME message around the given body text
function getMess($text) {
return 'From: support@test.ru
Date: Tue, 11 Feb 2020 13:00:06 +0300
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="-------alt7135e427b2"
Subject: =?UTF-8?B?0KLQtdC80LDRgtC10LzQsA==?=

---------alt7135e427b2
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

'.$text.'

---------alt7135e427b2
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 8bit

<b>'.$text.'</b>

---------alt7135e427b2--';
}

Latin subset: everything is ok

$message = \ZBateson\MailMimeParser\Message::from(getMess('Test'));
echo $message->getTextContent();
echo $message->getHtmlContent();

Result:

Test
<b>Test</b>

Cyrillic and Japanese: unexpected extra lines that repeat the last characters

$message = \ZBateson\MailMimeParser\Message::from(getMess('Текст'));
echo $message->getTextContent();
echo $message->getHtmlContent();

Result:

Текст
 ?т
<b>Текст</b>
/b>

$message = \ZBateson\MailMimeParser\Message::from(getMess('テキスト'));
echo $message->getTextContent();
echo $message->getHtmlContent();

Result:

テキスト
スト
??
<b>テキスト</b>
??</b>

What could be causing the problem? Tested on PHP 7.3.13.

zbateson commented 4 years ago

Hi @h4mpy --

Thanks for the detailed report. Unfortunately I'm unable to reproduce what you're experiencing, but that might just be down to the environment I'm running. This is what I get when I copy and paste your tests:

Test

<b>Test</b>

Текст

<b>Текст</b>

テキスト

<b>テキスト</b>

I tried on PHP 7.4 and 7.2 since I have them installed.

What happens if you run all the mail-mime-parser tests? After running composer install in the mail-mime-parser directory, you can run php ./vendor/bin/phpunit -c ./tests/phpunit.xml. You could also try the same under zbateson/stream-decorators.

h4mpy commented 4 years ago

The problem is caused by the PHP setting mbstring.func_overload = 2. This setting is required by the CMS we use. Is it possible for the parser to work with this setting?
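
For context, here is a minimal sketch of why this setting can matter. With mbstring.func_overload = 2, string functions such as strlen() and substr() are replaced by their mb_* counterparts, so they count characters rather than bytes, and any code doing byte arithmetic on multibyte text may miscount. Passing '8bit' as the encoding to the mb_* functions keeps them byte-based regardless of the overload. The helper names byteLength and byteSubstr are hypothetical and not part of mail-mime-parser; the sketch assumes the mbstring extension is loaded and the file is saved as UTF-8.

```php
<?php
// Hypothetical helpers (not part of mail-mime-parser): byte-safe
// stand-ins for strlen()/substr() that behave the same whether or
// not mbstring.func_overload is enabled.

function byteLength(string $s): int
{
    // The '8bit' encoding makes mb_strlen() count raw bytes.
    return mb_strlen($s, '8bit');
}

function byteSubstr(string $s, int $start, ?int $length = null): string
{
    // A null length means "to the end of the string".
    return mb_substr($s, $start, $length, '8bit');
}

$text = 'Текст'; // 5 Cyrillic characters, 2 bytes each in UTF-8

echo byteLength($text), "\n";         // 10 (bytes)
// Character count; an overloaded strlen() would report this value,
// assuming the internal encoding is UTF-8.
echo mb_strlen($text, 'UTF-8'), "\n"; // 5
echo byteSubstr($text, 0, 4), "\n";   // Те (first 4 bytes = 2 characters)
```

The gap between the two counts (10 bytes against 5 characters) is the kind of mismatch that could produce truncated or repeated tails like the ones in the report, though the exact code path in the parser is not identified here. Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0.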

zbateson commented 4 years ago

Hi @h4mpy

Someone had requested that in the past and I'd reworked the code to make it work. Unfortunately it seems when I split out the code into separate libraries I lost that. It shouldn't be too difficult to restore though. I might look into adding some tests to make sure it doesn't break in the future as well.

zbateson commented 4 years ago

I had a look at this and unfortunately it may not be possible to rework this using guzzlehttp/psr7. I may be wrong and I still need to take a deeper look, but that was the first possible hurdle.