zbateson / mail-mime-parser

An email parser written in PHP
https://mail-mime-parser.org/
BSD 2-Clause "Simplified" License

iconv(): Wrong charset, conversion from '3' to 'UCS-4LE' is not allowed #177

Open mvandeweerd opened 3 years ago

mvandeweerd commented 3 years ago

Hi,

Thank you for this package. We have stumbled upon some cases where we get fatal exceptions due to charset conversion errors. We fetch the RAW email from either the Gmail or Outlook API and pass that to the from() method:

$message = Message::from($rawEmailFromGmail);

Whenever we run $message->getHtmlContent() we sometimes run into conversion errors as seen in the issue subject:

ErrorException: iconv_strlen(): Wrong charset, conversion from `3' to `UCS-4LE' is not allowed in /var/www/trengoweb/envoyer/releases/20210805205144/vendor/zbateson/mb-wrapper/src/MbWrapper.php:397

It seems that 3 is not supported by iconv, but it is odd that it still tries to convert it. Is this a bug? And is wrapping getHtmlContent() in a try/catch block and retrying it with setCharsetOverride('UTF-8') a proper fix?

Thanks again!

Kind regards,

Marcel

zbateson commented 3 years ago

Hi Marcel,

The problem is there's no way of determining beforehand if a charset is supported by iconv. Your solution's good though... I should either document that behaviour or do that myself in MbWrapper when calling iconv and treat it as ISO-8859-1 (or UTF-8 even).

If I do it in MbWrapper it still might cause issues: a valid but unsupported charset, for instance, or a wrong assumption about it being UTF-8... and then there's also no error reported or exception thrown. One of my goals, though, is for the project to mostly 'just work' and avoid exceptions, because they're not always helpful.

I'll give it some thought, thanks for reporting :)

Zaahid

mvandeweerd commented 3 years ago

Well, I think there are possibilities by using the shell command iconv -l and parsing its output with PHP, but that's not ideal and might give permission errors. So far the try/catch solution has worked out really well for us. We just fall back to UTF-8 for now.
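For illustration, parsing the iconv -l output could look roughly like the sketch below. parseIconvList() is a hypothetical helper, not part of mail-mime-parser or mb-wrapper, and the sample text only imitates the glibc-style listing. The exact format differs between iconv builds, and the command-line tool is not guaranteed to match the iconv library PHP is linked against, so treat this as a rough approximation.

```php
<?php
// Hypothetical helper: turn the text printed by `iconv -l` into a lookup set.
// Assumes names are separated by whitespace and/or commas and may end in "//"
// (glibc style). Any explanatory header a build prints would be tokenised too.
function parseIconvList(string $output): array
{
    $names = [];
    foreach (preg_split('/[\s,]+/', $output, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        // Strip the trailing slashes and normalise case for lookups.
        $name = strtoupper(rtrim($token, '/'));
        if ($name !== '') {
            $names[$name] = true;
        }
    }
    return $names;
}

// Sample listing used here so the sketch runs without shelling out.
$sample = "ANSI_X3.4-1968// ASCII// US-ASCII//\nISO-8859-1// LATIN1//\nUTF-8// UTF8//\n";
$supported = parseIconvList($sample);

var_dump(isset($supported['UTF-8'])); // charset present in the sample listing
var_dump(isset($supported['3']));     // the charset from the error message
```

In real use the input would come from something like shell_exec('iconv -l'), which can return null or false when the call is disabled or fails, so that case would need handling before parsing.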

We've had no issues with it so far, but yes, you are right: this might cause unexpected behaviour in some cases. I think it's no big deal, since these charsets are weird/invalid anyway.

Always happy to think along!

Marcel

zbateson commented 3 years ago

That's good to know that's working out for you.

It actually might work out for me too come to think of it -- fall back to UTF-8, and try iconv again... if it throws another error, that can go back to the user.
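A minimal sketch of that fallback idea, using plain iconv() rather than MbWrapper's actual code. convertWithFallback() and its parameters are hypothetical names chosen for illustration. It assumes iconv reports a rejected charset as a notice or warning, which the temporary error handler turns into an ErrorException, much like the framework error handler in the original report does.

```php
<?php
// Hypothetical helper, not the library's implementation: try the declared
// charset first, retry once with a fallback, and let a second failure
// propagate to the caller.
function convertWithFallback(
    string $text,
    string $from,
    string $to = 'UTF-8',
    string $fallback = 'UTF-8'
): string {
    // Convert iconv's notice/warning into an exception for this call only.
    set_error_handler(static function (int $severity, string $message, string $file, int $line): bool {
        throw new ErrorException($message, 0, $severity, $file, $line);
    });

    try {
        try {
            $converted = iconv($from, $to, $text);
        } catch (ErrorException $e) {
            // The declared charset was rejected (or the bytes did not match it):
            // assume the fallback charset and try again. An error raised here
            // is not caught, so it goes back to the caller.
            $converted = iconv($fallback, $to, $text);
        }
    } finally {
        // Always restore whatever error handler was active before.
        restore_error_handler();
    }

    if ($converted === false) {
        throw new RuntimeException("Could not convert text from '$from' or '$fallback' to '$to'");
    }

    return $converted;
}

// The bogus charset "3" from the report falls through to the UTF-8 retry.
echo convertWithFallback("h\xC3\xA9llo", '3'), PHP_EOL;
```

Note that the catch block cannot tell an unsupported charset apart from invalid input bytes, so both cases trigger the retry. This is the silent wrong assumption discussed earlier in the thread.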