tchwork / utf8

Portable and performant UTF-8, Unicode and Grapheme Clusters for PHP
Apache License 2.0

Third argument of mb_convert_encoding() can be an array #51

Closed: leofeyer closed this issue 9 years ago

leofeyer commented 9 years ago

According to the PHP manual, the third argument of mb_convert_encoding() can be either a comma-separated list or an array. The Mbstring class only handles the first case.

nicolas-grekas commented 9 years ago

Can you please provide a patch?

leofeyer commented 9 years ago

I have already tried to fix it, but it does not seem to be trivial. Here are two examples where the conversion works without problems:

// ISO-8859-1 to UTF-8
mb_convert_encoding(utf8_decode('déjà'), 'UTF-8', 'ISO-8859-1');

// ISO-2022-JP to UTF-8
mb_convert_encoding(mb_convert_encoding('漢字', 'ISO-2022-JP', 'UTF-8'), 'UTF-8', 'ISO-2022-JP');

We are using mb_convert_encoding() to convert file and folder names, which can be encoded in various ways all over the world. Therefore, we do not know the exact charset in advance.

The PHP mbstring extension supports the following in this case:

// <unknown> to UTF-8
mb_convert_encoding($filename, 'UTF-8', 'ASCII,ISO-2022-JP,UTF-8,EUC-JP,ISO-8859-1');

// the third argument can also be an array
mb_convert_encoding($filename, 'UTF-8', array('ASCII', 'ISO-2022-JP', 'UTF-8', 'EUC-JP', 'ISO-8859-1'));
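Since both forms describe the same list of candidate charsets, a compatibility layer could reduce them to one shape before doing anything else. The following is a minimal sketch of that idea; `normalize_encoding_list()` is a hypothetical helper introduced here for illustration and is not part of the Mbstring class or of the patch in #52.

```php
<?php
// Hypothetical helper (an assumption for illustration, not library code):
// turns the third argument of mb_convert_encoding() into a plain list.
function normalize_encoding_list($fromEncoding)
{
    // Accept both documented forms: an array or a comma-separated string.
    $list = is_array($fromEncoding)
        ? $fromEncoding
        : explode(',', (string) $fromEncoding);

    // Trim whitespace around each name and drop empty entries.
    $list = array_filter(array_map('trim', $list), 'strlen');

    // Reindex so the result is a simple 0-based list.
    return array_values($list);
}

// Both calls yield array('ASCII', 'ISO-2022-JP', 'UTF-8').
$fromString = normalize_encoding_list('ASCII, ISO-2022-JP,UTF-8');
$fromArray  = normalize_encoding_list(array('ASCII', 'ISO-2022-JP', 'UTF-8'));
```

With the input normalized this way, the rest of the conversion code only has to deal with an array.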

This still works fine with our two test cases from above:

// ISO-8859-1 to UTF-8
mb_convert_encoding(utf8_decode('déjà'), 'UTF-8', 'ASCII,ISO-2022-JP,UTF-8,EUC-JP,ISO-8859-1');

// ISO-2022-JP to UTF-8
mb_convert_encoding(mb_convert_encoding('漢字', 'ISO-2022-JP', 'UTF-8'), 'UTF-8', 'ASCII,ISO-2022-JP,UTF-8,EUC-JP,ISO-8859-1');

But it does not work with the compatibility layer:

1) Patchwork\Tests\PHP\Shim\MbstringTest::testmb_convert_encoding
iconv(): Wrong charset, conversion from `ascii,iso-2022-jp,utf-8,euc-jp,iso-8859-1' to `utf-8//IGNORE' is not allowed

Any idea how to fix this?
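The error above occurs because iconv() expects a single charset name as its source encoding, so the whole list is rejected as one unknown charset. One possible direction, sketched here under stated assumptions, is to try each candidate in order and keep the first conversion that iconv() accepts. `convert_from_candidates()` is a hypothetical name; this is neither mbstring's detection algorithm nor necessarily what #52 does.

```php
<?php
// Hypothetical helper (an assumption for illustration): tries each candidate
// source charset in order and returns the first successful conversion.
function convert_from_candidates($string, $toEncoding, $fromEncodings)
{
    // Accept an array or a comma-separated list of charset names.
    $candidates = is_array($fromEncodings)
        ? $fromEncodings
        : explode(',', $fromEncodings);

    foreach ($candidates as $from) {
        $from = trim($from);
        if ('' === $from) {
            continue;
        }

        // iconv() returns false (and raises a notice or warning) when the
        // input is not valid in $from or the charset name is unknown.
        $converted = @iconv($from, $toEncoding, $string);

        if (false !== $converted) {
            return $converted;
        }
    }

    // No candidate accepted the input.
    return false;
}

// ISO-8859-1 bytes for "déjà": rejected as ASCII and as UTF-8,
// accepted as ISO-8859-1.
$utf8 = convert_from_candidates("d\xE9j\xE0", 'UTF-8', 'ASCII,UTF-8,ISO-8859-1');
```

The order of the candidates matters with this approach: ISO-8859-1 accepts any byte sequence, so it only makes sense as the last entry. Escape-based 7-bit encodings such as ISO-2022-JP may also pass as ASCII under iconv(), so real detection likely needs stricter checks than this sketch performs.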

leofeyer commented 9 years ago

I have found a proper solution (see #52).

nicolas-grekas commented 9 years ago

@leofeyer note that HHVM doesn't accept an array as the last argument.

leofeyer commented 9 years ago

Does it accept a comma-separated list?