leofeyer closed this issue 9 years ago.
Can you please provide a patch?
I have already tried to fix it, but it does not seem to be trivial. Here are two examples where the conversion works without problems:
// ISO-8859-1 to UTF-8
mb_convert_encoding(utf8_decode('déjà'), 'UTF-8', 'ISO-8859-1');
// ISO-2022-JP to UTF-8
mb_convert_encoding(mb_convert_encoding('漢字', 'ISO-2022-JP', 'UTF-8'), 'UTF-8', 'ISO-2022-JP');
We are using mb_convert_encoding() to convert file and folder names, which can be encoded in various ways all over the world. Therefore, we do not know the exact charset.
The PHP mbstring extension supports the following in this case:
// <unknown> to UTF-8
mb_convert_encoding($filename, 'UTF-8', 'ASCII,ISO-2022-JP,UTF-8,EUC-JP,ISO-8859-1');
// the third argument can also be an array
mb_convert_encoding($filename, 'UTF-8', array('ASCII', 'ISO-2022-JP', 'UTF-8', 'EUC-JP', 'ISO-8859-1'));
This still works fine with our two test cases from above:
// ISO-8859-1 to UTF-8
mb_convert_encoding(utf8_decode('déjà'), 'UTF-8', 'ASCII,ISO-2022-JP,UTF-8,EUC-JP,ISO-8859-1');
// ISO-2022-JP to UTF-8
mb_convert_encoding(mb_convert_encoding('漢字', 'ISO-2022-JP', 'UTF-8'), 'UTF-8', 'ASCII,ISO-2022-JP,UTF-8,EUC-JP,ISO-8859-1');
But it does not work with the compatibility layer:
1) Patchwork\Tests\PHP\Shim\MbstringTest::testmb_convert_encoding
iconv(): Wrong charset, conversion from `ascii,iso-2022-jp,utf-8,euc-jp,iso-8859-1' to `utf-8//IGNORE' is not allowed
Any idea how to fix this?
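The error message shows that the shim hands the whole list (`ascii,iso-2022-jp,utf-8,euc-jp,iso-8859-1`) to iconv() as if it were a single charset name. One possible direction, sketched here under loudly stated assumptions, is to split the list, pick the first candidate the input is valid in, and only then convert from that single charset. `isValidIn()`, `latin1ToUtf8()` and `convertToUtf8()` are hypothetical names, not part of the Mbstring class, and this is not necessarily what #52 does. The sketch only validates ASCII (simplified to "no byte above 0x7F"), UTF-8 and ISO-8859-1; it skips ISO-2022-JP and EUC-JP entirely, so the ISO-2022-JP example above (which is pure 7-bit) would be misdetected as ASCII here. A real implementation would need proper validators for those charsets, and the candidate order matters.

```php
<?php
// Hypothetical sketch: validity check for a few charsets only.
function isValidIn(string $s, string $charset): bool
{
    switch (strtoupper($charset)) {
        case 'ASCII':
            // Simplified: reject any byte above 0x7F.
            return !preg_match('/[\x80-\xFF]/', $s);
        case 'UTF-8':
            // With the u modifier, preg_match() returns false on malformed UTF-8.
            return preg_match('//u', $s) === 1;
        case 'ISO-8859-1':
            // Every byte is a valid ISO-8859-1 character.
            return true;
        default:
            // ISO-2022-JP, EUC-JP, ... are not handled by this sketch.
            return false;
    }
}

// Hypothetical helper: ISO-8859-1 bytes map 1:1 to U+0000..U+00FF.
function latin1ToUtf8(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        // Bytes below 0x80 are unchanged; the rest become two-byte UTF-8 sequences.
        $out .= $b < 0x80 ? chr($b) : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Hypothetical replacement for passing the whole list to iconv().
function convertToUtf8(string $s, $fromEncoding): string
{
    // Accept both documented forms: comma-separated string or array.
    $candidates = is_array($fromEncoding) ? $fromEncoding : explode(',', $fromEncoding);
    foreach ($candidates as $charset) {
        $charset = strtoupper(trim($charset));
        if (!isValidIn($s, $charset)) {
            continue;
        }
        // ASCII and valid UTF-8 already are UTF-8; only Latin-1 needs converting.
        return $charset === 'ISO-8859-1' ? latin1ToUtf8($s) : $s;
    }
    throw new InvalidArgumentException('No candidate charset matches the input');
}

// "déjà" as ISO-8859-1 bytes, i.e. the utf8_decode('déjà') case from above.
echo convertToUtf8("d\xE9j\xE0", 'ASCII,ISO-2022-JP,UTF-8,EUC-JP,ISO-8859-1'), "\n"; // déjà
```

The design choice is to decide on exactly one source charset before converting, because iconv() only ever accepts a single "from" charset.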
I have found a proper solution (see #52).
@leofeyer Note that HHVM doesn't accept an array as the last argument.
Does it accept a comma-separated list?
According to the PHP manual, the third argument of mb_convert_encoding() can be either a comma-separated list or an array. The Mbstring class only handles the first case.
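Since the manual allows both forms, a shim could normalize the third argument into one array up front and work only with that from then on. The following is a minimal sketch under that assumption; `normalizeEncodingList()` is a hypothetical helper, not an existing function of the Mbstring class or of PHP.

```php
<?php
/**
 * Hypothetical helper: accept $from_encoding either as a comma-separated
 * string or as an array and return a clean list of charset names.
 */
function normalizeEncodingList($fromEncoding): array
{
    if (is_array($fromEncoding)) {
        $list = $fromEncoding;
    } else {
        // "ASCII,ISO-2022-JP,UTF-8" -> ['ASCII', 'ISO-2022-JP', 'UTF-8']
        $list = explode(',', (string) $fromEncoding);
    }

    // Trim entries such as " UTF-8" and drop empty ones (e.g. from "ASCII,,UTF-8").
    $list = array_map('trim', $list);

    return array_values(array_filter($list, function ($enc) {
        return $enc !== '';
    }));
}

// Both forms from the PHP manual end up as the same list.
var_dump(
    normalizeEncodingList('ASCII,ISO-2022-JP,UTF-8,EUC-JP,ISO-8859-1')
    === normalizeEncodingList(array('ASCII', 'ISO-2022-JP', 'UTF-8', 'EUC-JP', 'ISO-8859-1'))
); // bool(true)
```

Normalizing first keeps the rest of the conversion code independent of which of the two documented forms the caller used.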