silverstripe / silverstripe-framework

Silverstripe Framework, the MVC framework that powers Silverstripe CMS
https://www.silverstripe.org
BSD 3-Clause "New" or "Revised" License
721 stars 821 forks source link

[I18N] Transliterator fails to convert not normalised UTF-8 string. #5168

Open gregsmirnov opened 8 years ago

gregsmirnov commented 8 years ago

We encountered a case when A umlaut was encoded as "A\xCC\x88", and SS_Transliterator failed to convert it to AE with iconv disabled.

This is caused by COMBINING DIAERESIS character.

We fixed the problem by normalising UTF-8 string

$source = Normalize::normalize($source, Normalize::NFKC);
$output = $transliterator->toASCII($source);

Normalize class is part of php intl extension, but tchwork/utf8 shim can be used.

tractorcow commented 8 years ago

It's probably risky to require iconv, even for 4.x... I'd be in favour of a shim to be honest, although we'd need to get @silverstripe/core-team to agree before adding any new dependencies to framework.

gregsmirnov commented 8 years ago

We are using shim in our code now.

dhensby commented 8 years ago

iconv is a requirement of SS (https://docs.silverstripe.org/en/3.2/getting_started/server_requirements/)

However, I'm fine with a shim too

tractorcow commented 8 years ago

Oh, if it's already a requirement, then we can simply adjust the server dependencies to "iconv or a substitute, such as tchwork/utf8".

Then we can adjust the behaviour to fail if iconv isn't installed.

However, this raises the question; If you are running a server without the necessary minimum modules installed, then it's not a bug if it behaves incorrectly, so closing. :)

gregsmirnov commented 8 years ago

The topic is about UTF-8 string normalisation, that is a part of php-intl module. SS_Transliterator class has default configuration to ignore iconv and use custom character mapping, that in many cases produce better results.

On the other hand, SS_Transliterator character mappings should be improved for extensions. For example for Greek or Cyrillic alphabets, I am forced to use different implementations.

dhensby commented 8 years ago

If we're ignoring iconv for our own implementation and it needs improving, then we should improve it :)

@gregsmirnov what's the needed action here? Can you open a PR?

gregsmirnov commented 8 years ago

I'll add test cases and prepare PR.

dhensby commented 8 years ago

Thanks so much @gregsmirnov

tractorcow commented 8 years ago

Ok, I see your point. Thanks for the clarification. ;)

maxime-rainville commented 4 years ago

Unfortunately, Silverstripe CMS 3 has entered limited support in June 2018. This means we'll only be fixing critical bugs and security issues for Silverstripe CMS 3 going forward.

You can read the Silverstripe CMS Roadmap for more information on our support commitments.

Can someone confirmed if this is still a problem for SS4?