Closed: sbrl closed 6 years ago
Looks like this answer has just what we need:
$literator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
This not only removes diacritics, but also correctly handles Cyrillic (and, I assume, other alphabets too).
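For illustration, here is a minimal sketch of that rule set applied to a few sample strings. It assumes the `intl` extension is loaded. The sample strings and the `$keys` array are made up for this example and are not taken from the codebase.

```php
<?php
// Requires PHP's intl extension (it provides the Transliterator class).
$literator = Transliterator::createFromRules(
    ':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;',
    Transliterator::FORWARD
);
if ($literator === null) {
    // createFromRules() returns null on failure.
    throw new RuntimeException('Could not create transliterator: ' . intl_get_error_message());
}

// Illustrative inputs only: plain ASCII, accented Latin, mixed case, Cyrillic.
$samples = ['bill', 'bïll', 'Bïĺl', 'Привет'];

$keys = [];
foreach ($samples as $sample) {
    // transliterate() returns the converted string, or false on failure.
    $keys[$sample] = $literator->transliterate($sample);
}

foreach ($keys as $original => $key) {
    echo $original, ' => ', var_export($key, true), PHP_EOL;
}
```

The accented Latin variants should collapse to the same lowercase ASCII key, and the Cyrillic sample should come out as a Latin-script approximation.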
Done! :D
We should transliterate source pages before inserting them into the index, so that a search for `bïll` will hit all of `bill`, `bïll`, and `bïĺl`.

We've got 2 options here:

- `iconv`: https://stackoverflow.com/a/3542748/1460422 - We don't currently use this module, so it would be another dependency.
- `intl` via `transliterator_transliterate()`: https://stackoverflow.com/a/16022459/1460422

We should make sure that we add a setting for this, as non-English pages may experience issues (e.g. Russian / Cyrillic) if we don't transliterate correctly.
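As a rough sketch of how the setting could gate this, assuming the `intl` route is taken: the names `search_transliterate` and `to_search_key()` below are hypothetical, as is the array-based index, and none of them come from the actual codebase. The sketch also assumes the query is passed through the same function as the indexed terms, since otherwise the two sides would not match.

```php
<?php
// Hypothetical settings object; the real setting name is not given in this issue.
$settings = (object) ['search_transliterate' => true];

/**
 * Hypothetical helper: reduces text to a comparison key for the search index.
 * Returns the text unchanged when the setting is off or intl is unavailable.
 */
function to_search_key(string $text, bool $enabled): string {
    static $literator = null;
    if (!$enabled || !class_exists('Transliterator')) {
        return $text;
    }
    if ($literator === null) {
        $literator = Transliterator::createFromRules(
            ':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;',
            Transliterator::FORWARD
        );
        if ($literator === null) {
            return $text; // Rule compilation failed; fall back to the raw text.
        }
    }
    $result = $literator->transliterate($text);
    return $result === false ? $text : $result;
}

// Build a toy index keyed by the transliterated form of each term.
$index = [];
foreach (['bill', 'bïll', 'bïĺl'] as $term) {
    $index[to_search_key($term, $settings->search_transliterate)][] = $term;
}

// The query goes through the same function, so all three variants are found.
$query = to_search_key('bïll', $settings->search_transliterate);
$hits = $index[$query] ?? [];
```

With the setting switched off, `to_search_key()` returns its input untouched, so sites whose content transliterates badly can opt out.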