sbrl / Pepperminty-Wiki

A wiki in a box
https://peppermint.mooncarrot.space/
Mozilla Public License 2.0

Searching: Transliterate non-ascii characters to remove accents #156

Closed (sbrl closed 6 years ago)

sbrl commented 6 years ago

We should transliterate source pages before inserting them into the index, so that a search for bïll will hit all of bill, bïll, and bïĺl.

We've got 2 options here:

We should make sure that we add a setting for this, as non-English pages (e.g. Russian / Cyrillic) may experience issues if we don't transliterate correctly.

sbrl commented 6 years ago

Looks like this answer has just what we need:

$literator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);

This not only removes diacritics, but also correctly handles Cyrillic (and other alphabets too, I assume).
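For illustration, here is a minimal sketch of how that transliterator might be wrapped for use when indexing and searching. It assumes the PHP intl extension is loaded. The function name normalise_for_index and the $enabled flag (standing in for the setting mentioned above) are hypothetical and are not taken from the Pepperminty Wiki codebase.

```php
<?php
// Sketch only: requires the PHP intl extension (provides the Transliterator class).
// normalise_for_index() and $enabled are hypothetical names used for illustration.
function normalise_for_index(string $text, bool $enabled = true): string {
    // Honour the (hypothetical) setting: return the text untouched when disabled.
    if (!$enabled) {
        return $text;
    }

    // Build the transliterator once and reuse it across calls.
    static $literator = null;
    if ($literator === null) {
        $literator = Transliterator::createFromRules(
            ':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;',
            Transliterator::FORWARD
        );
    }

    // createFromRules() returns null on failure; fall back to the original text.
    if ($literator === null) {
        return $text;
    }

    // transliterate() returns false on failure; fall back to the original text.
    $result = $literator->transliterate($text);
    return $result === false ? $text : $result;
}

// Both spellings normalise to the same index term.
var_dump(normalise_for_index('bïll') === normalise_for_index('bïĺl'));
```

For a search to match, the same function would presumably need to be applied both to page content at index time and to the query at search time.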

sbrl commented 6 years ago

Done! :D