Would it be possible to add an example of stripping accents to the documentation? (This is commonly needed for search applications.)
As I understand it, the right way to do this is to determine if each character IS_LETTER and in one of these Unicode blocks: LATIN_1_SUPPLEMENT, LATIN_EXTENDED_ADDITIONAL, LATIN_EXTENDED_A, LATIN_EXTENDED_B. If it is, then decompose it, remove any NON_SPACING_MARKs, and recompose.
I haven't been able to figure out how to tell whether a character is a non-spacing mark.
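For reference, here is a rough sketch of the procedure described above. It assumes plain Java, since the constant names match `java.lang.Character` and `Character.UnicodeBlock`; the library this request is aimed at may expose different calls. The class and method names (`AccentStripper`, `isLatinLetter`, `stripAccents`) are made up for illustration. The non-spacing mark test uses `Character.getType`, which returns the general category of a code point.

```java
import java.lang.Character.UnicodeBlock;
import java.text.Normalizer;

// Hypothetical helper, not part of any library.
final class AccentStripper {

    // True for letters in the four Latin blocks named in the request.
    static boolean isLatinLetter(int cp) {
        UnicodeBlock block = UnicodeBlock.of(cp);
        return Character.isLetter(cp)
                && (block == UnicodeBlock.LATIN_1_SUPPLEMENT
                || block == UnicodeBlock.LATIN_EXTENDED_ADDITIONAL
                || block == UnicodeBlock.LATIN_EXTENDED_A
                || block == UnicodeBlock.LATIN_EXTENDED_B);
    }

    static String stripAccents(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (!isLatinLetter(cp)) {
                // Leave everything outside the listed blocks untouched.
                out.appendCodePoint(cp);
                return;
            }
            // Decompose the single character (canonical decomposition, NFD).
            String decomposed = Normalizer.normalize(
                    new String(Character.toChars(cp)), Normalizer.Form.NFD);
            // Drop every code point whose general category is Mn.
            StringBuilder kept = new StringBuilder();
            decomposed.codePoints()
                    .filter(d -> Character.getType(d) != Character.NON_SPACING_MARK)
                    .forEach(kept::appendCodePoint);
            // Recompose whatever is left (NFC).
            out.append(Normalizer.normalize(kept, Normalizer.Form.NFC));
        });
        return out.toString();
    }
}
```

Two limitations of this sketch: input that is already decomposed (a base letter followed by a separate combining mark) passes through unchanged, because the combining mark is not a letter in one of the listed blocks, and letters with no canonical decomposition, such as `ø` or `ß`, are left as they are.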