mathiasbynens / esrever

A Unicode-aware string reverser written in JavaScript.
https://git.io/esrever
MIT License

Incorrect reversal of the U+0489 character #14

Open lunakurame opened 6 years ago

lunakurame commented 6 years ago

I've got a string with this character: ҉ U+0489 COMBINING CYRILLIC MILLIONS SIGN

Before reversing: te҉st (te\u0489st)
After reversing: ts҉et (ts\u0489et)

I might be wrong, but I expected tse҉t (tse\u0489t) instead. Is there a reason it behaves like this, or is it just a bug? I found it when my unit tests failed while I was checking my code against random zalgo examples.
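For comparison, here is a minimal JavaScript sketch of a naive reversal that works on code points only. The helper name reverseCodePoints is hypothetical and not part of esrever. It produces exactly the output reported above, which is consistent with U+0489 being treated as a standalone symbol instead of a mark attached to the preceding "e" (this says nothing definitive about esrever's internals).

```javascript
// Hypothetical helper, not esrever's API: reverse by code points only.
// Spreading a string iterates code points, so surrogate pairs stay intact,
// but combining marks are NOT kept with their base character.
function reverseCodePoints(input) {
  return [...input].reverse().join('');
}

console.log(reverseCodePoints('te\u0489st') === 'ts\u0489et'); // true
```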

mathiasbynens commented 6 years ago

This is an example of a symbol with the Grapheme_Extend property. IIUC they can only really follow Grapheme_Base symbols, but for esrever’s purposes we could probably just treat them like regular combining marks.
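A minimal sketch of the suggestion to treat Grapheme_Extend symbols like regular combining marks, assuming an environment with ES2018 Unicode property escapes (the u flag and \p{Grapheme_Extend}). The helper name reverseKeepingExtenders is hypothetical and does not reflect how esrever is actually implemented: it splits the string into chunks of one non-extending code point plus any Grapheme_Extend code points that follow it, then reverses the order of those chunks.

```javascript
// Hypothetical helper, not esrever's API: reverse a string while keeping
// each Grapheme_Extend code point (such as U+0489) attached to the code
// point it follows. Requires ES2018 Unicode property escapes (`u` flag).
function reverseKeepingExtenders(input) {
  // Each match is one non-extending code point plus any extenders after it;
  // a run of extenders at the very start (with no base) forms its own chunk.
  const clusters =
    input.match(/\P{Grapheme_Extend}\p{Grapheme_Extend}*|\p{Grapheme_Extend}+/gu) || [];
  return clusters.reverse().join('');
}

console.log(/\p{Grapheme_Extend}/u.test('\u0489')); // true
console.log(reverseKeepingExtenders('te\u0489st') === 'tse\u0489t'); // true
```

This only keeps extenders with their base; it is not full extended grapheme cluster segmentation as defined in UAX #29. Where it is available, Intl.Segmenter with granularity 'grapheme' is an alternative way to split the string before reversing.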