rivo / uniseg

Unicode Text Segmentation, Word Wrapping, and String Width Calculation in Go
MIT License

Fix breaking of the next emoji after the control character #16

Closed fmatzy closed 2 years ago

fmatzy commented 2 years ago

When a control character is followed by an emoji, such as \t🏳️‍🌈, the state seems to be broken at the transition to the next character, causing the emoji grapheme cluster to split.

https://play.golang.org/p/jUX-VcrwFnm

This happens because the emoji state is lost when the rule for the transition from the control character to the next character is applied.

In this PR, only the boundary decision is taken from the rule with the lower number; the state is no longer overwritten.
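To make the idea concrete, here is a minimal, self-contained sketch of the situation. It is a toy model, not the actual uniseg code: the state and property names, the transition table, the rule numbers, and the `fixed` switch are all assumptions made for illustration. The input is given as a list of character properties (control, emoji, extend, ZWJ, emoji) standing in for \t🏳️‍🌈. When no exact transition exists, two generic rules match: "after a control character, anything" and "from any state, an emoji". With `fixed` set to false, the lower-numbered rule supplies both the boundary and the next state, so the emoji state is lost. With `fixed` set to true, the lower-numbered rule supplies only the boundary.

```go
package main

import "fmt"

// Character properties (hypothetical names for this sketch).
const (
	propAny = iota
	propControl
	propExtend
	propZWJ
	propExtPict
)

// Parser states (hypothetical names for this sketch).
const (
	stAny = iota
	stControl
	stExtPict
	stExtPictZWJ
)

// transition holds the next state, whether a boundary precedes the
// character, and an illustrative rule number (lower wins for boundaries).
type transition struct {
	next     int
	boundary bool
	rule     int
}

// transitions is a toy table keyed by {state, property}.
var transitions = map[[2]int]transition{
	{stAny, propControl}:        {stControl, true, 50},
	{stControl, propAny}:        {stAny, true, 40},
	{stAny, propExtend}:         {stAny, false, 90},
	{stAny, propZWJ}:            {stAny, false, 90},
	{stAny, propExtPict}:        {stExtPict, true, 9990},
	{stExtPict, propExtend}:     {stExtPict, false, 110},
	{stExtPict, propZWJ}:        {stExtPictZWJ, false, 110},
	{stExtPictZWJ, propExtPict}: {stExtPict, false, 110},
}

// step returns the next state and whether a boundary precedes prop.
func step(state, prop int, fixed bool) (int, bool) {
	if t, ok := transitions[[2]int{state, prop}]; ok {
		return t.next, t.boundary // exact match
	}
	byState, okState := transitions[[2]int{state, propAny}]
	byProp, okProp := transitions[[2]int{stAny, prop}]
	switch {
	case okState && okProp:
		// Both generic rules apply. The state comes from the rule that
		// knows the incoming property; the boundary from the lower rule.
		next, boundary := byProp.next, byProp.boundary
		if byState.rule < byProp.rule {
			boundary = byState.boundary
			if !fixed {
				next = byState.next // old behavior: emoji state is lost here
			}
		}
		return next, boundary
	case okState:
		return byState.next, byState.boundary
	case okProp:
		return byProp.next, byProp.boundary
	}
	return stAny, true
}

// countClusters counts grapheme clusters in a sequence of properties.
func countClusters(props []int, fixed bool) int {
	state, count := stAny, 0
	for i, p := range props {
		next, boundary := step(state, p, fixed)
		if i == 0 || boundary {
			count++
		}
		state = next
	}
	return count
}

func main() {
	// Stand-in for "\t" + flag + variation selector + ZWJ + rainbow.
	props := []int{propControl, propExtPict, propExtend, propZWJ, propExtPict}
	fmt.Println("before fix:", countClusters(props, false), "clusters")
	fmt.Println("after fix: ", countClusters(props, true), "clusters")
}
```

In this toy model the old behavior yields three clusters (the tab, the flag with its trailing ZWJ, and the rainbow), while the adjusted behavior yields two (the tab and the whole emoji).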

rivo commented 2 years ago

Good catch! Thank you for providing a solution, too. Due to #11, the test case code has changed somewhat. Do you want to update your PR to resolve the merge conflicts? Or do you want me to make the changes? (Both fine with me.)

fmatzy commented 2 years ago

@rivo OK, I've updated my PR to resolve the merge conflicts.

rivo commented 2 years ago

Thanks!