protectwise / troika

A JavaScript framework for interactive 3D and 2D visualizations
MIT License

getCaretAtPoint(...) taking surrogate pairs into consideration #304

Open kalegd opened 8 months ago

kalegd commented 8 months ago

Currently getCaretAtPoint(...) does not take surrogate pairs into consideration. If an emoji is present, the caret index it returns may land in the middle of the emoji, since emojis can span a variable number of UTF-16 code units. (Attaching a gif to show an example of what I mean.)
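
For illustration, here is a quick sketch of why this happens: JavaScript string indices count UTF-16 code units rather than visible characters. The sample strings and the `isHigh`/`isLow` helpers are made up for this example and are not taken from troika.

```javascript
// JavaScript string indices count UTF-16 code units, not visible characters.
const smiley = '\u{1F600}' // 😀, a single code point outside the BMP
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}' // three emoji joined by ZWJ

console.log(smiley.length)      // 2 (one surrogate pair)
console.log(family.length)      // 8 (three surrogate pairs + two ZWJ code units)
console.log([...smiley].length) // 1 (string iteration goes by code point)

// In 'a😀b', caret index 2 sits between the high and low surrogate.
const sample = 'a\u{1F600}b'
const isHigh = (c) => c >= 0xd800 && c <= 0xdbff
const isLow = (c) => c >= 0xdc00 && c <= 0xdfff
console.log(isHigh(sample.charCodeAt(1)), isLow(sample.charCodeAt(2))) // true true
```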

What are your thoughts on the function only returning the indices before or after the whole surrogate pair and not the middle ones in the case of emojis/other complex characters?

This is definitely an edge case that could be handled on the client side, but I do think in the long run it would be beneficial to have this supported out of the box. I personally can't think of any reason one would want an index in the middle of a surrogate pair.

lojjic commented 8 months ago

You're definitely right. The behavior you're showing is appropriate(ish) for ligature glyphs, but not for surrogates. I think we can add a check here for whether the indices were skipped due to ligatures vs. surrogates. We may also need to make some other adjustments in selectionUtils to make sure we walk back to the start of a surrogate and never return an inter-character index.
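
A rough sketch of what that walk-back could look like, assuming the caret index is a UTF-16 code unit offset into the text string. `snapCaretToSurrogateBoundary` and its `preferEnd` flag are hypothetical names for illustration only; they are not existing troika or selectionUtils functions.

```javascript
// Hypothetical helper, not part of troika: move a caret index off the middle
// of a surrogate pair. Assumes `index` is a UTF-16 code unit offset into `text`.
function snapCaretToSurrogateBoundary(text, index, preferEnd = false) {
  // Indices at the very start or end can never split a pair.
  if (index <= 0 || index >= text.length) return index
  const prev = text.charCodeAt(index - 1)
  const next = text.charCodeAt(index)
  const splitsPair =
    prev >= 0xd800 && prev <= 0xdbff && // high surrogate just before the caret
    next >= 0xdc00 && next <= 0xdfff    // low surrogate just after the caret
  if (!splitsPair) return index
  // Walk back to the start of the pair, or forward past it if requested.
  return preferEnd ? index + 1 : index - 1
}

const sampleText = 'a\u{1F600}b' // 'a', high surrogate, low surrogate, 'b'
console.log(snapCaretToSurrogateBoundary(sampleText, 2))       // 1
console.log(snapCaretToSurrogateBoundary(sampleText, 2, true)) // 3
console.log(snapCaretToSurrogateBoundary(sampleText, 1))       // 1 (already on a boundary)
```

This only covers surrogate pairs. Multi-code-point sequences such as ZWJ emoji would still be splittable between their parts; handling those would need grapheme cluster segmentation (for example via `Intl.Segmenter`, where available), which is a separate question from the one raised here.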

kalegd commented 8 months ago

Great. Assuming no one else gets to it by then, I can set some time aside to work on a PR for it towards the end of the month.