whatwg / dom

DOM Standard
https://dom.spec.whatwg.org/

Can DOM ranges split grapheme clusters and surrogate pairs? #933

Open xfq opened 3 years ago

xfq commented 3 years ago

https://dom.spec.whatwg.org/#ranges

For Text nodes, it seems that the offset of a boundary point is counted in UTF-16 code units (rather than grapheme clusters), so a boundary point can fall between the two halves of a surrogate pair.
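A minimal sketch of the problem, using plain JavaScript strings rather than a live Range. It assumes that Text node data and boundary point offsets are indexed the same way as JavaScript strings, in UTF-16 code units. `splitsSurrogatePair` is a hypothetical helper for illustration and is not part of the DOM Standard.

```javascript
// "😀" (U+1F600) lies outside the BMP, so it is stored as two UTF-16
// code units: a high surrogate followed by a low surrogate.
const text = "a😀b";

console.log(text.length);      // 4 code units: 'a', high, low, 'b'
console.log([...text].length); // 3 code points

// Offset 2 lands between the two surrogates. Slicing there mimics a
// range boundary at (textNode, 2).
const before = text.slice(0, 2); // "a\uD83D" (lone high surrogate)
const after = text.slice(2);     // "\uDE00b" (lone low surrogate)

// Hypothetical helper: true if `offset` falls inside a surrogate pair.
function splitsSurrogatePair(s, offset) {
  if (offset <= 0 || offset >= s.length) return false;
  const prev = s.charCodeAt(offset - 1);
  const next = s.charCodeAt(offset);
  return prev >= 0xd800 && prev <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
}

console.log(splitsSurrogatePair(text, 2)); // true
console.log(splitsSurrogatePair(text, 3)); // false
```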

It would be useful to add a note reminding web developers and spec writers (of css-highlight-api, for example) that grapheme clusters and surrogate pairs might be split, preferably with an example. The note should contain a strong warning against splitting grapheme clusters and surrogate pairs.
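One possible shape for such an example, sketched with plain strings: an offset can avoid splitting any surrogate pair and still split a grapheme cluster. It uses `Intl.Segmenter` with `granularity: "grapheme"` to count user-perceived characters; the choice of sample strings is illustrative only.

```javascript
// A decomposed "é": U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT.
const cafe = "cafe\u0301";
// Two regional indicator symbols (U+1F1EF, U+1F1F5) that render as one flag.
const flag = "\u{1F1EF}\u{1F1F5}";

// Intl.Segmenter splits text into extended grapheme clusters.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemeCount = (s) => [...segmenter.segment(s)].length;

console.log(cafe.length, graphemeCount(cafe)); // 5 4
console.log(flag.length, graphemeCount(flag)); // 4 1

// Neither offset below splits a surrogate pair, yet both split a grapheme
// cluster: offset 4 separates the accent from its base letter, and offset 2
// cuts the flag into two lone regional indicators.
const cafeLeft = cafe.slice(0, 4); // "cafe" without its accent
const flagLeft = flag.slice(0, 2); // "\u{1F1EF}" on its own
```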

If possible, DOM should normatively prevent the splitting of surrogate pairs or make it non-conformant.
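The DOM Standard does not define any such adjustment; the following is only a sketch of what author code, or a spec layered on top of ranges, might do before using an offset. Both helpers are hypothetical: `snapToCodePoint` moves an offset back out of a surrogate pair, and `snapToGrapheme` moves it back to the start of the grapheme cluster containing it (via `Intl.Segmenter`). Snapping backwards rather than forwards is an arbitrary choice here.

```javascript
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

const isHigh = (c) => c >= 0xd800 && c <= 0xdbff;
const isLow = (c) => c >= 0xdc00 && c <= 0xdfff;

// Hypothetical helper: if `offset` falls between a high and a low surrogate,
// move it back by one code unit so the pair stays intact.
function snapToCodePoint(s, offset) {
  if (offset <= 0 || offset >= s.length) return offset;
  return isHigh(s.charCodeAt(offset - 1)) && isLow(s.charCodeAt(offset))
    ? offset - 1
    : offset;
}

// Hypothetical helper: move `offset` back to the start of the grapheme
// cluster that contains it (offsets at or past the end are clamped to it).
function snapToGrapheme(s, offset) {
  if (offset >= s.length) return s.length;
  let boundary = 0;
  for (const { index } of segmenter.segment(s)) {
    if (index > offset) break;
    boundary = index;
  }
  return boundary;
}

console.log(snapToCodePoint("a😀b", 2));               // 1
console.log(snapToGrapheme("cafe\u0301", 4));          // 3
console.log(snapToGrapheme("\u{1F1EF}\u{1F1F5}", 2));  // 0
```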

(This comment is part of a review on behalf of the W3C i18n WG.)

annevk commented 3 years ago

I'm somewhat supportive, but I also feel like this is something that should be pointed out in ECMAScript, if at all. I guess in theory one could expect the DOM to have created some higher level of abstraction, but I'm not sure why one would think that.

Speaking of surrogates, Safari seems to handle document.body.append("\uD800", "\uDC00", "\uD800\uDC00"); somewhat poorly.
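Why that input is tricky, sketched with plain strings: `append()` turns each string argument into its own Text node, so the first two nodes each hold a lone surrogate even though their concatenated data is well formed. `countLoneSurrogates` is a hypothetical helper, and this says nothing about how any particular engine renders such nodes.

```javascript
// The data of the three Text nodes that the append() call above creates.
const nodeData = ["\uD800", "\uDC00", "\uD800\uDC00"];

// Hypothetical helper: count unpaired surrogate code units in a string.
function countLoneSurrogates(s) {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = s.charCodeAt(i + 1); // NaN past the end
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip the low surrogate
        continue;
      }
      count++; // high surrogate with no following low surrogate
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      count++; // low surrogate with no preceding high surrogate
    }
  }
  return count;
}

console.log(nodeData.map(countLoneSurrogates)); // [ 1, 1, 0 ]

// Joined together (as node.normalize() would merge adjacent Text nodes),
// the data is well formed: two U+10000 code points.
const joined = nodeData.join("");
console.log(countLoneSurrogates(joined), [...joined].length); // 0 2
```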