Open indradhanush opened 1 year ago
My hunch is that the line length is calculated with a method that treats an emoji as a single character, but then when the highlight is applied the emojis count as multiple characters (as they are multiple u8
s in size). This leads to the emojis "swallowing" some of the highlighting.
Just a hunch
Related PR that was closed: https://github.com/sourcegraph/sourcegraph/pull/49753
Here's a link to dotcom that easily reproduces the issue. It's much easier to see when using the dark theme: https://sourcegraph.com/github.com/sourcegraph/sourcegraph@67b2c665f1746793d658f43bbfb1b03fca41cbaa/-/blob/internal/scim/init.go?L72:6-72:7
Rough implementation sketch to avoid regressing perf:
HashSet
.to_wide
method) if the line belongs to HashSet.We should make sure to set the position_encoding
field on the Document.
The bulk of the changes should not be tied to the core highlighting logic.
Just for clarification: You're proposing to reencode all document offsets into UTF-16 code units, because these are easier to consume from JS, right? I'm asking because LineIndex from rust analyzer is generic in its encoding and to_wide
can also emit code points (utf-32). I think we could make an argument for exposing both utf-16 or utf-32 code units via the API.
That won't be free, so regressing perf is unavoidable, but I agree with your implementation sketch in terms of minimizing the perf impact.
You're proposing is to reencode all document offsets into UTF-16 code units, because these are easier to consume from JS, right?
Yes. Albeit, in the common case, it will be mostly "free" (only the is_ascii
checks).
I think we could make an argument for exposing both utf-16 or utf-32 code units via the API.
-1, we shouldn't expose stuff for which we don't have downstream users. We already have too much of that in other places.
Emoji in one line Last character of the line with emoji in the comment renders in white.
No emoji
Renders correctly.
Emoji in all lines All the lines with at least one character after
//
have the last character in white.Two emojis make two characters go white in the end
Thanks to @pjlast!
One more to verify @pjlast's hunch Two emojis at the start make three characters in the end go white. This is because the first emoji has a skin tone and takes double the bytes of a regular emoji.