lexical-lsp / lexical

Lexical is a next-generation Elixir language server

Fix conversion of char positions to UTF-16 code units #719

Closed zachallaun closed 2 months ago

zachallaun commented 2 months ago

We were previously using CodeUnit.to_utf16, which converts a code unit offset in a UTF-8 string to the corresponding code unit offset in the same string encoded as UTF-16. However, we were passing it a character position rather than a code unit offset, which led to incorrect position conversion whenever a multi-byte character appeared before the position.
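
To illustrate the distinction between a character position and a code unit offset, here is a minimal sketch. It is not Lexical's implementation: the module PositionSketch and its function are hypothetical names, and it assumes a 0-based character position counted in graphemes (which is what String.slice/3 counts). It takes the text before the position, re-encodes it as UTF-16 with Erlang's :unicode module, and counts the resulting 2-byte code units.

```elixir
defmodule PositionSketch do
  # Hypothetical helper for illustration only; not part of Lexical's API.
  # Converts a 0-based character (grapheme) position within `line`
  # into a 0-based offset measured in UTF-16 code units.
  def char_to_utf16_offset(line, char_position)
      when is_binary(line) and is_integer(char_position) and char_position >= 0 do
    line
    # Take the characters that precede the position.
    |> String.slice(0, char_position)
    # Re-encode that prefix from UTF-8 to UTF-16.
    |> :unicode.characters_to_binary(:utf8, :utf16)
    # Every UTF-16 code unit is two bytes wide.
    |> byte_size()
    |> div(2)
  end
end

# The emoji is one character, four UTF-8 bytes, and two UTF-16 code units.
PositionSketch.char_to_utf16_offset("😀 = 1", 1)
# => 2

# With ASCII-only text, all three measures agree.
PositionSketch.char_to_utf16_offset("abc", 3)
# => 3
```

In the first call, treating the character position 1 as if it were a UTF-8 code unit offset would point into the middle of the emoji's four bytes, which is the kind of mismatch described above.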

Two existing tests covered this behavior, and both seem to have been asserting the incorrect result.


I'm marking this PR as draft for now because I'm also going to delete some CodeUnit conversion code that we should never be using (like CodeUnit.to_utf16) in a separate commit.

zachallaun commented 2 months ago

@scohen I'm not sure that many people have run into this bug, but I expect that more will, because features like workspace/document symbols make it a lot more likely that the server returns a range for a string with a multi-byte character in it (that's how I found it). All of that is to say: is it worth cutting a v0.6.1 after this has been merged and put through its paces for a few days?