Return correct byte offset from IndexIn

tliron / glsp

Language Server Protocol SDK for Go

Apache License 2.0


Return correct byte offset from IndexIn #5

Closed shezadkhan137 closed 2 years ago

shezadkhan137 commented 2 years ago

According to the LSP spec, character offsets are based on the UTF-16 string representation. The current IndexIn assumes that Character is a byte offset, which leads to incorrect state in the language server.
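To make the mismatch concrete, here is a minimal sketch (not code from this PR) that measures the same line in bytes, runes, and UTF-16 code units. The helper name `utf16Len` is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// utf16Len is a hypothetical helper: it reports how many UTF-16 code units
// the string s occupies, which is the unit LSP positions are counted in.
func utf16Len(s string) int {
	return len(utf16.Encode([]rune(s)))
}

func main() {
	line := "a😀b" // U+1F600 lies outside the Basic Multilingual Plane

	fmt.Println(len(line))                    // 6 bytes: 1 + 4 + 1
	fmt.Println(utf8.RuneCountInString(line)) // 3 runes
	fmt.Println(utf16Len(line))               // 4 code units: 1 + 2 + 1
}
```

In this line, `b` sits at UTF-16 offset 3 but at byte offset 5, so treating Character as a byte offset points into the middle of the emoji.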

tliron commented 2 years ago

Thanks! But this looks like a very complicated solution. Is there really nothing in the utf16 package that can handle this?

shezadkhan137 commented 2 years ago

Yeah, honestly I wondered the same about the utf16 package. I think the answer is no, because Decode/Encode seem to just work on []uint16 and return []rune, neither of which tells us the number of code units? I may be wrong though.

The code I actually used is modified from the gopls implementation, as I figured that was the most straightforward way of working out how to do it. FWIW, it just boils down to walking the UTF-8 bytes rune by rune and counting two UTF-16 code units whenever the rune is 0x10000 or above.
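A rough sketch of that idea, for a single line only. This is neither the code in this PR nor the gopls code; `utf16ToByteOffset` is a hypothetical name, and clamping out-of-range offsets to the end of the line is an assumption made here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf16ToByteOffset is a hypothetical helper: it converts a UTF-16 code-unit
// offset within line into a byte offset into the same line.
// Assumption: offsets past the end of the line are clamped to len(line).
func utf16ToByteOffset(line string, character int) int {
	byteOffset := 0
	units := 0
	for byteOffset < len(line) && units < character {
		r, size := utf8.DecodeRuneInString(line[byteOffset:])
		width := 1
		if r >= 0x10000 {
			// Runes outside the BMP take a surrogate pair in UTF-16.
			width = 2
		}
		if units+width > character {
			// The target falls inside a surrogate pair: stop before the rune.
			break
		}
		units += width
		byteOffset += size
	}
	return byteOffset
}

func main() {
	fmt.Println(utf16ToByteOffset("a😀b", 3)) // 5: 'b' starts after 1 + 4 bytes
}
```

A full IndexIn would also need to find the start of the requested line first; that part is left out here.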

tliron commented 2 years ago

Hm, this is an interesting problem. I'm absolutely fine with accepting this code as it is, if it does work (and it seems to). Could you just fix the PR to add a comment about where you took the code from? That way, in the future, maybe someone can take another look and find a more straightforward solution.

shezadkhan137 commented 2 years ago

@tliron I've updated the comments to explain where the code has been taken from.

tliron commented 2 years ago

@shezadkhan137 Thank you very much for your contribution!

mickael-menu commented 2 years ago

Perfect timing! I got a bug report that was caused by this: https://github.com/mickael-menu/zk/issues/72