LS is not aware of position encoding kind negotiation and therefore does not perform it, which means the language client will assume UTF-16, the LSP default. In reality, LS reports positions in UTF-8 because that's the native encoding of Rust strings. This means positions can be wrong if files contain non-ASCII characters, especially ones like emoji, which take a different number of code units in UTF-8 than in UTF-16.
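To make the mismatch concrete, here is a minimal sketch that measures the same column in both encodings. The helper name `columns_of` is hypothetical and is not part of LS; it only uses the Rust standard library.

```rust
/// Returns the column of the first occurrence of `needle` in `line`,
/// measured in UTF-8 code units (bytes) and in UTF-16 code units.
/// Hypothetical helper for illustration only.
fn columns_of(line: &str, needle: &str) -> Option<(usize, usize)> {
    // `str::find` yields a byte offset, which is what a UTF-8 based server reports.
    let utf8_col = line.find(needle)?;
    // A client that assumes UTF-16 counts 16-bit code units instead.
    let utf16_col = line[..utf8_col].encode_utf16().count();
    Some((utf8_col, utf16_col))
}
```

For the line `let a = '🦀'; let b = 1;`, the identifier `b` sits at column 20 in UTF-8 but at column 18 in UTF-16, so a client that reads the server's UTF-8 column as UTF-16 would point two columns too far to the right.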
Things to implement:
Properly negotiate position encoding with the language client. Prefer UTF-8 to avoid re-encoding files, but fall back to UTF-16 as this is the only encoding guaranteed to be universally supported by clients.
When converting Cairo positions to LSP ones, take encoding differences into account. This requires access to the file's source text at conversion time, which will be a large refactoring.
Enforce UTF-8 encoding in E2E tests (add appropriate asserts in MockClient).
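The first two items could look roughly like the sketch below. It assumes the client's `general.positionEncodings` capability has already been extracted into a slice of strings; `PositionEncoding`, `negotiate` and `to_lsp_character` are hypothetical names, not existing LS or `lsp-types` APIs.

```rust
use std::convert::TryFrom;

/// Encodings this sketch can produce (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PositionEncoding {
    Utf8,
    Utf16,
}

/// Picks the encoding to announce in the server's `positionEncoding` capability.
/// Prefers UTF-8 when the client offers it; otherwise falls back to UTF-16,
/// which is what a client assumes when nothing is negotiated.
fn negotiate(client_encodings: Option<&[&str]>) -> PositionEncoding {
    match client_encodings {
        Some(list) if list.contains(&"utf-8") => PositionEncoding::Utf8,
        _ => PositionEncoding::Utf16,
    }
}

/// Converts a byte column within `line` into an LSP `character` value
/// in the negotiated encoding. Needs the line's text, which is why the
/// conversion must have the file source available.
fn to_lsp_character(line: &str, byte_col: usize, enc: PositionEncoding) -> Option<u32> {
    // `get` returns None if the offset is out of range or splits a character.
    let prefix = line.get(..byte_col)?;
    let col = match enc {
        PositionEncoding::Utf8 => prefix.len(),
        PositionEncoding::Utf16 => prefix.encode_utf16().count(),
    };
    u32::try_from(col).ok()
}
```

With UTF-8 negotiated the conversion is the identity on byte columns, so no re-encoding work is done; only the UTF-16 fallback has to walk the line prefix.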