As an optimization, I feel that pygls should choose the utf-32 encoding if the editor prefers it over utf-16.
Looking at the code, the relevant method appears to be _with_position_encodings:
def _with_position_encodings(self):
    self.server_cap.position_encoding = types.PositionEncodingKind.Utf16

    general = self.client_capabilities.general
    if general is None:
        return self

    encodings = general.position_encodings
    if encodings is None:
        return self

    if types.PositionEncodingKind.Utf16 in encodings:
        return self

    if types.PositionEncodingKind.Utf32 in encodings:
        self.server_cap.position_encoding = types.PositionEncodingKind.Utf32
        return self

    if types.PositionEncodingKind.Utf8 in encodings:
        self.server_cap.position_encoding = types.PositionEncodingKind.Utf8
        return self

    logger.warning(f"Unknown `PositionEncoding`s: {encodings}")
    return self
The code here looks like it performs encoding negotiation. In practice, however, the outcome is always UTF-16 unless the editor deliberately hides its UTF-16 support (which it is required to have), even when both parties could have agreed on an encoding that suits them better. Notably, UTF-32 is advantageous for pygls, since it makes all position-related operations trivial.
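To illustrate why UTF-32 is the cheap option on the Python side, here is a small sketch. Python strings are indexed by code point, so a UTF-32 column is already a valid string index, while a UTF-16 column has to be translated by walking the line. The helper names `utf16_length` and `utf16_col_to_index` are made up for this example and are not pygls functions.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units; characters above U+FFFF take two units."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


def utf16_col_to_index(line: str, utf16_col: int) -> int:
    """Translate a UTF-16 column into a Python string index (hypothetical helper)."""
    units = 0
    for index, ch in enumerate(line):
        if units >= utf16_col:
            return index
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(line)


line = "a😀b"
# With UTF-32, the column of "b" is simply its string index.
print(line.index("b"))              # 2
# With UTF-16, the same character sits at column 3, so a conversion is needed.
print(utf16_length(line))           # 4
print(utf16_col_to_index(line, 3))  # 2
```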
As an example, the LSP client eglot (from Emacs) advertises the following encoding order: position_encodings=['utf-32', 'utf-8', 'utf-16']. Yet the encoding chosen by pygls ends up being utf-16.
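For illustration, a negotiation that respects the client's order could look roughly like the sketch below. It is standalone and uses plain strings rather than the pygls types; `choose_position_encoding` and `SERVER_SUPPORTED` are hypothetical names, and it assumes the client's list is ordered by preference, as the eglot example suggests.

```python
from typing import Optional, Sequence

# Encodings this hypothetical server is able to handle.
SERVER_SUPPORTED = ("utf-32", "utf-8", "utf-16")
# Fallback when the client offers nothing usable.
DEFAULT_ENCODING = "utf-16"


def choose_position_encoding(
    client_encodings: Optional[Sequence[str]],
    server_supported: Sequence[str] = SERVER_SUPPORTED,
) -> str:
    """Return the first client-preferred encoding that the server supports."""
    if not client_encodings:
        return DEFAULT_ENCODING
    for encoding in client_encodings:
        if encoding in server_supported:
            return encoding
    # None of the offered encodings is known; fall back to the default.
    return DEFAULT_ENCODING


eglot_order = ["utf-32", "utf-8", "utf-16"]
print(choose_position_encoding(eglot_order))  # utf-32
```

With this approach a client that lists utf-16 first still gets utf-16, so existing behaviour is preserved for editors that prefer it.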