Stronger validation of incremental text sync consistency between the client and the server

artempyanykh commented 1 year ago

Context

I maintain a markdown language server that is used by 6 different LSP clients. Sometimes I get bug reports like this (Neovim editor) or this (Helix editor) but getting a repro is very hard and debugging is very time consuming (especially, given that I don't personally use all 6 LSP clients on a daily basis).

Even when I can find a repro (like in this case with Emacs lsp-mode) this is just 1 client out of 6.

One thing that can simplify my life as a maintainer of an LSP sever is stronger validation.

The proposal

Update: Replace 'starting version' with 'content hash' below. The spirit of the proposal hasn't changed and is about enabling stronger validation for incremental text sync.

This is relevant for the incremental text sync. The protocol currently has the following in textDocument/didChange:

/**
 * The document that did change. The version number points
 * to the version after all provided content changes have
 * been applied.
 */
textDocument: VersionedTextDocumentIdentifier;

The version number there is "the version after all provided content changes have been applied".

I find this problematic because each LSP client (nvim lsp, emacs lsp-mode & eglot, sublime text, helix, etc.) has its own way of dealing with document versions:

some increment by 1 on each change notifications,
others increment by the number of content changes in the change notification,
other just increment by a seemingly random number that is tied to their internal counters.

This means that if there's a bug in a client (or in the server), the state can subtly drift apart until the server receives a change that violates the state of the document that it has.

This prevents stronger validation on the server side, and makes triaging and debugging incremental text sync issues very painful.

Having both an starting and a final version in the document change notification would allow the server to check its current version with the notification's initial version and if there's a mismatch to abort early. Also, would simplify triaging: is it a client error? is it a race condition or some other bug in the server?

Perhaps I'm missing something in the spec that would allow me to do this type of stronger validation of the state?

rchl commented 1 year ago

A buggy client could still report versions correctly but provide incorrect changes. So it wouldn't help that case.

I guess this would only help with particular cases where the client skips reporting of some change where it shouldn't but I'm not sure if that's worth extending protocol for.

artempyanykh commented 1 year ago

@rchl fair enough! This won't make the protocol bullet proof. But the status quo makes incremental text sync just too much hassle to support or at least have ON by default for a language server maintainer.

Using some form of a check-sum though, would help.

dbaeumer commented 1 year ago

As said a starting version number will not make this better since client's that report wrong versions/content changes will very likely report wrong information even if they have to send a start version number.

So the only thing that improves things is to send a hash, however I don't want to force a client to compute that hash for every change. All I could think of is some debug flag that users can enable to send such a hash.

artempyanykh commented 1 year ago

@dbaeumer thanks for your reply!

however I don't want to force a client to compute that hash for every change.

What's the concern here – performance impact on the client?

I'd imagine if it's a hash per change notification rather than per change and the client does reasonable batching it may not be that bad... but this needs to be measured.

some debug flag that users can enable to send such a hash.

This would be an improvement, although I'm not 100% clear on the logistics of this. Perhaps, this could be something that the server could also request during initialization (the client should still be able to deny it and skip calculating content hashes)? This would give maintainers of some servers more control without forcing clients to deal with content hashes all the time for all servers. This could also act as a forcing function to implement smarter batching of changes which would be beneficial to the perf of both clients and servers.

microsoft / language-server-protocol

Stronger validation of incremental text sync consistency between the client and the server #1706

Context

The proposal