microsoft / language-server-protocol

Defines a common protocol for language servers.
https://microsoft.github.io/language-server-protocol/
Creative Commons Attribution 4.0 International
11.31k stars 804 forks

Delimitate regions eligible for spell checking. #1017

Open iago-lito opened 4 years ago

iago-lito commented 4 years ago

A discussion started at neovim about the benefits that LSP could bring to spell checking, but I'm not sure whether this has been discussed here yet.

The LSP server is very aware of the file structure and of the specifics of the programming language, which makes it a very good candidate for delimiting regions eligible for spellchecking (comments, docstrings, maybe strings, maybe variable names, etc.).
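To make the idea concrete, here is a minimal sketch of what "delimiting regions" could mean for a Python file, using the standard `tokenize` module to locate comments and string literals. The `Region` type and the `spellcheck_regions` function are hypothetical names chosen for illustration; nothing like them exists in the protocol, and a real server would rely on its own parser.

```python
import io
import tokenize
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical spellcheck region; NOT part of the LSP specification."""
    kind: str        # "comment" or "string"
    start_line: int  # zero-based, like LSP positions
    start_char: int
    end_line: int
    end_char: int    # exclusive


def spellcheck_regions(source: str) -> list[Region]:
    """Return the comment and string-literal spans of a Python source text."""
    kinds = {tokenize.COMMENT: "comment", tokenize.STRING: "string"}
    regions = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        kind = kinds.get(tok.type)
        if kind is not None:
            # tokenize rows are one-based; columns are already zero-based.
            # Columns count code points, which only matches LSP's default
            # UTF-16 offsets for text in the Basic Multilingual Plane.
            regions.append(Region(kind, tok.start[0] - 1, tok.start[1],
                                  tok.end[0] - 1, tok.end[1]))
    return regions


SAMPLE = 'x = 1  # a coment\ns = "helo"\n'
for region in spellcheck_regions(SAMPLE):
    print(region)
```

A spellchecker receiving such regions would only need to look at those spans and could ignore keywords and identifiers entirely.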

Could this spellcheck-regions-delimiting activity be part of the protocol? What would it imply in terms of coordination with existing spellcheckers/editors?

The most basic support would be to not delimit anything and return the whole file as one big region, leaving the burden of ignoring keywords, etc. to the spellchecker itself. Such a trivial implementation is a dummy one, but I think it would still be useful in 2 cases:

matklad commented 4 years ago

There are actually quite a few things besides the regions that the server needs to communicate to support spellchecking:

https://github.com/microsoft/vscode/issues/20266#issuecomment-470620828

iago-lito commented 4 years ago

@matklad Of course :) I forgot to acknowledge how ignorant I am regarding the detailed process and every problem encountered along the way. The intent is rather to check whether this has already been discussed before (so thank you for having brought this other discussion here), and to take the temperature regarding spellchecking in LSP.

mickaelistria commented 4 years ago

I think this is partly related to https://github.com/microsoft/language-server-protocol/issues/18 . Semantic Highlighting actually seems to involve file tokenization and the LS returning file tokens to the client for further styling. The "natural language text" could then be just a token type.
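As a rough sketch of that idea: assuming the relative five-integer encoding that LSP semantic tokens use (`deltaLine`, `deltaStart`, `length`, `tokenType`, `tokenModifiers`), a client could decode the token stream and keep only the token types it treats as natural language. The function name, the `wanted` parameter, and the sample legend and data below are made up for illustration.

```python
def natural_language_spans(data, legend, wanted=("comment", "string")):
    """Decode a semantic-tokens array and keep spans worth spellchecking.

    `data` is a flat list of integers, five per token; `legend` maps a
    token-type index to its name. Returns (line, start_char, length) tuples.
    """
    spans = []
    line = char = 0
    for i in range(0, len(data), 5):
        d_line, d_char, length, type_idx, _modifiers = data[i:i + 5]
        line += d_line
        # The start offset is relative to the previous token only when
        # both tokens sit on the same line; otherwise it is absolute.
        char = char + d_char if d_line == 0 else d_char
        if legend[type_idx] in wanted:
            spans.append((line, char, length))
    return spans


# Hypothetical legend and token stream for a three-line file.
LEGEND = ["keyword", "variable", "comment", "string"]
DATA = [
    0, 0, 3, 0, 0,  # keyword  at line 0, char 0
    0, 4, 1, 1, 0,  # variable at line 0, char 4
    0, 6, 9, 2, 0,  # comment  at line 0, char 10
    2, 4, 6, 3, 0,  # string   at line 2, char 4
]
print(natural_language_spans(DATA, LEGEND))
```

Whether a dedicated "natural language text" token type is preferable to reusing existing types such as comments and strings is exactly the kind of question the protocol would have to settle.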

iago-lito commented 4 years ago

@mickaelistria I also think it is. At its core, spellchecking is essentially a linting process, and token highlighting is its most natural output. However, I have no idea how standardized existing spellcheckers already are, where they would best fit in the process (e.g. interacting with the LSP client or rather with the server?), or how hard it will be to correctly specify that interaction.

For instance, highlighting/linting is one thing, but there is also fixing, fleshing out the dictionary, etc. Are all these needs already listed somewhere?