helix-editor / helix

A post-modern modal text editor.
https://helix-editor.com
Mozilla Public License 2.0
33.65k stars 2.5k forks source link

Spellchecking integration #11660

Open the-mikedavis opened 2 months ago

the-mikedavis commented 2 months ago

I've switched https://github.com/helix-editor/spellbook (Hunspell-like spellchecking library) to public and published a v0.1.0 release. It's still a work-in-progress but it's matured to the point where the check interface works robustly for at least en_US. Other languages should work too but there may be bugs with more advanced rules dictionaries can use. suggest functionality is what I'm working on now but now that check is in place it's a good time to start thinking about the integration with Helix which is also a potentially large amount of work.

I'm using it in my driver branch in (commit) where the integration is very naive: it checks all words in the viewport render and highlights misspelled words.

@pascalkuthe and I have chatted about this and here's what we're thinking for a proper integration:

Interacting with the spellchecker should be similar to interacting with LSP code-actions. <space>a should pull up LSP code actions and actions from the spellchecker: adding a misspelled word to a personal dictionary and suggestions from suggest. Adding to a personal dictionary can use the spellbook::Dictionary::add function and append a line with the word to a file in $XDG_STATE_HOME/helix/personal-dictionary/<locale>.

sbromberger commented 2 months ago

Not that I'm against this idea at all - I'm really happy that we're considering adding spell checking as a first class feature - but what would be the advantage of using an integrated spell checker over an LSP-based solution? I've been using vale now for a while and (aside from the fact that it doesn't allow you to add to local dictionaries) it's pretty darn good. I am similarily enthusiastic about Harper, but it appears to have some different limitations.

Will there be plans for grammar checking as well?

kirawi commented 2 months ago

You would be able to leverage tree-sitter for spell checking in programming languages.

sbromberger commented 2 months ago

You would be able to leverage tree-sitter for grammar checking in programming languages.

What about natural languages (markdown / text / LaTeX?)

kirawi commented 2 months ago

It should work as normal with LSPs now.

sbromberger commented 2 months ago

Then I guess I'm not seeing as much of an advantage if I have to use an LSP in any case to get grammar checking. Most grammar checkers (vale, harper) do spell checking as well....

kirawi commented 2 months ago

Yeah, it's most beneficial for programming languages.

summersz commented 1 month ago

Looking forward to this. I have used CSpell in the past so having a different strategy/config for code and prose sounds like a good idea and is a limitation of the lsp alternatives.

pascalkuthe commented 1 month ago

Yeah helix having a universally parser/sytanx tree woth tree sitter that we can use for spellchecker is something an LSP can't really match.

Spellchecking is also a fairly basic/common feature that is nice to have on core withlut having to use/setup a third party LSP.

justinlovinger commented 1 month ago

Vale and Harper can already spellcheck comments in code. typos can additionally spellcheck variable-names.

justinlovinger commented 1 month ago

I have concerns this feature will not age well and will eventually become a "thing new users should turn off and replace". Linting natural language is incredibly complicated, and even the best solutions today are far from ideal. I think spelling and grammar checking should be left to integration with external tools, not a part of Helix itself.

P.S. When I first switched from Vim to Helix, I wanted integrated spellchecking. Now that I have experience with the superior quality of language-server-based spellchecking, I never want to go back to a simple dictionary-check.

the-mikedavis commented 1 month ago

Nothing's stopping you from continuing to use something else via LSP if you choose. You shouldn't need a language server installed for basic spellchecking (word checking, not grammar) though.

In any case this issue is about the technical aspects of integration. Discussion on the concept of integrating spellchecking is in https://github.com/helix-editor/helix/discussions/3637

jsco2t commented 1 month ago

I'm a relatively new user to Helix - so take that as the important context in this statement. I spent a non-trivial bit of time setting up LSP's (in my case Vale) just so that I could get spell-checking in Markdown docs.

I get the concerns about the feature potentially not aging well. I think the simple solution to that (similar to what vim/nvim does) is to allow the built-in spell-checker to be turned off if needed.

For me - having a built-in spellchecker would have been a productivity/usability boost as a new user (a user who also sucks at spelling occasionally)

(also just realized this comment was probably off-topic per @the-mikedavis above - my apologies)

pascalkuthe commented 1 month ago

I am unconvinced by typos(-ls) approach. They aim for low false positives rate and only check for known typos instead of checking words against dictonaries which is why they can check programming languages without parsing the sourccode. However I find the false negative rate way too high and find the results with traditional spell checkers much better.

With helix we are in a position where we do have a universal parser for every language and that means we can enable a much more reliable spellchecker. This is something that is impractical (and inefficient) for a LS to replicate.

justinlovinger commented 1 month ago

@jsco2t I responded to you on the discussion thread: https://github.com/helix-editor/helix/discussions/3637#discussioncomment-10602371.

@pascalkuthe I responded to you on the discussion thread: https://github.com/helix-editor/helix/discussions/3637#discussioncomment-10602373.

hnorkowski commented 1 month ago

I'm using it in my driver branch in (commit) where the integration is very naive: it checks all words in the viewport render and highlights misspelled words.

I wanted to try it but there is a private dependency in it:

skidder = { git = "https://github.com/helix-editor/tree-sitter-helix.git" }
the-mikedavis commented 1 month ago

Yeah that commit is on my driver branch which has a bunch of other changes. I pushed up another branch off of master: commit. You'll need a dictionary downloaded into $HELIX_RUNTIME/dictionaries/en_US/en_US.{dic,aff} to run that commit. You can download one from http://wordlist.aspell.net/dicts/ or copy the right files out of https://github.com/LibreOffice/dictionaries or https://searchfox.org/mozilla-central/source/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/utf8

hnorkowski commented 1 month ago

@the-mikedavis Is it possible to underline the spelling issues in a different color than the LSP issues?

the-mikedavis commented 1 month ago

I was imagining that there would be diagnostics (same as LSP diagnostics) for misspelled words. They could be at a different or maybe customizable severity which would affect how they're displayed. In the linked commit/branch I'm using diagnostic.error only for simplicity