neovim / neovim

Vim-fork focused on extensibility and usability
https://neovim.io
Other
80.17k stars 5.51k forks source link

spell checker integration #12064

Open okias opened 4 years ago

okias commented 4 years ago

As mentioned in https://github.com/neovim/neovim/issues/356 would be great to have external spellchecking.

I noticed there was some work on hunspell https://github.com/vim/vim/pull/2500 , but nothing finished.

Is any chance to integrate enchant or nuspell https://nuspell.github.io/ (or at least older hunspell) as an option to Neovim?

Please keep this at least as placeholder to track progress. @mcepl

xylix commented 4 years ago

Wouldn't this be something that would be best to implement as a language server plugin for natural languages?

mcepl commented 4 years ago

Wouldn't this be something that would be best to implement as a language server plugin for natural languages?

I don’t think so, because then you get quite often into horrible business of two language servers on one document. Also, I wonder whether it wouldn't be too slow.

txk2048 commented 4 years ago

There's already a integrated spell checker in upstream vim, the spelling dictionary files are located under runtime/spell. To enable it you just run :set spell spellang=<LANG>, replacing \<LANG> with your language code eg. en_US for American English

But I noticed that these language files have been removed in Neovim but the spelling checking engine component has been left in

mcepl commented 4 years ago

There's already a integrated spell checker in upstream vim, the spelling dictionary files are located under runtime/spell. To enable it you just run :set spell spellang=<LANG>, replacing with your language code eg. en_US for American English

Sorry, you completely missed the point of this ticket. Of course, we all know about the Bram’s spellchecker, I use it (in neovim, of course, it works perfectly fine) all the time (including now), but I, the reporter of this issue, and many many others consider it stupid that contrary to the UNIX philosophy (do one thing and do it well) and contrary to every other program in Linux, vim does spellchecking on its own and doesn’t use the system spellchecker (on Linux it is hunspell, most of the time). Internal vim spellchecker is old, basically unmaintained (because almost unmaintainable) code in the biggest C file in the project, which should go and be replaced by the binding to the external spellchecker. One attempt to do so was ignored by Bram in https://github.com/vim/vim/pull/2500, other option would be to make spellchecker as a remote plugin, but currently everybody who took a look at spell.c run away crying and nothing has changed.

chrisbra commented 4 years ago

It wasn't actually ignored. It just wasn't ready to be merged and you mentioned you had no interest in making it fit.

mcepl commented 4 years ago

It wasn't actually ignored. It just wasn't ready to be merged and you mentioned you had no interest in making it fit.

Yes, it was ignored. I was asking for help, and nobody seemed to care, so I gave up. I was trying most of what I can do (I am not a C programmer myself), and only reply was Bram’s smug persuasion how vim internal spellchecker is the best thing since the sliced bread.

chrisbra commented 4 years ago

One attempt to do so was ignored by Bram in vim/vim#2500,

For the record, You explicitly said that. But Bram did not ignore it, he commented on that PR. Yes sometimes it takes until a patch is included. But that doesn't mean we are ignoring others work.

It's open source. If you want to have it done, do it yourself (or pay someone to do it).

Bram’s smug persuasion how vim internal spellchecker is the best thing since the sliced bread.

which was true, if you compared the state of spell checkers back when it was written and included in Vim with Version 7.

BTW: I am not a C programmer myself and had to learn it myself as well. Such is life.

mcepl commented 4 years ago

It's open source. If you want to have it done, do it yourself (or pay someone to do it).

Yes, I know it, and I just gave up, so I haven’t complained about it until provoked. It doesn’t change a bit on the fact, that my effort has been ignored.

And that pull request was created from older Red Hat patch on 26 Dec 2017. Yes, vim’s internal spellchecker might be a good idea when it was introduced (the earliest tag I found on the GitHub repo was v7.0131 from 2005-08-07), but circumstances has changed since then. Everybody and their dog have some kind of packaging system, so it is not that big deal to depend on the external library/program.

That pull request (if finished, it never really completely worked) would be better than what we have now, but the ideal situation would be IMHO some kind of the remote plugin API, so we can have plugins for even unforeseen spellchecker, because the development in that area seems to be still in flux (I have just learned about existence of https://github.com/nuspell/nuspell and there are certainly others).

ghost commented 4 years ago

@mcepl they didn't ignore your effort. They answered clearly and repeatedly. They simply have other more urgent priorities than changing a working spellcheck engine. If your PR didn't even work, it will take time to fix; probably they'll have to redo it from scratch, who knows? Replacing the internal sp with hunspell won't really achieve anything, as hunspell sucks for many languages. Replacing it with enchant was attempted 2 years ago following your pull request, and given up because it requires redoing vim's api, that is, a ton of work and time. One thing is a working PR, another something that is only a rough first draft meant to induce other people to do the work for you ("I am not a C programmer, so the main purpose of this PR is to attract other people to finish this"). You can't really expect people to drop everything and run to fix your PR.

tmccombs commented 4 years ago

Fighting over the status of that closed PR doesn't really accomplish anything. Especially since it looks like that PR was specifically adding support for hunspell, and I agree with @mcepl that the correct way to do this is either to support a something like a remote-plugin API for spelling, or use something like enchant that supports multiple spelling engines (including hunspell and nuspell) as backends.

Replacing it with enchant was attempted 2 years ago following your pull request, and given up because it requires redoing vim's api, that is, a ton of work and time.

How far did this get? Is there enough there to salvage and create a new (working) PR adding enchant support?

One question I have is, what should happen to the internal spell checker if support is added for an external/system spell checker? Should it remain for complete backwards compatibility, and in case a system spell checker isn't available? Or should it be removed completely in favor of the new system?

ghost commented 4 years ago

How far did this get? Is there enough there to salvage and create a new (working) PR adding enchant support?

idk, this is the source: https://github.com/vim/vim/pull/2500#pullrequestreview-86001998 It seems the branch was never made available.

what should happen to the internal spell checker

Ideally it would be removed to further lighten the code base, but I guess some long time vimers will be opposed (though, those are probably still using the original vim).

mcepl commented 4 years ago

Especially since it looks like that PR was specifically adding support for hunspell, and I agree with @mcepl that the correct way to do this is either to support a something like a remote-plugin API for spelling, or use something like enchant that supports multiple spelling engines (including hunspell and nuspell) as backends.

Well, better still supported and maintained hunspell than internal spellchecker. Problem with enchant is that it brings glib as additional external dependency and that is probably unacceptable to both Bram and neovim maintainers.

One question I have is, what should happen to the internal spell checker if support is added for an external/system spell checker? Should it remain for complete backwards compatibility, and in case a system spell checker isn't available? Or should it be removed completely in favor of the new system?

It should certainly be removed. If somebody want to rewrite the internal vim spellchecker into an independent library/independent program which could be bundled with vim as its default spellchecker, no problem with that.

mcepl commented 4 years ago

idk, this is the source: vim/vim#2500 (review)

That's my hunspell PR, he was asking about the enchant one.

tmccombs commented 4 years ago

Problem with enchant is that it brings glib as additional external dependency and that is probably unacceptable to both Bram and neovim maintainers.

good point.

what if instead of designing a new neovim-specific msgpack protocol, we do something similar to the clipboard provider, where we just run an external program that uses the ispell pipe format (which enchant, hunspell, aspell, etc. all support).

mcepl commented 4 years ago

what if instead of designing a new neovim-specific msgpack protocol, we do something similar to the clipboard provider, where we just run an external program that uses the ispell pipe format (which enchant, hunspell, aspell, etc. all support).

I am a big fan of the original vi logic of filtering through pipe, and I think that’s the best way how to do it. The only problem is that Windows (and less importantly, Solaris) users will hate us, because all creating new processes is horribly slow. Can ispell -a (or its aspell, hunspell, etc. equivalents) be run as a daemon in background?

tmccombs commented 4 years ago

Can ispell -a (or its aspell, hunspell, etc. equivalents) be run as a daemon in background?

Depends on what you mean by that. It would be straightforward to start the process once for a single vim process, and communicate with it using stdin/stdout over the lifetime of the process. However, I'm not sure how difficult it would be to have a shared daemon across multiple vim processes.

mcepl commented 4 years ago

Depends on what you mean by that. It would be straightforward to start the process once for a single vim process, and communicate with it using stdin/stdout over the lifetime of the process. However, I'm not sure how difficult it would be to have a shared daemon across multiple vim processes.

The former, just to avoid creating too many processes on Windows/Solaris.

mcepl commented 4 years ago

Let me put here some previous pieces of art on the theme of using external spellcheckers:

It would be probably useful to go through those plugins (and any other found elsewhere), and find out which of them work.

Other important work would be to analyze spell.c and find out which symbols it exports to the rest of the editor, and which one of them would have to be replaced.

iago-lito commented 4 years ago

Wouldn't this be something that would be best to implement as a language server plugin for natural languages? @xylix

I find that this suggestion was insightful yet rapidly dismissed, maybe it deserves more attention?

If the file is just a plain .txt draft, then it should not be a problem running 1 dedicated LSP for spellchecking.

If the file is a .tex, a .rs or a .py, then care should be taken not to mix up the language commands and the actual words, which is a complicated task that only the currently running tex-, rs- or py-LSP can legitimately tackle, and it is up to this.

In other words, I would find it DRY and meaningful that the running LSP be responsible for spell checking, taking into account all specificities of the various programming languages, and responsible for featuring any spellchecker the end user eventually likes. Would it be useful to drop a line at LSP about it?

mcepl commented 4 years ago

In other words, I would find it DRY and meaningful that the running LSP be responsible for spell checking, taking into account all specificities of the various programming languages, and responsible for featuring any spellchecker the end user eventually likes. Would it be useful to drop a line at LSP about it?

https://www.joelonsoftware.com/2001/04/21/dont-let-architecture-astronauts-scare-you/

Could we start with fixing those vimscript plugins first? http://wiki.c2.com/?DoTheSimplestThingThatCouldPossiblyWork

tmccombs commented 4 years ago

I would find it DRY and meaningful that the running LSP be responsible for spell checking

I'm not sure how that is DRY, it seems WET to me, because now every language server will be responsible for knowing how to do spell check, and different language servers may do so in incompatible ways (like use different personal wordlists). Now, I think it does make sense for an LSP server to indicate which regions of the file are eligible for spell checking (ex comments and strings), possibly tied in with semantic syntax highlighting. And possibly have a mechanism for the LSP server to give the editor a list of words that should be considered correctly spelled for that file. But I don't think the spell checking and recommendations themselves should be the responsibility of the LSP. Not to mention, what if you are editing a file that doesn't have an LSP server, or the LSP server doesn't support spellcheck, and you still want to run spell check?

tmccombs commented 4 years ago

btw, vim syntax files have a way to specify which parts of a file should be spell checked.

iago-lito commented 4 years ago

it seems WET to me, because now every language server will be responsible for knowing how to do spell check, and different language servers may do so in incompatible ways (like use different personal wordlists). Now, I think it does make sense for an LSP server to indicate which regions of the file are eligible for spell checking (ex comments and strings)

@tmccombs Yeah, sorry, that's what I meant (although I wasn't clear maybe): the only spellcheck-related role of the LSP is to isolate these regions, then forward to the actual spellchecker. This way no spellchecker has to be written twice.

what if you are editing a file that doesn't have an LSP server?

Well, there must be at least 1 lightweight, spellcheck-only LSP, whose only job is to forward the file to the spellcheker and return the linting information. The linting logic is already shared with other LSPs, and the actual analysis of the file is very basic (it's just 1 big region to spellcheck).

or the LSP server doesn't support spellcheck, and you still want to run spell check?

My point is that such a server would not be considered LSP-compliant if spellchecking were actually part of the protocol. This is why I think about dropping them a line. If spellchecking is added to LSP, then all LSP servers will have to integrate their share of language-specific spellchecking logic. I'll open a thread there, just to take the temperature on that.
In the meantime, yeah, there will exist incomplete LSP-servers with no spellchecking support yet. In this transient situation, maybe it's good enough to run the lightweight, spellcheck-only LSP server in parallel? I have to admit that I don't know how difficult it is to accept two parallel sources of linting.

btw, vim syntax files have a way to specify which parts of a file should be spell checked.

I personally find this WET, since the vim syntax file and the LSP server internal parsing logic are duplicates of each other.

tmccombs commented 4 years ago

then forward to the actual spellchecker

my concern is, how does that happen? if it just returns the regions to the editor, and then the editor does spell check using the "system" spell checker, all well and good. But if the LSP server is responsible for return spell check results to the editor, that means the LSP server has to deal with issues like determining which spellchecker to use, cross-platform compatibility etc. and that work will have to be done for each LSP server.

My point is that such a server would not be considered LSP-compliant if spellchecking were actually part of the protocol.

That would break backwards-compatibility. And it doesn't solve the problem that not all languages have an LSP server (or the user may not want to use the LSP server).

I personally find this WET, since the vim syntax file and the LSP server internal parsing logic are duplicates of each other.

My point here, is that the regions to do spell check on are tightly tied to the syntax. So if/when LSP allows sending syntax highlighting information, ideally, it could send info about spelling regions as well. Then spellcheck could work basically the same way with an LSP server as without one.

iago-lito commented 4 years ago

Well, thank you for discussing this :)


then forward to the actual spellchecker

my concern is, how does that happen?

I'd say it ideally happens the way you eventually describe:

So if/when LSP allows sending syntax highlighting information, ideally, it could send info about spelling regions as well. Then spellcheck could work basically the same way with an LSP server as without one.


About compatibility..

That would break backwards-compatibility.

I'm sorry I don't understand why, or which compatibility.


About which spellchecker to use..

the LSP server has to deal with issues like determining which spellchecker to use, cross-platform compatibility etc.

Maybe this is better tackled by the client. LSP clients have config files dedicated to explicit these choices with entries like spellchecker-to-use: "path/to/my/favourite/spellchecker", and clients know platform specificities.


This said, I understand that there is still something blurry in the process:

I don't know whether spellcheckers output are already standardized in any way. If they are, then the latter interfacing is a one-shot standardizing effort from LSP, then implementation repercuted on each LSP server. If they are not, then it also imply that one LSP-specific interface needs be featured by each spellchecker, in addition to how they already work. This sounds like a long-run effort indeed.

Anyway, as an appointed "astronaut", I agree that there is nothing neovim-specific left in this piece of the discussion, so maybe we should better move it there.

hauleth commented 3 years ago

Just to revive this a little, there is enchant library form AbiWord that supports many backends. Maybe it would be worth reviewing as a spell checking backend library.

mcepl commented 3 years ago

I think the decision which backend to use (enchant, hunspell directly, nuspell, or creating API for remote plugins) is less important, than actually hewing API from the swamp of spell.c and separating all functionality required from such backend.

jamessan commented 3 years ago

Just to revive this a little, there is enchant library form AbiWord that supports many backends.

FWIW, I did try to integrate enchant into Vim awhile back. While I got it to be superficially functional, I came to realize that much of the remaining work needed to take advantage of Vim's internal heuristics for improving suggestions relies on Hunspell like APIs.

If/when I get the time to finish this work, it would be using Hunspell (or a library with similar APIs), even though I too think that Enchant provides better end-user flexibility.

mcepl commented 3 years ago

FWIW, I did try to integrate enchant into Vim awhile back. While I got it to be superficially functional, I came to realize that much of the remaining work needed to take advantage of Vim's internal heuristics for improving suggestions relies on Hunspell like APIs.

And you still get hit with the glib dependency, which is probably unacceptable. It would have to be a truly remote plugin.

0rtz commented 2 years ago

There's already a integrated spell checker in upstream vim, the spelling dictionary files are located under runtime/spell. To enable it you just run :set spell spellang=<LANG>, replacing with your language code eg. en_US for American English

But I noticed that these language files have been removed in Neovim but the spelling checking engine component has been left in

So right now in neovim it is impossible to set up spell checking for any language other than english?

clason commented 2 years ago

No, of course not -- the dictionaries are just not shipped by default. See :h spell-load.

0rtz commented 2 years ago

@clason well for me it prints that spellfile#LoadFile(): Nread command is not available. relating to spellfile.vim that is specifed in :h spell-load. can it be the cause why spell files not loading?

yar-fed commented 2 years ago

It should prompt you to download them on enabling spellchecking. When I move my config to a new computer and try to activate a spellchecking with my keymaps it prompts (Y/n)

0rtz commented 2 years ago

@yar-fed I do

nvim -u NONE -V2 Makefile
:set spelllang=en,ru
:set spell
not found in runtime path: "spell/en.utf-8.add.spl"
not found in runtime path: "spell/ru.utf-8.spl"
not found in runtime path: "spell/ru.ascii.spl"
Warning: Cannot find word list "ru.utf-8.spl" or "ru.ascii.spl"
Press ENTER or type command to continue

No prompts or any downloads. nvim version is v0.6.0-dev+409-gf6a9f0bfc Should i fill a bug report?

clason commented 2 years ago

-u NONE disables all builtin plugins, including the one for downloading spell files. And I don't know what -V2 is supposed to do.

But, yes, this is unrelated to this issue, so it would be better to ask on Matrix or Discourse. (I don't think there's a bug here, so I wouldn't jump straight to opening an issue.)

vigoux commented 1 year ago

Hi, I am reviving this issue after some comments on vim/vim about hunspell integration (especially for french dictionnaries, which are horrible).

The main arguments in vim/vim (which are all valid in vim/vim's context) are:

  1. VimSpell, at the time, was faster than any other spellchecking library
  2. VimSpell is licenced under the same terms as vim, which allows distributions along with it without hastle
  3. VimSpell requires no extra dependency
  4. VimSpell can detect rare words (SpellRare), and words from other variants of a language (SpellLocal)

Now to answer to those points one by one:

  1. It looks like, after some experiments that follow these guidelines VimSpell is actually slower than hunspell at both spellchecking (~0.02 secs for hunspell, ~0.03 secs for VimSpell, may be a fluke) and suggestions (many times, VimSpell took ~25 minutes, hunspell did the whole list in 2 minutes)
  2. In neovim, this licencing problem seems to have disappear: we already rely on libraries that use different licences than us.
  3. See answer to 2.: we already rely on external libraries
  4. This one is trickier, since version 1.3.0, hunspell can report rare words, so SpellRare is supported. I did not look into haw SpellLocal is implemented, but if it is an extension to the dictionnary format, then it is quite probably not supported. In the contrary, if this is part of the MySpell specification then hunspell probably silently supports it, as it supports MySpell dictionnaries.

On a side note, using hunspell should allow us to stop us from using the weird dictionnary management setup that vim/vim uses (download dictionnary files and such) and only use the system dictionnaries, which seems a lot better (to me at least).

lewis6991 commented 1 year ago

On a side note, using hunspell should allow us to stop us from using the weird dictionnary management setup that vim/vim uses (download dictionnary files and such) and only use the system dictionnaries, which seems a lot better (to me at least).

This would be very good!

justinmk commented 1 year ago

Status

spellchecking while redrawing is a terrible idea. I mean, really terrible. Awful. Seriously.

Reworking the internals to avoid that may be costly, but in general it seems like a strange architecture: why do spell higlights need to be synchronous? A cheap alternative could be to defer to LSP...

lewis6991 commented 1 year ago

I did take a quick look at Aspell and it does seem to provide better suggestions for English, including the ones mentioned here.

We should also look at enchant which is just a wrapper around many backends (including aspell and hunspell).

mcepl commented 1 year ago

We should also look at enchant which is just a wrapper around many backends (including aspell and hunspell).

enchant requires dependency on glib.

tmccombs commented 1 year ago

A couple potential issues with aspell:

  1. It is no longer maintained
  2. If I remember correctly from a project I did several years ago where I needed a spellchecker, the hunspell dictionaries are more complete, even for English. So, while aspell might give better suggestions, it will also mark more correctly spelled words as misspelled .

There is also nuspell, but it is newer, and I'm not sure how good the suggestions are, although it does support hunspell dictionaries.

I think the ideal would be a way to make the spell checker pluggable.

justinmk commented 1 year ago

aspell: It is no longer maintained

based on what? Last release on http://aspell.net/ is 2019, and https://github.com/GNUAspell/aspell seems to be alive.

the hunspell dictionaries are more complete, even for English.

That is proving to be a minor concern. It is a one-time cost to convert dictionaries from one format to another (plus it's easy to hit zg). Whereas the suggestion algorithm is the main feature that actually needs to work well.

I think the ideal would be a way to make the spell checker pluggable.

Doesn't really solve anything if all the backends suck.

tmccombs commented 1 year ago

aspell: It is no longer maintained

based on what? Last release on http://aspell.net/ is 2019 Multiple places discussing why hunspell is preferred over aspell. Although, it looks like maybe those sources are outdated, and there has been at least one release since then. Although there weren't any releases between 2011 and 2019.

and https://github.com/GNUAspell/aspell seems to be alive.

Does it? The only change in the last two years was to the manual. I'll concede it might not be completely unmaintained, but it definitely has significantly less activity than hunspell, and relatedly, is used by fewer downstream projects.

It is a one-time cost to convert dictionaries from one format to another

Perhaps. Although, as far as I know there aren't existing tools to do that conversion, and I'm not even sure it is possible without data loss.

(plus it's easy to hit zg).

As long as you are confident you spelled it correctly 😉.

I think the ideal would be a way to make the spell checker pluggable.

Doesn't really solve anything if all the backends suck.

On the contrary, if they all suck, but in different ways, you can pick the one that works best for you. If you can switch engines at runtime you could even do something like use hunspell by default, but if it doesn't give good suggestions, switch to aspell to see if that gives you better suggestions.

justinmk commented 1 year ago

but it definitely has significantly less activity than hunspell

Then why is the unspell suggest algorithm so bad? That is the main job of a spellchecker.

On the contrary, if they all suck, but in different ways, you can pick the one that works best for you.

That is a cop-out that doesn't change the game for 98% of Nvim users. We need something that we can ship. Also what does a pluggable interface gain over LSP?

tmccombs commented 1 year ago

Then why is the unspell suggest algorithm so bad?

IDK. From what I understand, hunspell is supposed to work better for several non-english languages. Perhaps that comes at the cost of an engine that doesn't work quite as well for English (because it wasn't specifically designed for English)?

That is the main job of a spellchecker.

It's called a spellchecker not a spellsuggester. Identifying which words are misspelled is at least as important for a spellchecker.

Also what does a pluggable interface gain over LSP?

I would consider LSP to be a "pluggable Interface", by which I mean the specific spell checking engine isn't hard coded and can easily be replaced by something else. Either because of user preference, or because a new better engine becomes available. I'm not sure LSP is the best fit, because that isn't exactly what it was designed for. But I definitely think using LSP would be better than embedding or linking directly against aspell or hunspell.

mcepl commented 1 year ago

but it definitely has significantly less activity than hunspell

Then why is the unspell suggest algorithm so bad? That is the main job of a spellchecker.

It is not bad, it at least works for other languages than en, aspell doesn’t. And if you want to make an universal system spellchecker (or spellchecker for Google Chrome, Firefox, LibreOffice, Mac OS X) the reality crashes on you heavily that English is not language of majority of humankind. Yes, I know, I was trying to use aspell for cs_CZ … you wouldn’t believe how that was bad.

jamessan commented 1 year ago

I think the ideal would be a way to make the spell checker pluggable.

Like enchant does?

mcepl commented 1 year ago

I think the ideal would be a way to make the spell checker pluggable.

Like enchant does?

Yes, but the best is enemy of good. Let’s have hunspell first, and we can think about more sophisticated solution.

vigoux commented 1 year ago

Yes, but the best is enemy of good. Let’s have hunspell first, and we can think about more sophisticated solution.

Don't forget that we have time to make a decision here: I'd rather go for a solution we know will be good for the long-term rather than a half-baked one which will be regretted later.

The thing is that the hunspell PR is also an occasion to rework the spell code around an external library.

As others noted, hunspell also does not seem to be the all around best alternative.

mcepl commented 1 year ago

The thing is that the hunspell PR is also an occasion to rework the spell code around an external library.

As others noted, hunspell also does not seem to be the all around best alternative.

I refer back to https://github.com/neovim/neovim/issues/12064#issuecomment-739422128 and although I clearly accept that hunspell is not the best spellchecker for some languages and it will quite certainly won’t be the best spellchecker forever, I think with the stress of “all around” it could be the best all around alternative right now.

And yes, I believe that the value of this PR (aside from neovim using the standard spellchecker on my system and every other Linux/Unix/Mac OS(?) I know about) is more in separating spellchecking into external module. It is the simplest thing that could possibly work and I think we should start with it. When we have something which people are actually capable of using, we get feedback and time to prepare some better solution. We may quite possibly not need anything more complicated or we will do something completely different.