Open okias opened 4 years ago
Wouldn't this be something that would be best to implement as a language server plugin for natural languages?
Wouldn't this be something that would be best to implement as a language server plugin for natural languages?
I don’t think so, because then you get quite often into horrible business of two language servers on one document. Also, I wonder whether it wouldn't be too slow.
There's already a integrated spell checker in upstream vim, the spelling dictionary files are located under runtime/spell. To enable it you just run :set spell spellang=<LANG>
, replacing \<LANG> with your language code eg. en_US for American English
But I noticed that these language files have been removed in Neovim but the spelling checking engine component has been left in
There's already a integrated spell checker in upstream vim, the spelling dictionary files are located under runtime/spell. To enable it you just run
:set spell spellang=<LANG>
, replacingwith your language code eg. en_US for American English
Sorry, you completely missed the point of this ticket. Of course, we all know about the Bram’s spellchecker, I use it (in neovim, of course, it works perfectly fine) all the time (including now), but I, the reporter of this issue, and many many others consider it stupid that contrary to the UNIX philosophy (do one thing and do it well) and contrary to every other program in Linux, vim does spellchecking on its own and doesn’t use the system spellchecker (on Linux it is hunspell, most of the time). Internal vim spellchecker is old, basically unmaintained (because almost unmaintainable) code in the biggest C file in the project, which should go and be replaced by the binding to the external spellchecker. One attempt to do so was ignored by Bram in https://github.com/vim/vim/pull/2500, other option would be to make spellchecker as a remote plugin, but currently everybody who took a look at spell.c
run away crying and nothing has changed.
It wasn't actually ignored. It just wasn't ready to be merged and you mentioned you had no interest in making it fit.
It wasn't actually ignored. It just wasn't ready to be merged and you mentioned you had no interest in making it fit.
Yes, it was ignored. I was asking for help, and nobody seemed to care, so I gave up. I was trying most of what I can do (I am not a C programmer myself), and only reply was Bram’s smug persuasion how vim internal spellchecker is the best thing since the sliced bread.
One attempt to do so was ignored by Bram in vim/vim#2500,
For the record, You explicitly said that. But Bram did not ignore it, he commented on that PR. Yes sometimes it takes until a patch is included. But that doesn't mean we are ignoring others work.
It's open source. If you want to have it done, do it yourself (or pay someone to do it).
Bram’s smug persuasion how vim internal spellchecker is the best thing since the sliced bread.
which was true, if you compared the state of spell checkers back when it was written and included in Vim with Version 7.
BTW: I am not a C programmer myself and had to learn it myself as well. Such is life.
It's open source. If you want to have it done, do it yourself (or pay someone to do it).
Yes, I know it, and I just gave up, so I haven’t complained about it until provoked. It doesn’t change a bit on the fact, that my effort has been ignored.
And that pull request was created from older Red Hat patch on 26 Dec 2017. Yes, vim’s internal spellchecker might be a good idea when it was introduced (the earliest tag I found on the GitHub repo was v7.0131 from 2005-08-07), but circumstances has changed since then. Everybody and their dog have some kind of packaging system, so it is not that big deal to depend on the external library/program.
That pull request (if finished, it never really completely worked) would be better than what we have now, but the ideal situation would be IMHO some kind of the remote plugin API, so we can have plugins for even unforeseen spellchecker, because the development in that area seems to be still in flux (I have just learned about existence of https://github.com/nuspell/nuspell and there are certainly others).
@mcepl they didn't ignore your effort. They answered clearly and repeatedly. They simply have other more urgent priorities than changing a working spellcheck engine. If your PR didn't even work, it will take time to fix; probably they'll have to redo it from scratch, who knows? Replacing the internal sp with hunspell won't really achieve anything, as hunspell sucks for many languages. Replacing it with enchant was attempted 2 years ago following your pull request, and given up because it requires redoing vim's api, that is, a ton of work and time. One thing is a working PR, another something that is only a rough first draft meant to induce other people to do the work for you ("I am not a C programmer, so the main purpose of this PR is to attract other people to finish this"). You can't really expect people to drop everything and run to fix your PR.
Fighting over the status of that closed PR doesn't really accomplish anything. Especially since it looks like that PR was specifically adding support for hunspell, and I agree with @mcepl that the correct way to do this is either to support a something like a remote-plugin API for spelling, or use something like enchant that supports multiple spelling engines (including hunspell and nuspell) as backends.
Replacing it with enchant was attempted 2 years ago following your pull request, and given up because it requires redoing vim's api, that is, a ton of work and time.
How far did this get? Is there enough there to salvage and create a new (working) PR adding enchant support?
One question I have is, what should happen to the internal spell checker if support is added for an external/system spell checker? Should it remain for complete backwards compatibility, and in case a system spell checker isn't available? Or should it be removed completely in favor of the new system?
How far did this get? Is there enough there to salvage and create a new (working) PR adding enchant support?
idk, this is the source: https://github.com/vim/vim/pull/2500#pullrequestreview-86001998 It seems the branch was never made available.
what should happen to the internal spell checker
Ideally it would be removed to further lighten the code base, but I guess some long time vimers will be opposed (though, those are probably still using the original vim).
Especially since it looks like that PR was specifically adding support for hunspell, and I agree with @mcepl that the correct way to do this is either to support a something like a remote-plugin API for spelling, or use something like enchant that supports multiple spelling engines (including hunspell and nuspell) as backends.
Well, better still supported and maintained hunspell than internal spellchecker. Problem with enchant is that it brings glib as additional external dependency and that is probably unacceptable to both Bram and neovim maintainers.
One question I have is, what should happen to the internal spell checker if support is added for an external/system spell checker? Should it remain for complete backwards compatibility, and in case a system spell checker isn't available? Or should it be removed completely in favor of the new system?
It should certainly be removed. If somebody want to rewrite the internal vim spellchecker into an independent library/independent program which could be bundled with vim as its default spellchecker, no problem with that.
idk, this is the source: vim/vim#2500 (review)
That's my hunspell PR, he was asking about the enchant one.
Problem with enchant is that it brings glib as additional external dependency and that is probably unacceptable to both Bram and neovim maintainers.
good point.
what if instead of designing a new neovim-specific msgpack protocol, we do something similar to the clipboard provider, where we just run an external program that uses the ispell pipe format (which enchant, hunspell, aspell, etc. all support).
what if instead of designing a new neovim-specific msgpack protocol, we do something similar to the clipboard provider, where we just run an external program that uses the ispell pipe format (which enchant, hunspell, aspell, etc. all support).
I am a big fan of the original vi logic of filtering through pipe, and I think that’s the best way how to do it. The only problem is that Windows (and less importantly, Solaris) users will hate us, because all creating new processes is horribly slow. Can ispell -a
(or its aspell, hunspell, etc. equivalents) be run as a daemon in background?
Can ispell -a (or its aspell, hunspell, etc. equivalents) be run as a daemon in background?
Depends on what you mean by that. It would be straightforward to start the process once for a single vim process, and communicate with it using stdin/stdout over the lifetime of the process. However, I'm not sure how difficult it would be to have a shared daemon across multiple vim processes.
Depends on what you mean by that. It would be straightforward to start the process once for a single vim process, and communicate with it using stdin/stdout over the lifetime of the process. However, I'm not sure how difficult it would be to have a shared daemon across multiple vim processes.
The former, just to avoid creating too many processes on Windows/Solaris.
Let me put here some previous pieces of art on the theme of using external spellcheckers:
It would be probably useful to go through those plugins (and any other found elsewhere), and find out which of them work.
Other important work would be to analyze spell.c and find out which symbols it exports to the rest of the editor, and which one of them would have to be replaced.
Wouldn't this be something that would be best to implement as a language server plugin for natural languages? @xylix
I find that this suggestion was insightful yet rapidly dismissed, maybe it deserves more attention?
If the file is just a plain .txt
draft, then it should not be a problem running 1 dedicated LSP for spellchecking.
If the file is a .tex
, a .rs
or a .py
, then care should be taken not to mix up the language commands and the actual words, which is a complicated task that only the currently running tex
-, rs
- or py
-LSP can legitimately tackle, and it is up to this.
In other words, I would find it DRY and meaningful that the running LSP be responsible for spell checking, taking into account all specificities of the various programming languages, and responsible for featuring any spellchecker the end user eventually likes. Would it be useful to drop a line at LSP
about it?
In other words, I would find it DRY and meaningful that the running LSP be responsible for spell checking, taking into account all specificities of the various programming languages, and responsible for featuring any spellchecker the end user eventually likes. Would it be useful to drop a line at
LSP
about it?
https://www.joelonsoftware.com/2001/04/21/dont-let-architecture-astronauts-scare-you/
Could we start with fixing those vimscript plugins first? http://wiki.c2.com/?DoTheSimplestThingThatCouldPossiblyWork
I would find it DRY and meaningful that the running LSP be responsible for spell checking
I'm not sure how that is DRY, it seems WET to me, because now every language server will be responsible for knowing how to do spell check, and different language servers may do so in incompatible ways (like use different personal wordlists). Now, I think it does make sense for an LSP server to indicate which regions of the file are eligible for spell checking (ex comments and strings), possibly tied in with semantic syntax highlighting. And possibly have a mechanism for the LSP server to give the editor a list of words that should be considered correctly spelled for that file. But I don't think the spell checking and recommendations themselves should be the responsibility of the LSP. Not to mention, what if you are editing a file that doesn't have an LSP server, or the LSP server doesn't support spellcheck, and you still want to run spell check?
btw, vim syntax files have a way to specify which parts of a file should be spell checked.
it seems WET to me, because now every language server will be responsible for knowing how to do spell check, and different language servers may do so in incompatible ways (like use different personal wordlists). Now, I think it does make sense for an LSP server to indicate which regions of the file are eligible for spell checking (ex comments and strings)
@tmccombs Yeah, sorry, that's what I meant (although I wasn't clear maybe): the only spellcheck-related role of the LSP is to isolate these regions, then forward to the actual spellchecker. This way no spellchecker has to be written twice.
what if you are editing a file that doesn't have an LSP server?
Well, there must be at least 1 lightweight, spellcheck-only LSP, whose only job is to forward the file to the spellcheker and return the linting information. The linting logic is already shared with other LSPs, and the actual analysis of the file is very basic (it's just 1 big region to spellcheck).
or the LSP server doesn't support spellcheck, and you still want to run spell check?
My point is that such a server would not be considered LSP-compliant if spellchecking were actually part of the protocol. This is why I think about dropping them a line. If spellchecking is added to LSP, then all LSP servers will have to integrate their share of language-specific spellchecking logic. I'll open a thread there, just to take the temperature on that.
In the meantime, yeah, there will exist incomplete LSP-servers with no spellchecking support yet. In this transient situation, maybe it's good enough to run the lightweight, spellcheck-only LSP server in parallel? I have to admit that I don't know how difficult it is to accept two parallel sources of linting.
btw, vim syntax files have a way to specify which parts of a file should be spell checked.
I personally find this WET, since the vim syntax file and the LSP server internal parsing logic are duplicates of each other.
then forward to the actual spellchecker
my concern is, how does that happen? if it just returns the regions to the editor, and then the editor does spell check using the "system" spell checker, all well and good. But if the LSP server is responsible for return spell check results to the editor, that means the LSP server has to deal with issues like determining which spellchecker to use, cross-platform compatibility etc. and that work will have to be done for each LSP server.
My point is that such a server would not be considered LSP-compliant if spellchecking were actually part of the protocol.
That would break backwards-compatibility. And it doesn't solve the problem that not all languages have an LSP server (or the user may not want to use the LSP server).
I personally find this WET, since the vim syntax file and the LSP server internal parsing logic are duplicates of each other.
My point here, is that the regions to do spell check on are tightly tied to the syntax. So if/when LSP allows sending syntax highlighting information, ideally, it could send info about spelling regions as well. Then spellcheck could work basically the same way with an LSP server as without one.
Well, thank you for discussing this :)
then forward to the actual spellchecker
my concern is, how does that happen?
I'd say it ideally happens the way you eventually describe:
So if/when LSP allows sending syntax highlighting information, ideally, it could send info about spelling regions as well. Then spellcheck could work basically the same way with an LSP server as without one.
About compatibility..
That would break backwards-compatibility.
I'm sorry I don't understand why, or which compatibility.
About which spellchecker to use..
the LSP server has to deal with issues like determining which spellchecker to use, cross-platform compatibility etc.
Maybe this is better tackled by the client. LSP clients have config files dedicated to explicit these choices with entries like spellchecker-to-use: "path/to/my/favourite/spellchecker"
, and clients know platform specificities.
This said, I understand that there is still something blurry in the process:
I don't know whether spellcheckers output are already standardized in any way. If they are, then the latter interfacing is a one-shot standardizing effort from LSP, then implementation repercuted on each LSP server. If they are not, then it also imply that one LSP-specific interface needs be featured by each spellchecker, in addition to how they already work. This sounds like a long-run effort indeed.
Anyway, as an appointed "astronaut", I agree that there is nothing neovim-specific left in this piece of the discussion, so maybe we should better move it there.
Just to revive this a little, there is enchant library form AbiWord that supports many backends. Maybe it would be worth reviewing as a spell checking backend library.
I think the decision which backend to use (enchant, hunspell directly, nuspell, or creating API for remote plugins) is less important, than actually hewing API from the swamp of spell.c
and separating all functionality required from such backend.
Just to revive this a little, there is enchant library form AbiWord that supports many backends.
FWIW, I did try to integrate enchant into Vim awhile back. While I got it to be superficially functional, I came to realize that much of the remaining work needed to take advantage of Vim's internal heuristics for improving suggestions relies on Hunspell like APIs.
If/when I get the time to finish this work, it would be using Hunspell (or a library with similar APIs), even though I too think that Enchant provides better end-user flexibility.
FWIW, I did try to integrate enchant into Vim awhile back. While I got it to be superficially functional, I came to realize that much of the remaining work needed to take advantage of Vim's internal heuristics for improving suggestions relies on Hunspell like APIs.
And you still get hit with the glib dependency, which is probably unacceptable. It would have to be a truly remote plugin.
There's already a integrated spell checker in upstream vim, the spelling dictionary files are located under runtime/spell. To enable it you just run
:set spell spellang=<LANG>
, replacingwith your language code eg. en_US for American English But I noticed that these language files have been removed in Neovim but the spelling checking engine component has been left in
So right now in neovim it is impossible to set up spell checking for any language other than english?
No, of course not -- the dictionaries are just not shipped by default. See :h spell-load
.
@clason well for me it prints that spellfile#LoadFile(): Nread command is not available.
relating to spellfile.vim that is specifed in :h spell-load
. can it be the cause why spell files not loading?
It should prompt you to download them on enabling spellchecking. When I move my config to a new computer and try to activate a spellchecking with my keymaps it prompts (Y/n)
@yar-fed I do
nvim -u NONE -V2 Makefile
:set spelllang=en,ru
:set spell
not found in runtime path: "spell/en.utf-8.add.spl"
not found in runtime path: "spell/ru.utf-8.spl"
not found in runtime path: "spell/ru.ascii.spl"
Warning: Cannot find word list "ru.utf-8.spl" or "ru.ascii.spl"
Press ENTER or type command to continue
No prompts or any downloads. nvim version is v0.6.0-dev+409-gf6a9f0bfc Should i fill a bug report?
-u NONE
disables all builtin plugins, including the one for downloading spell files. And I don't know what -V2
is supposed to do.
But, yes, this is unrelated to this issue, so it would be better to ask on Matrix or Discourse. (I don't think there's a bug here, so I wouldn't jump straight to opening an issue.)
Hi, I am reviving this issue after some comments on vim/vim about hunspell integration (especially for french dictionnaries, which are horrible).
The main arguments in vim/vim (which are all valid in vim/vim's context) are:
SpellRare
), and words from other variants of a language (SpellLocal
)Now to answer to those points one by one:
SpellRare
is supported. I did not look into haw SpellLocal
is implemented, but if it is an extension to the dictionnary format, then it is quite probably not supported. In the contrary, if this is part of the MySpell specification then hunspell probably silently supports it, as it supports MySpell dictionnaries.On a side note, using hunspell should allow us to stop us from using the weird dictionnary management setup that vim/vim uses (download dictionnary files and such) and only use the system dictionnaries, which seems a lot better (to me at least).
On a side note, using hunspell should allow us to stop us from using the weird dictionnary management setup that vim/vim uses (download dictionnary files and such) and only use the system dictionnaries, which seems a lot better (to me at least).
This would be very good!
Vim has its own spell code, which (at the time I wrote it) is vastly superior to all existing spell libraries. aspell comes closest
spellchecking while redrawing is a terrible idea. I mean, really terrible. Awful. Seriously.
Reworking the internals to avoid that may be costly, but in general it seems like a strange architecture: why do spell higlights need to be synchronous? A cheap alternative could be to defer to LSP...
I did take a quick look at Aspell and it does seem to provide better suggestions for English, including the ones mentioned here.
We should also look at enchant which is just a wrapper around many backends (including aspell and hunspell).
We should also look at enchant which is just a wrapper around many backends (including aspell and hunspell).
enchant requires dependency on glib.
A couple potential issues with aspell:
There is also nuspell, but it is newer, and I'm not sure how good the suggestions are, although it does support hunspell dictionaries.
I think the ideal would be a way to make the spell checker pluggable.
aspell: It is no longer maintained
based on what? Last release on http://aspell.net/ is 2019, and https://github.com/GNUAspell/aspell seems to be alive.
the hunspell dictionaries are more complete, even for English.
That is proving to be a minor concern. It is a one-time cost to convert dictionaries from one format to another (plus it's easy to hit zg
). Whereas the suggestion algorithm is the main feature that actually needs to work well.
I think the ideal would be a way to make the spell checker pluggable.
Doesn't really solve anything if all the backends suck.
aspell: It is no longer maintained
based on what? Last release on http://aspell.net/ is 2019 Multiple places discussing why hunspell is preferred over aspell. Although, it looks like maybe those sources are outdated, and there has been at least one release since then. Although there weren't any releases between 2011 and 2019.
and https://github.com/GNUAspell/aspell seems to be alive.
Does it? The only change in the last two years was to the manual. I'll concede it might not be completely unmaintained, but it definitely has significantly less activity than hunspell, and relatedly, is used by fewer downstream projects.
It is a one-time cost to convert dictionaries from one format to another
Perhaps. Although, as far as I know there aren't existing tools to do that conversion, and I'm not even sure it is possible without data loss.
(plus it's easy to hit
zg
).
As long as you are confident you spelled it correctly 😉.
I think the ideal would be a way to make the spell checker pluggable.
Doesn't really solve anything if all the backends suck.
On the contrary, if they all suck, but in different ways, you can pick the one that works best for you. If you can switch engines at runtime you could even do something like use hunspell by default, but if it doesn't give good suggestions, switch to aspell to see if that gives you better suggestions.
but it definitely has significantly less activity than hunspell
Then why is the unspell suggest algorithm so bad? That is the main job of a spellchecker.
On the contrary, if they all suck, but in different ways, you can pick the one that works best for you.
That is a cop-out that doesn't change the game for 98% of Nvim users. We need something that we can ship. Also what does a pluggable interface gain over LSP?
Then why is the unspell suggest algorithm so bad?
IDK. From what I understand, hunspell is supposed to work better for several non-english languages. Perhaps that comes at the cost of an engine that doesn't work quite as well for English (because it wasn't specifically designed for English)?
That is the main job of a spellchecker.
It's called a spellchecker not a spellsuggester. Identifying which words are misspelled is at least as important for a spellchecker.
Also what does a pluggable interface gain over LSP?
I would consider LSP to be a "pluggable Interface", by which I mean the specific spell checking engine isn't hard coded and can easily be replaced by something else. Either because of user preference, or because a new better engine becomes available. I'm not sure LSP is the best fit, because that isn't exactly what it was designed for. But I definitely think using LSP would be better than embedding or linking directly against aspell or hunspell.
but it definitely has significantly less activity than hunspell
Then why is the unspell suggest algorithm so bad? That is the main job of a spellchecker.
It is not bad, it at least works for other languages than en
, aspell doesn’t. And if you want to make an universal system spellchecker (or spellchecker for Google Chrome, Firefox, LibreOffice, Mac OS X) the reality crashes on you heavily that English is not language of majority of humankind. Yes, I know, I was trying to use aspell for cs_CZ
… you wouldn’t believe how that was bad.
I think the ideal would be a way to make the spell checker pluggable.
Like enchant does?
I think the ideal would be a way to make the spell checker pluggable.
Like enchant does?
Yes, but the best is enemy of good. Let’s have hunspell first, and we can think about more sophisticated solution.
Yes, but the best is enemy of good. Let’s have hunspell first, and we can think about more sophisticated solution.
Don't forget that we have time to make a decision here: I'd rather go for a solution we know will be good for the long-term rather than a half-baked one which will be regretted later.
The thing is that the hunspell PR is also an occasion to rework the spell code around an external library.
As others noted, hunspell also does not seem to be the all around best alternative.
The thing is that the hunspell PR is also an occasion to rework the spell code around an external library.
As others noted, hunspell also does not seem to be the all around best alternative.
I refer back to https://github.com/neovim/neovim/issues/12064#issuecomment-739422128 and although I clearly accept that hunspell is not the best spellchecker for some languages and it will quite certainly won’t be the best spellchecker forever, I think with the stress of “all around” it could be the best all around alternative right now.
And yes, I believe that the value of this PR (aside from neovim using the standard spellchecker on my system and every other Linux/Unix/Mac OS(?) I know about) is more in separating spellchecking into external module. It is the simplest thing that could possibly work and I think we should start with it. When we have something which people are actually capable of using, we get feedback and time to prepare some better solution. We may quite possibly not need anything more complicated or we will do something completely different.
As mentioned in https://github.com/neovim/neovim/issues/356 would be great to have external spellchecking.
I noticed there was some work on hunspell https://github.com/vim/vim/pull/2500 , but nothing finished.
Is any chance to integrate enchant or nuspell https://nuspell.github.io/ (or at least older hunspell) as an option to Neovim?
Please keep this at least as placeholder to track progress. @mcepl