mawww / kakoune

mawww's experiment for a better code editor
http://kakoune.org
The Unlicense
9.91k stars 712 forks source link

aspell with other languages broken with Kakoune v2020.09.01 #3898

Open nonumeros opened 3 years ago

nonumeros commented 3 years ago

I'm just wondering if there could be a new release that addresses an issue I've been having problems with. Namely, aspell support.

As of this writing and with the current head, this is not an issue. (As far as I can tell so far.)

As it is, the https://github.com/mawww/kakoune/releases/tag/v2020.09.01 release breaks aspell with other languages.

Someone might say that I'd be better off by taking it up with the distribution maintainers in question.

The problem with this is the following. The official repo from the distribution that provides kakoune, is based on every release of the project.

https://github.com/mawww/kakoune/releases/download/v${pkgver}/kakoune-${pkgver so, I'm not so sure they will change the above line, anytime soon.

For an earlier commit available, there's another package provided by the AUR community which in turn is based on an earlier version of kakoune, which I highly recommend. It didn't seem to have any issue with aspell. @lenormf is the maintainer there.

But now, in order to have a functioning kakoune with aspell, I need to have another package manager, rather than going to the official repo of the distribution, if I wanted to. There's an app for that, as the saying goes.

I also tested and retested it as many times as possible, and I can almost assure you, that aspell with other languages, is not working. I also downloaded the release, compiled it, and no matter what I tried, it seemed to fail every time.

On the other hand, I could flag the package directly with the distro as out of date, or whatever the convention for out of date is. Furthermore, the distribution specifies that, quote, If you notice a package is out-of-date (i.e., there is a newer stable release available), then please notify us using the form below. Do not report bugs via this form!, endquote

But this is ambiguous. What I might consider stable, it may not be for the rest. And viceversa.

And if there's a release, whether is unstable or not, is not applicable here either. For there's none.

The only reason I'm asking for this, is for the simple reason that a release — whatever amount of time it takes — might take months, before becoming available. Right? In the meantime, with the latest release, aspell does not work or works partially, since it breaks identifying other languages installed in the system.

lenormf commented 3 years ago

What is the problem, exactly? Steps, outcome, expected, versions etc.

nonumeros commented 3 years ago

@lenormf I first installed the release from the official repo (see the image and I also tried with the official pacman), , then I downloaded https://github.com/mawww/kakoune/releases/tag/v2020.09.01 as I've done in the past. Calling out :spell es or :spell es_ES or :spell es-ES would not highlight anything on the *scratch* buffer on neither iteration, whether it'd be the compiled version from the tarball release, or from the official repo by @maximbaz

Only yours kakoune-git seemed to work.

Then as I went back and forth, I realized that since yours had worked with aspell es , something had broken along the way.. The commit that did it, I don't know. I just downloaded the most recent day-old kakoune and had no problems.

Screenshot_2020-11-19_06-44-04

nonumeros commented 3 years ago

@lenormf what I said above stands. But I noticed the date of your last update with kakoune-git and it says

Last Updated: | 2020-05-11 07:47

Well, I think the format is May?? mmm… well, the comments threw me off then.

And now I see that by comparing it against the number on kakoune-git it matches a 60154300 Support piping data to client stdin from the log.

Anyhow. What I said stands. Yours worked. The official repo didn't.

maximbaz commented 3 years ago

kakoune-git package builds the latest code from master branch, kakoune package follows the latest Github release, so the one from September. If it's a small patch to fix aspell support in the latest release and you can identify it, I can add it to the package, otherwise let's just wait for the next release 🙂 Would you like to try to git bisect to find the first commit where the issue was fixed?

nonumeros commented 3 years ago

The first one that got it fixed between v2020.01.16 and v2020.08.04 was 67662976 because e83ad2a2 broke it.

The second one that seemed to fix it was dacaad4e because anything prior to that was not working either.

nonumeros commented 3 years ago

e83ad2a2a4c78e6c1b753664942f2ddf744a85f3 is the first bad commit commit e83ad2a2a4c78e6c1b753664942f2ddf744a85f3 Date: Mon Mar 9 14:22:34 2020 +0300

rc spell: Re-implement message processing in Awk

Plain shell takes too long on large files.

Fixes #3399

rc/tools/spell.kak | 69 +++++++++++++++++++++++++++++++++---------------------

lenormf commented 3 years ago

It could be that your implementation of Awk doesn't support C++ style comments, which 94cdd3f9e3d8394a31515ad24e9ad226a225e9ac fixes. Or some undocumented control character produced by Aspell, fixed by ff8d4d6567cc0db94d05bef25b7979ddf34dfb78.

Could you check if the above commits fix the issue for you? If either work, a simple cherry-picking on the side of the package maintainers might be all that's needed. But ideally a new release would be the best way of handling this.

nonumeros commented 3 years ago

@lenormf sorry give me a second. I think you're right with the ff8d4d6 as it seems to have taken care of it.

nonumeros commented 3 years ago

I missed this one Bisecting: 0 revisions left to test after this (roughly 0 steps) [a7931fa0f7 which was the one prior to the ff8d4d6 My mistake (I had a few sessions open.)

I'l check again but you're right. It seems to have fix it.

lenormf commented 3 years ago

@mawww Could you please close this with a reply stating what should be done here?

rouanth commented 2 years ago

Happens for me in v2021.11.07. Not sure if this is the same issue. If you say so, I'll file it separately.

File on which to reproduce (in the Russian language, with aspell, language ru):

ПривXт, тXт ошибXа

Contents of spell_regions:

22 '1.1,1.6|DiagnosticError' '1.14,1.16|DiagnosticError' '1.20,1.25|DiagnosticError'                                                                  █

Only parts of the file are highlighted, even though all words contain an error in the form of an X mark. Looks like this: "ПривXт, тXт ошибXа" Picture: 20220110_21h02m49s_grim

Screwtapello commented 2 years ago

I think it's unrelated to this issue.

The coordinates in spell_regions are supposed to be of the form:

In UTF-8 encoding, ПривXт is 12 bytes long, so the region 1.1,1.6 should highlight just the first half. It looks like the "ending byte" is actually being reported as "characters", not "bytes".

On the other hand, since ПривXт is 12 bytes long, тXт does start at byte 14, so the "starting byte" is correctly being reported in bytes.

Looking at the code, the starting byte comes from aspell's output, but the ending byte is calculated by awk's length() function:

https://github.com/mawww/kakoune/blob/9acd4e62dc485aa7e44a601a0300697f8825a98c/rc/tools/spell.kak#L60-L62

And it turns out that different awk implementations implement length() differently:

$ gawk 'BEGIN { print length("ПривXт"); }'
6
$ mawk 'BEGIN { print length("ПривXт"); }'
11

Specifically, gawk (GNU Awk) is the only implementation that treats strings as a sequence of characters in the current encoding; every other implementation (mawk, nawk, whatever the BSDs use, original Bell Labs awk) treats strings as a sequence of bytes, regardless of encoding.

If you're using a Debian-based system, you can uninstall gawk and install awk and things should work properly (and much faster!). Presumably other Linux distros have something similar.

rouanth commented 2 years ago

Thanks for the analysis! I'm not on a Debian-based system, but managed to solve the issue by replacing the call to awk in the script with a call to env LANG=C awk. It did the trick for gawk, though I'm not sure how other awk implementations would react.