tmalsburg / guess-language.el

Emacs minor mode that detects the language you're typing in. Automatically switches spell checker. Supports multiple languages per document.
115 stars 14 forks source link

flyspell-large-region: Wrong type argument: stringp, nil #21

Closed francesquini closed 5 years ago

francesquini commented 5 years ago

M-x flyspell-buffer fails with error flyspell-large-region: Wrong type argument: stringp, nil when guess-language-mode` is enabled. The major mode does not seem to be a factor to reproduce the error (it breaks in fundamental, text and org modes).

Steps to reproduce:

(setq guess-language-languages '(en fr pt)) (setq guess-language-min-paragraph-length 35) (setq guess-language-langcodes '((en . ("en_US" "English")) (fr . ("fr_FR" "French")) (pt . ("pt_BR" "Portuguese")) ))

- Create a new buffer and paste the following text:

The following lines must be in a language different from the language of the first line to cause the bug. The number of lines below seems to be the minimum necessary to make the bug manifest itself in my system.

Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste. Uma linha longa o suficiente para que o problema se manifeste.

- Run `M-x flyspell-buffer` - Everything works as expected
- Enable guess-language-mode `M-x guess-language-mode`
- Run `M-x flyspell-buffer` - Flyspell terminates with the following error before the end of the spell checking: `flyspell-large-region: Wrong type argument: stringp, nil`

Full output of the messages buffer:

For information about GNU Emacs and the GNU system, type C-h C-a. Mark set Loading /tmp/guess-language.el/guess-language.el (source)...done You can run the command ‘eval-buffer’ with M-x ev-b RET Loading /tmp/guess-language.el/guess-language.el (source)...done (New file) Mark set Starting new Ispell process /usr/bin/aspell with default dictionary... Checking region... Spell Checking...100% [manifeste] Spell Checking completed. You can run the command ‘flyspell-buffer’ with M-x fl-bu RET Spell Checking completed. Guess-Language mode enabled in current buffer You can run the command ‘guess-language-mode’ with M-x gue-mo RET Guess-Language mode enabled in current buffer Checking region... Ispell process killed Local Ispell dictionary set to pt_BR Starting new Ispell process /usr/bin/aspell with pt_BR dictionary... Detected language: Portuguese flyspell-large-region: Wrong type argument: stringp, nil


Debugger output:

Debugger entered--Lisp error: (wrong-type-argument stringp nil) flyspell-external-point-words() flyspell-large-region(1 1221) flyspell-region(1 1221) flyspell-buffer() funcall-interactively(flyspell-buffer) call-interactively(flyspell-buffer record nil) command-execute(flyspell-buffer record) execute-extended-command(nil "flyspell-buffer" "flyspell-bu") funcall-interactively(execute-extended-command nil "flyspell-buffer" "flyspell-bu") call-interactively(execute-extended-command nil nil) command-execute(execute-extended-command)



Emacs 26.1
guess-language from git (https://github.com/tmalsburg/guess-language.el/commit/bc6fe11d7ea36d5319ac05c00d52b50d42d64cea)
tmalsburg commented 5 years ago

I can't fully replicate this. If I test this in a buffer called test.txt, I get this error message

ispell-change-dictionary: Undefined dictionary: pt_BR

which is fair enough because I haven't installed the dictionary for this language. When I remove pt_BR from the list of languages, everything works as expected.

Now, if I apply your recipe in a buffer called test.org, I get an error message but it's not the one you report:

org-element--cache-find: Wrong type argument: avl-tree-, [cl-struct-avl-tree- [nil nil nil 0] org-element--cache-compare]

WIth my standard setup (not emacs -q), everything works okay.

Did you test with an org-file or just a plain text file?

tmalsburg commented 5 years ago

p.s: Just saw in that you wrote in #20 that you get the same result in fundamental and in org mode. Hm, so the question is: why am I getting different results?

tmalsburg commented 5 years ago

Update: The culprit in the experiment with emacs -q and test.org was that org-element-use-cachewas set to t. Caching is apparently known to cause issues and it is disabled in more recent versions of org mode (see here). If I set this variable to nil there is no problem with flyspell-buffer. Could you please try this in your setup?

francesquini commented 5 years ago
  1. Update: The culprit in the experiment with emacs -q and test.org was that org-element-use-cachewas set to t. Caching is apparently known to cause issues and it is disabled in more recent versions of org mode (see here). If I set this variable to nil there is no problem with flyspell-buffer. Could you please try this in your setup?

The major mode is not a factor - The problem happens even in fundamental mode.
Anyway, just to be sure I retested setting org-element-use-cache to nil, and the problem still persists.

  1. p.s: Just saw in that you wrote in #20 that you get the same result in fundamental and in org mode. Hm, so the question is: why am I getting different results?

I believe it is because you actually removed pt_BR from the list of languages. The language itself does not matter as long as there are at least two distinct detectable languages (i.e. must also be present in guess-language-langcodes) in the same buffer, and appropriate language files are installed on your system.

Here is an example using English and French. Following the same steps described here (https://github.com/tmalsburg/guess-language.el/issues/21#issue-384103860) replacing the example text for the one below is enough to reproduce the error (`flyspell-large-region: Wrong type argument: stringp, nil')

The following lines must be in a language different from the language of the first line to cause the bug.
The number of lines below seems to be the minimum necessary to make the bug manifest itself in my system.

Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
Une ligne assez longue pour démontrer le problème.
tmalsburg commented 5 years ago

Sorry if my earlier messages were confusing. I had tried your recipe with just two languages (and org caching switched off) and everything worked as expected in org mode, text-mode, and fundamental-mode. In other words, I can't replicate the issue. My Emacs is 26.1.50 built from git on Ubuntu.

francesquini commented 5 years ago

All right, let's try to pinpoint the problem.

I'm using:

The problem only happens if:

In other words. the problem seems to happen only when a new language is detected during the spell checking inside the function flyspell-large-region. At this moment the old process is killed (as evidenced by the messages buffer copied below) and flyspell looses track of the evaluation since it does not expect the ispell process to be replaced. The relevant part of the code seems to be contained by the function flyspell-large-region.

...
Checking region...
Ispell process killed
Local Ispell dictionary set to pt_BR
Starting new Ispell process /usr/bin/aspell with pt_BR dictionary...
Detected language: Portuguese
flyspell-large-region: Wrong type argument: stringp, nil

As a matter of fact, tweaking the customization variable flyspell-large-region from its default of 1000 (attention, the name of the variable is the same name of the function) to nil we can get rid of the problem entirely since it deals with each word individually never resorting to the use of the function flyspell-large-region. As per its doc:

(defcustom flyspell-large-region 1000
  "The threshold that determines if a region is small.
If the region is smaller than this number of characters,
`flyspell-region' checks the words sequentially using regular
flyspell methods.  Else, if the region is large, a new Ispell process is
spawned for speed.

Doubled words are not detected in a large region, because Ispell
does not check for them.

If this variable is nil, all regions are treated as small."
  :group 'flyspell
  :version "21.1"
  :type '(choice number (const :tag "All small" nil)))

While this workaround works, dealing with each word individually is extremely slow.

So, from what as far as I can understand, if you are not experiencing the same error in your setup one of the following might be happening:

Could you share the output of your message buffer after the evaluation of the steps of https://github.com/tmalsburg/guess-language.el/issues/21#issue-384103860 so we can try and compare ?

As for possible solutions, I do not believe ispell/hunspell/aspell has the support to change the dictionary on the fly. It can however receive additional dictionaries (--extra-dicts) for a single spell checking. Thus from the top of my head I can think of three alternatives to use when large region support is used:

  1. Spell check the whole buffer with the current dictionary and add as extra dictionaries the list of all the dictionaries guess-language has configured.

  2. Run guess-language before the spell checking to determine the regions in which each language is used and call ispell repeatedly

  3. A mix between 1 and 2, that is, call ispell with --extra-dicts only with the detected languages.

tmalsburg commented 5 years ago

I just tried your recipe once again. This time, I wanted to follow it as closely as possible and installed the Portuguese dictionary for aspell and, surprise, now I get an error:

(New file)
Mark set
Auto-saving...done
Starting new Ispell process /usr/bin/aspell with default dictionary...
Checking region...
Spell Checking...100% [manifeste]
Spell Checking completed.
You can run the command ‘flyspell-buffer’ with M-x fl-bu RET
Spell Checking completed.
Guess-Language mode enabled in current buffer
You can run the command ‘guess-language-mode’ with M-x gue-mo RET
Guess-Language mode enabled in current buffer
Checking region...
Ispell process killed
Local Ispell dictionary set to pt_BR
Starting new Ispell process /usr/bin/aspell with pt_BR dictionary...
Detected language: Portuguese
flyspell-large-region: Wrong type argument: stringp, nil

When I toggle-debug-on-error, I get the following stack trace:

Debugger entered--Lisp error: (wrong-type-argument stringp nil)
  flyspell-external-point-words()
  flyspell-large-region(1 1222)
  flyspell-region(1 1222)
  flyspell-buffer()
  funcall-interactively(flyspell-buffer)
  call-interactively(flyspell-buffer record nil)
  command-execute(flyspell-buffer record)
  execute-extended-command(nil "flyspell-buffer" "flyspell-buffer")
  funcall-interactively(execute-extended-command nil "flyspell-buffer" "flyspell-buffer")
  call-interactively(execute-extended-command nil nil)
  command-execute(execute-extended-command)

My flyspell-large-region is at the default value 1000. However, when I set it to 10 in your recipe, I still get the an error: Emacs hangs, stuck in an endless loop that I can interrupt with C-g.

I think what's happening is this:

My guess is that Flyspell doesn't want its dictionary changed while it's running. To confirm I ran flyspell-buffer but with an empty flyspell-incorrect-hook

(let ((flyspell-incorrect-hook nil)) (flyspell-buffer))

and this doesn't produce an error. Experiment 2: The same with flyspell-large-region set to 10: works!

So now that we found the problem, the question is how to address it. A quick and easy fix would be to advice flyspell-buffer such that it temporarily clears guess-language-function from flyspell-incorrect-hook. That solves the immediate problem. However, with this fix flyspell-buffer checks the whole buffer with just one dictionary, even if multiple languages are present. Therefore a better solution would be to modify Flyspell such that it detects changed dictionaries and then starts a new ispell/aspell/hunspell process from that point in the buffer with the new language. What do you think?

francesquini commented 5 years ago

A quick and easy fix would be to advice flyspell-buffer such that it temporarily clears guess-language-function from flyspell-incorrect-hook. That solves the immediate problem. However, with this fix flyspell-buffer checks the whole buffer with just one dictionary, even if multiple languages are present.

I guess that would defeat the purpose of guess-language since it would be equivalent to disabling guess-language for large enough buffers which are, in my case, most of the time.

Therefore an even better solution would be to modify Flyspell such that it detects changed dictionaries and then starts a new aspell process from that point in the buffer with the new language. What do you think?

This would be the best approach IMHO. However I'm not sure how big (and complex) the modifications would need to be to make it work.

Maybe a simpler approach (if the one you propose is too complicated to implement) would be:

The disadvantage in this case would be that misspellings in a language that happen to be correct in any of the other detected languages would not be detected. This shouldn't be a very common problem though.

tmalsburg commented 5 years ago

I had a closer look at what Flyspell is doing and I think I understand a bit better what's happening. The problem is not per se that the dictionary is changed while flyspell is running. Instead it is probably that detecting a new language triggers re-checking of the current paragraph and then we have two instances of Flyspell running concurrently. Not 100% sure but I think something like this must be happening: The first Flyspell instance creates a buffer *flyspell-region*, the second recreates it and then closes it when finished. Then the first instance tries to use that buffer but it's gone. The string in (wrong-type-argument stringp nil) is supposed to be the name of the buffer but it's got deleted. Again, not 100% sure but I think it's something close to that.

tmalsburg commented 5 years ago

Hm, running flyspell-buffer with multiple dictionaries is not my preferred solution. One reason is false negatives (as you mention). Depending on the pair of languages that is used, the potential of missing typos could be too high to be acceptable. I also wouldn't like it if spellchecking behaved differently during typing and during flyspell-buffer. Perhaps it would be good to ask Flyspell developers for advice. There may be a simple solution that we're not seeing because we don't understand Flyspell's internals enough.

tmalsburg commented 5 years ago

Posted on emacs-devel. Let's see what happens.

tmalsburg commented 5 years ago

Bottom line of the discussion on emacs-devel: There doesn't seem to be a simple fix that allows us to use flyspell-buffer with a multilingual document. The spell checker processes the whole document in one go and flyspell-incorrect-hook is called only when the show is over. This means that we need our own reimplementation of flyspell-buffer which checks the document paragraph by paragraph. It's a bummer because so far guess-language was completely transparent to the user but I'm not sure it still can be after this addition.

Regarding the bug that you found: An easy solution would be to defadvice flyspell-buffer with a wrapper that temporarily deactivates guess-language.

Another outcome of the discussion on emacs-devel: Richard Stallman showed interest in this package and he suggests that we add hooks in ispell.el that make integration easier. I think he means a hook that allows us to guess the buffer language before ispell runs. It would make sense to have the same in flyspell. Taking the liberty to page @cpitclaudel since you made a related comment. Do you think such hooks would the best approach? Feel free to ignore if you don't have time to comment.

tmalsburg commented 5 years ago

Just pushed a preliminary fix: 1107b93621e758bee07b2b24139e9e4557dc1d6e