Dictionary names in `guess-language-langcodes`

joostkremers commented 1 month ago

I noticed that the dictionary names in guess-language-langcodes sometimes use English names, sometimes native names, and sometimes just a two-letter language code. Cf.:

    (cs     . ("czech"      "Czech"   "🇨🇿"   "Czech"))
    (da     . ("dansk"      nil       "🇩🇰"   "Danish"))
    (de     . ("de"         "German"  "🇩🇪"   "German"))

I ran into this when I noticed that Emacs couldn't spell-check Dutch. It turns out that aspell, which I use as the back-end, doesn't recognise `nederlands`; it only recognises `dutch`.

I changed the entry in guess-language-langcodes (it's a user option, after all) but it made me wonder if it might make sense to add multiple entries for some languages? If aspell knows the language as dutch, but ispell or hunspell uses nederlands, wouldn't it be better to have entries for both?
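A minimal sketch of such a customisation, assuming the entry layout shown in the examples above (dictionary name first, then the remaining fields unchanged) and that the package provides the feature `guess-language`:

```elisp
;; Sketch: point the Dutch entry at the name aspell accepts.
;; The field layout is copied from the entries quoted in this thread;
;; only the dictionary name (first element) is changed.
(with-eval-after-load 'guess-language
  (setf (alist-get 'nl guess-language-langcodes)
        '("dutch" nil "🇳🇱" "Dutch")))
```

The same change can be made through `M-x customize-variable`, since `guess-language-langcodes` is a user option.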

Or is this something that should be handled by Emacs' ispell module?

tmalsburg commented 1 month ago

I think the situation is even more complicated. For instance, there are many different dictionaries for German (Swiss, Austrian, etc.). There is no way of knowing which one the user wants.

But multiple entries are possible, and we already use this for Serbian, which can be written in Latin or Cyrillic script. But you were perhaps thinking of a different solution, something like:

    (nl     . (("nederlands" "dutch") nil         "🇳🇱"   "Dutch"))

That might be ideal, but I'm not sure I will find the time to implement it. These days I have basically no time at all for Emacs development. Sad but true.
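To illustrate what supporting such an entry could involve, here is a rough sketch of a lookup that accepts either a single name or a list of names and picks the first one the spell checker knows. The helper `my/first-available-dictionary` and its `available` argument are hypothetical and not part of guess-language:

```elisp
(require 'seq)

(defun my/first-available-dictionary (names available)
  "Return the first element of NAMES that is a member of AVAILABLE.
NAMES is a dictionary name or a list of names; AVAILABLE is a list
of names the spell checker knows.  Hypothetical helper, not part of
guess-language."
  (seq-find (lambda (name) (member name available))
            ;; Accept both the current string form and the list form.
            (if (listp names) names (list names))))

(my/first-available-dictionary '("nederlands" "dutch") '("english" "dutch"))
;; => "dutch"
```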

> Or is this something that should be handled by Emacs' ispell module?

How could ispell handle this?

joostkremers commented 1 month ago

> I think the situation is even more complicated. For instance, there are many different dictionaries for German (Swiss, Austrian, etc.). There is no way of knowing which one the user wants.

Right, I hadn't even considered that.

> But multiple entries are possible, and we already use this for Serbian, which can be written in Latin or Cyrillic script. But you were perhaps thinking of a different solution, something like:
>
>     (nl     . (("nederlands" "dutch") nil         "🇳🇱"   "Dutch"))

That hadn't even occurred to me. I was thinking simply of this:

    (nl  .  ("nederlands"  nil   "🇳🇱"   "Dutch"))
    (nl  .  ("dutch"       nil   "🇳🇱"   "Dutch"))

But obviously that's impossible, because it's an alist, so only the first element counts... Silly me.
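A quick way to see this, assuming the lookup goes through `assq` or `alist-get` (which is how alists are normally read; I have not checked the exact call guess-language uses):

```elisp
;; Two entries with the same key: a standard alist lookup only ever
;; returns the first one, so the "dutch" entry is never reached.
(let ((langcodes '((nl . ("nederlands" nil "🇳🇱" "Dutch"))
                   (nl . ("dutch"      nil "🇳🇱" "Dutch")))))
  (car (alist-get 'nl langcodes)))
;; => "nederlands"
```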

> That might be ideal, but I'm not sure I will find the time to implement it.

No worries. It's perfectly usable as it is. It was easy to customise.

> > Or is this something that should be handled by Emacs' ispell module?
>
> How could ispell handle this?

Well, it would be possible for ispell-the-Emacs-module to keep a list of language names for the three spell checkers it supports by default (aspell, ispell and hunspell) and a list of equivalences (e.g., if nederlands fails, try dutch). I was mainly wondering if perhaps my configuration of ispell-the-Emacs-module was faulty.
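Roughly what I have in mind, as a sketch only: the table `my/dictionary-equivalences` and the function `my/resolve-dictionary` are made-up names, and nothing like this exists in ispell.el as far as I know. It relies on `ispell-valid-dictionary-list` to report which names the current back-end accepts.

```elisp
(require 'ispell)
(require 'seq)

(defvar my/dictionary-equivalences
  '(("nederlands" . ("dutch"))
    ("dutch"      . ("nederlands")))
  "Hypothetical table mapping a dictionary name to alternative names.")

(defun my/resolve-dictionary (name)
  "Return NAME, or an equivalent name, that the spell checker knows.
Return nil if neither NAME nor any listed equivalent is available."
  (let ((valid (ispell-valid-dictionary-list)))
    ;; Try NAME itself first, then its equivalents in order.
    (seq-find (lambda (candidate) (member candidate valid))
              (cons name (cdr (assoc name my/dictionary-equivalences))))))
```

With aspell as the back-end, `(my/resolve-dictionary "nederlands")` would then be expected to fall back to `"dutch"`, provided that dictionary is installed.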

I'm gonna close this, I don't think there's any serious issue here. Thanks for the quick reply.

tmalsburg commented 1 month ago

As you say, the key in the alist needs to be unique. But you could in principle use nl_ispell and nl_aspell.

tmalsburg / guess-language.el

Dictionary names in `guess-language-langcodes` #41