marcoagpinto / aoo-mozilla-en-dict

English Dictionaries Project (AOO+Mozilla+others)

πŸ› Fix apostrophe handling #74

Closed: Jamim closed this 5 months ago

Jamim commented 6 months ago

Hello @marcoagpinto,

Thank you for maintaining this set of dictionaries! 🙇🏼

Because the ’ character is missing from WORDCHARS in the affix files, Hunspell can't properly handle words with apostrophes:

$ cat test.txt
I've been surprised that it doesn't work as expected!
$ hunspell -l -d en_AU test.txt
ve
doesn
$ hunspell -l -d en_CA test.txt
ve
doesn
$ hunspell -l -d en_GB test.txt
$ hunspell -l -d en_US test.txt
ve
doesn
$ hunspell -l -d en_ZA test.txt
ve
doesn

As you can see, it works only for en_GB, since its affix file already includes a kind of apostrophe in WORDCHARS.
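To illustrate why a missing WORDCHARS entry splits words, here is a simplified model in Python. It is an assumption-laden sketch, not Hunspell's real tokenizer, which may apply additional rules (for example for ASCII apostrophes or ICONV input conversion). The model treats a character as part of a word only if it is alphabetic or listed in WORDCHARS. The function name `split_words` is hypothetical, and the sample uses the typographic apostrophe ’ (U+2019).

```python
# Simplified model of word splitting: NOT Hunspell's actual tokenizer.
# Assumption: a character belongs to a word only if it is alphabetic
# or appears in the dictionary's WORDCHARS string.


def split_words(text: str, wordchars: str) -> list[str]:
    """Split text into words using the simplified rule above (hypothetical helper)."""
    words: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalpha() or ch in wordchars:
            current.append(ch)
        elif current:
            # Any other character (space, punctuation, ’ when not listed) ends the word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# Sample sentence written with the typographic apostrophe U+2019.
sample = "I\u2019ve been surprised that it doesn\u2019t work as expected!"

# WORDCHARS without U+2019: the apostrophe acts as a separator.
print(split_words(sample, "0123456789"))
# → ['I', 've', 'been', 'surprised', 'that', 'it', 'doesn', 't', 'work', 'as', 'expected']

# WORDCHARS including U+2019: contractions stay whole.
print(split_words(sample, "0123456789\u2019"))
# → ['I’ve', 'been', 'surprised', 'that', 'it', 'doesn’t', 'work', 'as', 'expected']
```

With ’ missing, the fragments `ve` and `doesn` come out as separate tokens, mirroring the `hunspell -l` output above; with ’ added, `I’ve` and `doesn’t` stay whole.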

Best regards!

marcoagpinto commented 6 months ago

Heya,

I only maintain en-GB, and I make small improvements to en-ZA since the South African maintainer no longer maintains it.

I will fix it for en-ZA soon.

The other languages are maintained by Kevin Atkinson.

Please open a ticket in Kevin's GitHub: https://github.com/en-wl/wordlist

If Kevin doesn't fix it by May, I will change his files in my GitHub repository myself, since May and November are the releases for the next major version of LibreOffice.

Thanks!

Jamim commented 5 months ago

Hello @marcoagpinto,

I've found that en-wl/wordlist has been abandoned for several years now, so the chances of any changes being merged there are low. I've also found a related issue that is 8 years old:

From that issue I've learned that it would be better to add ’ rather than ' to WORDCHARS.

Since ’ has already been in WORDCHARS for en_GB for a while, I believe adding it to the other affix files is safe enough. I don't think it's worth waiting any longer, so I hope this PR can be merged eventually.

Thanks!

marcoagpinto commented 5 months ago

Heya,

That is what I tell everyone who complains about en-US: “Write on Kevin's GitHub and good luck”.

I will add it manually tomorrow; I don't like to merge pull requests.

On 1-FEB, it will go live.

Thanks!

marcoagpinto commented 5 months ago

The workload has been increasing as the dictionaries' maintainers vanish from the globe; I'm even the one fixing en-ZA.

marcoagpinto commented 5 months ago

It has been fixed and released:

MAGP 2024-02-01

Updated the Dictionaries:
- British (Marco A.G.Pinto)
  * 181 new words
- US + CA + AU
  * Fix: apostrophe handling, by adding: WORDCHARS 0123456789’ to the .aff.
- ZA
  * Fix: Removed the: ICONV ’ ' because it was already at the end of the .aff;
    Fix: apostrophe handling, by adding: WORDCHARS 0123456789’ to the .aff;
    Improved flag J adding 424 words.
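The changelog's apostrophe fix amounts to making sure each affected .aff file has a WORDCHARS line that contains ’. Below is a minimal sketch of that edit in Python, assuming the affix file has already been read into a string. The helper `ensure_wordchars` is hypothetical. It appends missing characters to an existing WORDCHARS line, or adds `WORDCHARS 0123456789’` (the line quoted in the changelog) when there is none. It does not touch ICONV or any other directive.

```python
APOSTROPHE = "\u2019"  # typographic apostrophe (’)


def ensure_wordchars(aff_text: str, extra: str = APOSTROPHE) -> str:
    """Return aff_text with every character of `extra` present in WORDCHARS.

    Hypothetical helper, not part of Hunspell or this repository.
    Output lines are joined with "\n" and always end with a newline.
    """
    lines = aff_text.splitlines()
    for i, line in enumerate(lines):
        fields = line.split()
        if fields and fields[0] == "WORDCHARS":
            chars = fields[1] if len(fields) > 1 else ""
            # Only append characters that are not listed yet.
            missing = "".join(c for c in extra if c not in chars)
            if missing:
                lines[i] = f"WORDCHARS {chars}{missing}"
            return "\n".join(lines) + "\n"
    # No WORDCHARS directive found: add the line used in the changelog.
    lines.append(f"WORDCHARS 0123456789{extra}")
    return "\n".join(lines) + "\n"


# Illustrative, made-up affix snippets (not real .aff contents).
print(ensure_wordchars("SET UTF-8\n"), end="")
print(ensure_wordchars("SET UTF-8\nWORDCHARS 0123456789\n"), end="")
```

To apply it to a real file, one could read it with `Path.read_text(encoding="utf-8")` and write the result back, assuming the .aff declares `SET UTF-8`. It is worth diffing the output before committing, since the sketch normalizes line endings to `\n`.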