marcoagpinto / aoo-mozilla-en-dict

English Dictionaries Project (AOO+Mozilla+others)

GB dic file has common words missing when using Vale #73

Closed alecthegeek closed 8 months ago

alecthegeek commented 8 months ago

I have downloaded the en_GB dic and aff files from https://github.com/LibreOffice/dictionaries.

I then use Vale to run a spellcheck and some common words are flagged as incorrect (and don't appear to be in the dic file).

 25:23   error  'recording' is not British English!  ProjectStyle.spelling
 54:7    error  'into' is not British English!       ProjectStyle.spelling
 65:58   error  'releases' is not British English!   ProjectStyle.spelling
 143:62  error  'detail' is not British English!     ProjectStyle.spelling

These words do exist in the wordlist.txt file, so I'm not sure why they are not in the dic file.

Do I need to do something extra other than copying the two files into the correct location?

For comparison, when I install the GB dictionary from https://github.com/wooorm/dictionaries/tree/main/dictionaries/en-GB these words are not flagged (but those files are four years old).

Thanks for the great work.

marcoagpinto commented 8 months ago

Heya,

The reason you can't find them in the .dic is that they use prefixes.
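To illustrate why a derived form such as 'recording' never appears literally in a .dic file, here is a minimal sketch of how a Hunspell-style checker combines a stem with the affix rules named by its flags. The `TOY_AFF` and `TOY_DIC` contents, the flag letters `G` and `S`, and the helper names are all made up for this example and are not taken from the real en_GB files. The sketch handles only single-character flags and suffix (SFX) rules. Prefix (PFX) rules are analogous but are applied at the start of the word.

```python
import re

# Toy affix rules (NOT the real en_GB .aff). Rule format: SFX flag strip add condition
TOY_AFF = """\
SFX G Y 2
SFX G e ing e
SFX G 0 ing [^e]
SFX S Y 1
SFX S 0 s [^sxzhy]
"""

# Toy word list (NOT the real en_GB .dic). The first line is the entry count.
TOY_DIC = """\
3
record/GS
release/GS
detail/G
"""


def parse_suffix_rules(aff_text):
    """Return {flag: [(strip, add, condition_regex), ...]} for SFX rule lines."""
    rules = {}
    for line in aff_text.splitlines():
        parts = line.split()
        # Rule lines have five fields; four-field header lines are skipped.
        if len(parts) < 5 or parts[0] != "SFX":
            continue
        _, flag, strip, add, cond = parts[:5]
        strip = "" if strip == "0" else strip  # "0" means "strip nothing"
        add = "" if add == "0" else add
        rules.setdefault(flag, []).append((strip, add, re.compile(cond + "$")))
    return rules


def expand(dic_text, rules):
    """Return every surface form the toy dictionary accepts."""
    words = set()
    for line in dic_text.splitlines()[1:]:  # skip the entry count
        line = line.strip()
        if not line:
            continue
        stem, _, flags = line.partition("/")
        words.add(stem)
        for flag in flags:  # assumes one character per flag
            for strip, add, cond in rules.get(flag, []):
                if cond.search(stem) and stem.endswith(strip):
                    words.add(stem[: len(stem) - len(strip)] + add)
    return words


words = expand(TOY_DIC, parse_suffix_rules(TOY_AFF))
print(sorted(words))
```

In this toy data 'recording' and 'releases' are accepted even though a plain text search of the word list finds neither. Only the stems `record` and `release` are stored, along with their flags.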

That is the reason why in the future I want to add a feature to Proofing Tool GUI that will extract all prefixes and only keep suffixes.

[Image: Screenshot 2024-01-16 054243]

alecthegeek commented 8 months ago

Thanks for the answer @marcoagpinto

Does this mean I cannot use these dictionaries in a tool like Vale? I'm afraid I don't understand the technical aspects of how these work.

marcoagpinto commented 8 months ago

The .dic + .aff only work in software that supports Hunspell.

alecthegeek commented 8 months ago

Vale does use Hunspell. For example, the en_AU files work fine.

Is there some processing I can do on the en_GB files to fix the prefixes issue?

Thanks

marcoagpinto commented 8 months ago

Heya again,

There are no prefix issues, they work just fine.

The reason I want to extract all prefixes and convert the resulting words to suffixes is because prefixed words are difficult to find while looking at the .dic and also create tons of duplicates.
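One possible reading of that conversion, sketched under loud assumptions: each prefixed form is written out as its own entry, and the prefix flag is dropped so that only suffix flags remain. The flag `A`, the `TOY_PREFIXES` table, and the `flatten_prefixes` helper are hypothetical. This is not how Proofing Tool GUI works, and real Hunspell prefix rules also carry strip strings, conditions, and cross-product settings that this sketch ignores.

```python
# Hypothetical prefix table: flag -> text prepended to the stem.
TOY_PREFIXES = {"A": "re"}


def flatten_prefixes(entries, prefixes):
    """Write prefixed words out as explicit entries that keep only suffix flags."""
    out = []
    for entry in entries:
        stem, _, flags = entry.partition("/")
        # Flags that are not prefix flags are kept on every resulting entry.
        kept = "".join(f for f in flags if f not in prefixes)
        variants = [stem] + [prefixes[f] + stem for f in flags if f in prefixes]
        for word in variants:
            out.append(word + "/" + kept if kept else word)
    return out


flat = flatten_prefixes(["do/AG", "cat/S"], TOY_PREFIXES)
print(flat)
```

After flattening, a word such as `redo` can be found by searching the list directly, at the cost of one extra entry per prefixed stem.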

But it may take several months before Proofing Tool GUI can do that task, as I need to rewrite its core engine. I am not sure if it will be ready before 2025 (too much going on right now).

alecthegeek commented 8 months ago

Thanks for the follow up.

I think I will go with the Hunspell files from http://wordlist.aspell.net/dicts/ for the time being. They are not as up to date as yours, though.

Closing this ticket.

alecthegeek commented 8 months ago

In case this helps anyone else, upgrading to Vale 3.0.5 fixed the problem.

Sorry for the noise. I will edit the topic to show this is related to Vale.