silnrsi / oxttools

Tools for creating language support oxt extensions for LibreOffice
MIT License

Normalisation is inappropriate #6

Open Richard57 opened 5 years ago

Richard57 commented 5 years ago

Normalising the text of the Hunspell dictionary and affix files is inappropriate.

  1. LibreOffice does not normalise text on input and, unlike Wikipedia for example, does not normalise it on saving. Since typing unnormalised text may be the most natural input method, it makes sense for a spell-checker to use ICONV in the affix file to bring words to a canonical form. Normalising the affix file would destroy this canonicalisation.
  2. Some morphological alternations may most simply be handled by using non-NFC forms in the lexicon.

This normalisation is performed by the function zipnfcfile() in the makeoxt script.
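To illustrate point 1, here is a minimal Python sketch of what NFC normalisation can do to an ICONV rule. The affix line is hypothetical, not taken from any real affix file, and the code calls Python's `unicodedata` directly rather than reproducing what zipnfcfile() does.

```python
import unicodedata

# Hypothetical affix line (an assumption for illustration): convert the
# precomposed e-acute (U+00E9) to "e" + combining acute accent (U+0065 U+0301).
iconv_line = "ICONV \u00e9 e\u0301"

# Apply NFC to the whole line, as a blanket normalisation of the file would.
normalised = unicodedata.normalize("NFC", iconv_line)

_, source, target = iconv_line.split(" ")
_, nfc_source, nfc_target = normalised.split(" ")

# Before normalisation the rule maps one form to a different form.
print(source != target)
# After NFC the combining sequence is composed, so both sides are identical
# and the rule no longer converts anything.
print(nfc_source == nfc_target)
```

If the lexicon relies on the decomposed form, a rule that has collapsed in this way no longer brings typed text to the form the lexicon expects.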

n8marti commented 1 year ago

I concur with this assertion. The target language may use NFC or NFD characters, and makeoxt should be agnostic and hands-off about this. In my case the NFC normalization breaks LO's ability to correctly identify words that use NFD characters, because my AFF file does use ICONV, as @Richard57 suggests, and my DIC file uses NFD characters.
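A simplified sketch of this kind of failure, assuming the ICONV rules bring typed input to NFD and the word list is stored in NFD. The `canonicalise` helper and the sample entry are made up for illustration (the entry is an arbitrary string, not taken from the sg-CF files), and real Hunspell lookup is more involved than a set membership test.

```python
import unicodedata

def canonicalise(word):
    # Hypothetical stand-in for the AFF file's ICONV rules: decompose input to NFD.
    return unicodedata.normalize("NFD", word)

# Word list as authored: entries stored decomposed ("a" + combining circumflex).
dic_nfd = {"ta\u0302"}

# The same word list after a packaging step applies NFC to the DIC file.
dic_after_nfc = {unicodedata.normalize("NFC", w) for w in dic_nfd}

typed = "t\u00e2"  # the user types the precomposed form

found_before = canonicalise(typed) in dic_nfd        # matches the NFD entry
found_after = canonicalise(typed) in dic_after_nfc   # entry is now NFC, no match

print(found_before, found_after)
```

The lookup succeeds only when the packaged word list keeps the form that the input conversion produces, which is why a hands-off approach to the files' normalisation matters here.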

DavidLRowe commented 1 year ago

@n8marti I'm currently looking at this. I'd love to have some simple test data, say your AFF file and a DIC file with six words that include NFD characters. I can then make a test file using the words from the DIC file.

n8marti commented 1 year ago

sg-CF.aff.txt sg-CF.dic.txt

I had to rename the files b/c github doesn't like the non-txt extensions. I've made some other changes to these files since I last built my OXT extension, but I think they will still exhibit the problem if you build it with makeoxt.

DavidLRowe commented 1 year ago

Commit 461379c attempts to address this issue

In addition, some changes were made to the documentation. I hope it's okay to have included sg-CF in an example. Thanks, @n8marti, for the sample files.

I have not yet built the Windows executable, but this should work on Linux. Any feedback welcomed.

DavidLRowe commented 1 year ago

makeoxt.exe available in zip file at https://github.com/silnrsi/oxttools/releases/tag/v0.6

n8marti commented 1 year ago

Great. This (Linux version) works for me now, thanks.