mnater / Hyphenopoly

Hyphenation for node and Polyfill for client-side hyphenation.
http://mnater.github.io/Hyphenopoly/
MIT License

Add language? #202

Closed clauseggers closed 1 year ago

clauseggers commented 1 year ago

How can I add a language to your system? Case in point: Faroese. I have a Hunspell hyphenation file, but I don’t know whether there is any path from that to your .wasm files.

mnater commented 1 year ago

I'm using this script (https://github.com/mnater/Hyphenopoly/blob/master/tools/createWasmForLang.sh).

It takes four files as input:

See the following files for German as an example (de.zip).

The script converts this input to a binary representation and adds it as a data element to the .wasm code that's compiled with AssemblyScript. You may need to adapt the paths.

Be aware: There's an issue with AssemblyScript (or rather Binaryen) that prevents adding pattern data larger than 64 KB (https://github.com/WebAssembly/binaryen/issues/5595). Until this is sorted out you'll need to use AssemblyScript <=0.26.
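To get a rough idea of whether a pattern file might run into that limit, you can measure its size in UTF-8 bytes. This is only a sketch under stated assumptions: the binary representation produced by the script will not have the same size as the text file, "64 KB" is read here as 64 KiB, and `LIMIT_BYTES`, `utf8_size` and `exceeds_limit` are names made up for this illustration, not part of Hyphenopoly.

```python
# Rough proxy only: the size of the pattern *text*, not of the binary
# data that actually ends up in the .wasm file.

# Assumption: "64 KB" means 64 KiB (65536 bytes).
LIMIT_BYTES = 64 * 1024


def utf8_size(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def exceeds_limit(text: str, limit: int = LIMIT_BYTES) -> bool:
    """Return True if the UTF-8 encoded text is strictly larger than the limit."""
    return utf8_size(text) > limit
```

Reading a pattern file with `open(path, encoding="utf-8").read()` and passing the result to `exceeds_limit` gives a first hint. A file close to the limit may still behave differently once converted.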

If the license allows it, you may also just send the file and I'll do my best to convert and publish it.

clauseggers commented 1 year ago

Thank you, Mathias. I have prepared the files following your template. The only deviation is that the hyph-fo.pat.txt file contains a UTF-8 declaration in the first line. If you can make this compile, that would be awesome. fo.zip
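In case that first line gets in the way, separating it from the patterns could look roughly like the sketch below. It assumes the declaration sits alone on the first line, as Hunspell hyphenation dictionaries usually have it, and it only recognises a couple of encoding names. `split_encoding_line` is a hypothetical helper, not something from the Hyphenopoly tools.

```python
import re
from typing import Optional, Tuple

# Assumption: the declaration is a bare encoding name such as "UTF-8" or
# "ISO8859-1". Other encoding names would need to be added here.
_ENCODING_LINE = re.compile(r"(UTF-8|ISO8859-\d+)", re.IGNORECASE)


def split_encoding_line(text: str) -> Tuple[Optional[str], str]:
    """Split a leading encoding declaration from the remaining pattern lines.

    Returns (encoding, patterns). If the first line is not a recognised
    encoding name, returns (None, text) with the text unchanged.
    """
    lines = text.splitlines()
    if lines:
        # Drop a possible byte-order mark and surrounding whitespace.
        first = lines[0].lstrip("\ufeff").strip()
        if _ENCODING_LINE.fullmatch(first):
            # Note: a trailing newline of the input is not preserved.
            return first, "\n".join(lines[1:])
    return None, text
```

For example, `split_encoding_line("UTF-8\n.a1b\nc2d")` returns `("UTF-8", ".a1b\nc2d")`, while a file that starts directly with a pattern is returned untouched.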

mnater commented 1 year ago

Thank you.

Where did you get this from? I haven't found it in the Hunspell git repo, but I'd like to check whether other languages are available there...

clauseggers commented 1 year ago

I collected a number of Hunspell hyphenation files for less well-supported languages, and Faroese was one of them. This is what I wrote down about where I got it:

Language: Faroese (Faroe Islands) (fo FO)
Origin:   Generated from a collection of hyphenated words provided by the newspaper Dimmalætting.
          http://fo.speling.org/filer/hyph_fo_FO-20040420a.zip (Site no longer online, see below instead)
          https://fedora.pkgs.org/37/fedora-aarch64/hyphen-fo-0.20040420-22.fc37.noarch.rpm.html
License:  GNU General Public License, version 2
Author:   Jacob Sparre Andersen <jacob@flug.fo>

Faroese dictionary for spell checking.

I’m pretty sure it was just a Google search that pointed me to them, and that I downloaded the files either from the Fedora repo or from Archive.org.