Open Mukhammadsaid19 opened 2 years ago
Not an answer to your question, but possibly a working alternative: https://github.com/divvun/divvunspell. It is a Rust implementation of hfst-ospell, and considerably faster.
The native Windows binaries are not a priority because WSL works so perfectly, so nobody tests on native Windows.
I would like to use it inside C# (for a VSTO Microsoft Word Add-In) and I planned to bind it through a DLL file. Initially I wanted to use voikko, but it has many features that differ from the Uzbek language, so I decided to start from scratch.
@snomos Interesting, I will check it out, thank you! @TinoDidriksen Anyone used hfst-ospell and its dependents for MS Office Add-Ins?
Anyone used hfst-ospell and its dependents for MS Office Add-Ins?
Yes, I do that. Divvun also does that. But it's not the correct way any longer. VSTO extensions are headed to the scrap heap because they can't run on macOS, iPad, or web editions. Instead we have moved to Office.js add-ins that work cross-platform:
What language are you trying to add a checker for?
Initially, I tried to make an Office.js add-in in Angular, but its API was a little restricted (I couldn't draw red lines using Windows Forms), so I decided to stick with C#. I remember that there was a .js WebAssembly build of voikko. Hm... I will definitely check out these spellcheckers you suggested. Perhaps I am not on the right track.
What language are you trying to add a checker for?
The language I want to add is Uzbek, an agglutinative language from the Turkic family with 36 million speakers. It is similar to Turkish, but with simpler morphophonemics. I used foma and hfst to compile the morph analyzer; it recognizes around 99% of Uzbek words. In fact, there are many Turkic languages which don't have reliable spellcheckers: Kazakh, Kyrgyz, Turkmen, Uyghur etc.
P.S. Using hfst-ospell, I recently made a simple soft keyboard for Uzbek called Tahrirchi; I added a couple of algorithms to handle mobile input. However, it turned out to be a much more difficult task than I expected, given the low-memory requirements and the abundance of features offered by GBoard or Samsung keyboards (they also use FSTs, but in the context of HMMs). Have you happened to work on spellchecking in soft keyboards?
We are painfully aware that Office.js is limited (and I've reported it upstream, twice), but it's still the only future-proof and cross-platform solution.
Divvun also makes keyboards. There's a whole pipeline for turning an FST into spellers, keyboards, and prediction, all for both desktop and mobile. @snomos can point you at docs. As for Uzbek, you may also be interested in https://github.com/apertium/apertium-uzb
Btw, we are on IRC on irc.oftc.net, channels #hfst and #apertium.
https://github.com/divvun/kbdgen2 takes a (relatively) simple yaml file (wrapped in a bundle with some metadata) as input, and produces keyboard packages for iOS, Android, Linux, Windows, macOS and ChromeOS. For iOS and Android, the keyboards can be bundled with Hfst-based spellers. kbdgen2 is still a work in progress. Other repos relevant to keyboards and spelling checkers are:
@TinoDidriksen I have checked out the Office apps you suggested, and I think I would also go with that UI/UX. However, Internet coverage in Uzbekistan is poor, and not everyone has constant access to it. So, I was thinking about creating a WebAssembly version of hfst-ospell, integrating it into the extension, and making it free and offline. Do you think that is OK?
@snomos Thank you! I will definitely check it out!
The nightly Windows builds got updated last week, so check if the latest binary still throws.
I was facing the same issue. In my case, opening the file in "rb" mode instead of "r" solved the problem.
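For anyone hitting the same thing, here is a minimal sketch of why the mode string can matter. It is not taken from the hfst-ospell sources: `read_all` and `write_all` are hypothetical helper names, and the byte values are made up for illustration. On Windows, the C runtime's text mode ("r") translates CRLF to LF on read and treats the byte 0x1A (Ctrl+Z) as end of file, so a binary transducer file can come back shortened or altered. On Linux the "b" flag has no effect, which would be consistent with the same files loading fine there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file into memory using the given fopen
// mode, so that "r" and "rb" can be compared on the same file.
std::vector<unsigned char> read_all(const std::string& path, const char* mode) {
    assert(mode != nullptr);
    std::vector<unsigned char> data;
    std::FILE* f = std::fopen(path.c_str(), mode);
    if (f == nullptr) {
        return data;  // empty result if the file could not be opened
    }
    unsigned char buf[4096];
    std::size_t n = 0;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        data.insert(data.end(), buf, buf + n);
    }
    std::fclose(f);
    return data;
}

// Hypothetical helper: write raw bytes in binary mode ("wb") so that no
// newline translation happens on the way out either.
bool write_all(const std::string& path, const std::vector<unsigned char>& bytes) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (f == nullptr) {
        return false;
    }
    const std::size_t written = std::fwrite(bytes.data(), 1, bytes.size(), f);
    const bool closed = (std::fclose(f) == 0);
    return closed && written == bytes.size();
}
```

A quick check is to write a few bytes that include 0x0D 0x0A and 0x1A with `write_all`, then compare `read_all(path, "rb")` against `read_all(path, "r")`. On Linux both should return the original bytes; on native Windows the "r" result is expected to differ.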
When I try to build hfst-ospell on Windows and use it with FST models, it throws IndexTableReadingException in function void IndexTable::read(FILE *f, TransitionTableIndex number_of_table_entries):
The same bug persists when using the pre-compiled executable listed on the Apertium website:
The lexicon and error model files work well on Ubuntu Linux. What might be the problem?