hfst / hfst-ospell

HFST spell checker library and command line tool
Apache License 2.0

Using on Windows: `IndexTableReadingException` #56

Open Mukhammadsaid19 opened 2 years ago

Mukhammadsaid19 commented 2 years ago

When I build hfst-ospell on Windows and use it with FST models, it throws `IndexTableReadingException` in the function `void IndexTable::read(FILE *f, TransitionTableIndex number_of_table_entries)`:

[screenshot]

The same bug occurs with the pre-compiled executable listed on the Apertium website:

[screenshot]

The lexicon and error model files work well on Ubuntu Linux. What might be the problem?

snomos commented 2 years ago

Not an answer to your question, but possibly a working alternative: https://github.com/divvun/divvunspell. It is a Rust implementation of hfst-ospell, and considerably faster.

TinoDidriksen commented 2 years ago

The native Windows binaries are not a priority because WSL works so perfectly, so nobody tests on native Windows.

Mukhammadsaid19 commented 2 years ago

I would like to use it from C# (for a VSTO Microsoft Word add-in), and I planned to bind it through a DLL file. Initially I wanted to use voikko, but it has many features that do not fit the Uzbek language, so I decided to start from scratch.

@snomos Interesting, I will check it out, thank you! @TinoDidriksen Anyone used hfst-ospell and its dependents for MS Office Add-Ins?

TinoDidriksen commented 2 years ago

> Anyone used hfst-ospell and its dependents for MS Office Add-Ins?

Yes, I do that. Divvun also does that. But it's not the correct way any longer. VSTO extensions are headed to the scrap heap because they can't run on macOS, iPad, or web editions. Instead we have moved to Office.js add-ins that work cross-platform:

What language are you trying to add a checker for?

Mukhammadsaid19 commented 2 years ago

Initially, I tried to make an Office.js add-in in Angular, but its API was a little restricted (I couldn't draw red underlines the way I could with Windows Forms), so I decided to stick with C#. I remember that there was a WebAssembly build of voikko. Hm... I will definitely check out the spellcheckers you suggested. Perhaps I am not on the right track.

> What language are you trying to add a checker for?

The language I want to add is Uzbek, an agglutinative language from the Turkic family with about 36 million speakers. It is similar to Turkish, but with simpler morphophonemics. I used foma and hfst to compile the morphological analyzer, which recognizes around 99% of Uzbek words. In fact, there are many Turkic languages that don't have reliable spellcheckers: Kazakh, Kyrgyz, Turkmen, Uyghur, etc.

P.S. I recently made a simple soft keyboard for Uzbek called Tahrirchi, using hfst-ospell plus a couple of algorithms to handle mobile input. However, it turned out to be a much more difficult task than I expected, given the low-memory requirements and the abundance of features offered by GBoard or Samsung keyboards (they also use FSTs, but in the context of HMMs). Have you worked on spellchecking in soft keyboards?

TinoDidriksen commented 2 years ago

We are painfully aware that Office.js is limited (and I've reported it upstream, twice), but it's still the only future-proof and cross-platform solution.

Divvun also makes keyboards. There's a whole pipeline for turning an FST into spellers, keyboards, and prediction, all for both desktop and mobile. @snomos can point you at docs. As for Uzbek, you may also be interested in https://github.com/apertium/apertium-uzb

Btw, we are on IRC on irc.oftc.net channels #hfst and #apertium

snomos commented 2 years ago

https://github.com/divvun/kbdgen2 takes a (relatively) simple YAML file (wrapped in a bundle with some metadata) as input, and produces keyboard packages for iOS, Android, Linux, Windows, macOS and ChromeOS. For iOS and Android, the keyboards can be bundled with HFST-based spellers. kbdgen2 is still a work in progress. Other repos relevant to keyboards and spelling checkers are:

Mukhammadsaid19 commented 2 years ago

@TinoDidriksen I have checked out the Office apps you suggested, and I think I would also go with that UI/UX. However, Internet coverage in Uzbekistan is poor, and not everyone has constant access to it. So I was thinking about creating a WebAssembly version of hfst-ospell, integrating it into the extension, and making it free and offline. Do you think that is OK?

@snomos Thank you! I will definitely check it out!

TinoDidriksen commented 2 years ago

The nightly Windows builds got updated last week, so check if latest binary still throws.

jzr-supove commented 1 year ago

Was facing the same issue. In my case, opening the file in "rb" mode instead of "r" solved the problem.
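
For anyone landing here later, a likely explanation of why "rb" helps: on Windows, `fopen` in text mode ("r") translates CR-LF pairs to LF and treats a Ctrl-Z byte (0x1A) as end of file, so a binary transducer file can come back altered or truncated, and a fixed-size table read then falls short. On Linux the two modes behave the same, which would match the files working on Ubuntu. The sketch below is not hfst-ospell code; `read_all_binary` and `write_all_binary` are hypothetical helpers that only illustrate opening files in binary mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (not part of hfst-ospell): write raw bytes to a file.
// "wb" keeps Windows from expanding LF (0x0A) into CR-LF on output.
bool write_all_binary(const char* path, const std::vector<unsigned char>& bytes) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "wb");
    if (f == nullptr) {
        return false;
    }
    std::size_t written = std::fwrite(bytes.data(), 1, bytes.size(), f);
    std::fclose(f);
    return written == bytes.size();
}

// Hypothetical helper (not part of hfst-ospell): read a whole file as raw bytes.
// "rb" disables CR-LF translation and Ctrl-Z end-of-file handling on Windows.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_all_binary(const char* path) {
    assert(path != nullptr);
    std::vector<unsigned char> data;
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) {
        return data;
    }
    unsigned char buf[4096];
    std::size_t n = 0;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        data.insert(data.end(), buf, buf + n);
    }
    std::fclose(f);
    return data;
}
```

If you pass your own `FILE*` to the library, checking that it was opened with "rb" is a cheap first thing to try before digging into the table-reading code.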