mnater / Hyphenopoly

Hyphenation for node and Polyfill for client-side hyphenation.
http://mnater.github.io/Hyphenopoly/
MIT License
689 stars 45 forks source link

Succinct data structure #167

Closed mnater closed 3 years ago

mnater commented 3 years ago

By chance I stumbled across a very interesting technology: "succinct tries".

The basic functionality is pretty well explained at http://stevehanov.ca/blog/?id=120. For Hyphenopoly, however, the data structure still needs to be extended by the possibility to store the separating values with each of the nodes.

All in all, such a data structure would offer some advantages:

However, I'm not sure yet if the performance remains about the same when actually hyphenating a word. Here I fear some degradation - this largely depends on how performant the rank and select functions can be implemented in webassembly.

Let's see...

mnater commented 3 years ago

https://github.com/mnater/Hyphenopoly/tree/i167

mnater commented 3 years ago

Interim report (what I have found out so far):

It is possible to implement a succinct trie in Assemblyscript and it works.

Large pattern files get about 20% smaller, while small pattern files get slightly larger. This is probably due to the larger amount of code. If the .wasm files are compressed, however, the size difference is not worth mentioning.

In the current implementation, the wasm instances require much less memory. This could be an advantage on devices with less memory.

Hyphenation using succinct trie currently still takes about twice as long as with the conventional implementation. The bottleneck is the select function. It must be called several hundred times for a single word, which takes time, even though it is very fast on its own. This could be countered with a cache, but this in turn requires more memory and thus nullifies the above-mentioned advantage.

Or I could find a way to reduce the number of select calls...

mnater commented 3 years ago

succinct data structure ready to merge. Closing this in favor of #171