universal-automata / liblevenshtein

Various utilities regarding Levenshtein transducers.
https://github.com/universal-automata/liblevenshtein
MIT License
67 stars 13 forks source link

Trie compression via double-array tries #29

Open dylon opened 10 years ago

dylon commented 10 years ago

Here's one implementation: http://linux.thai.net/~thep/datrie/

For liblevenshtein, implement the algorithm described in, "Compression of double array structures for fixed length keywords", by Masao Fuketa, Hiroya Kitagawa, Takuki Ogawa, Kazuhiro Morita, and Jun-ichi Aoe. Have several implementations: one which uses a dense array as described by the paper and another that uses jagged arrays as a sparse matrix. Extend the algorithms in the paper to return the outgoing labels from the current state.

Some Papers:

  1. "Space-efficient Static Trees and Graphs", by Guy Jacobson (maybe: it has a high compression rate but slow query time)
  2. "An Efficient Implementation of Trie Structures", by Jun-Ichi Aoe, Katsushi Morimoto, and Takashi Sato
  3. "An efficient deletion method for a minimal prefix double array", by Susumu Yata, Masaki Oono, Kazuhiro Morita, Masao Fuketa, and Jun-ichi Aoe
  4. "A compact static double-array keeping character codes", by Susumu Yata, Masaki Oono, Kazuhiro Morita, Masao Fuketa, Toru Sumitomo, and Jun-ichi Aoe
  5. "Compression of double array structures for fixed length keywords", by Masao Fuketa, Hiroya Kitagawa, Takuki Ogawa, Kazuhiro Morita, and Jun-ichi Aoe
  6. "New methods for compression of MP double array by compact management of suffixes", by Tshering C. Dorji, El-sayed Atlam, Susumu Yata, Mahmoud Rokaya, Masao Fuketa, Kazuhiro Morita, and Jun-ichi Aoe
  7. "B-tries for disk-based string management", by Nikolas Askitis and Justin Zobel (disk-based storage of larger-than-memory tries)
lygstate commented 8 years ago

Ping for A new compression method for double-array structures by a hierarchical representation http://www.inderscienceonline.com/doi/abs/10.1504/IJISTA.2015.074331

dylon commented 8 years ago

Nice! I'll check it out.