wolfgarbe / SymSpell

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
https://seekstorm.com/blog/1000x-spelling-correction/
MIT License

Trie in Symspell #92

Closed: MighTguY closed this issue 4 years ago

MighTguY commented 4 years ago

Hi @wolfgarbe, greetings. Currently we use a Dictionary/Map for holding the data. Could we move it to a more optimized trie,

similar to your repo PruningRadixTrie (https://github.com/wolfgarbe/PruningRadixTrie)

wolfgarbe commented 4 years ago

Of course we could use a trie instead of a Dictionary/Map for holding the deletes. But I don't think there would be a benefit in either lookup speed or memory consumption. I will do a quick experiment to verify that.
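For context, the map in question holds the precomputed deletes of each dictionary term, and lookup is a plain key lookup. A minimal sketch in Java (class and method names are mine, not SymSpell's; real SymSpell also handles longer edit distances, frequencies, and candidate verification):

```java
import java.util.*;

public class DeleteIndex {
    // delete-string -> original dictionary terms that produce it
    private final Map<String, List<String>> deletes = new HashMap<>();

    // All strings obtained by removing exactly one character from the term.
    static Set<String> oneCharDeletes(String term) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i < term.length(); i++) {
            out.add(term.substring(0, i) + term.substring(i + 1));
        }
        return out;
    }

    void add(String term) {
        // The term itself is also a valid lookup key (exact match).
        deletes.computeIfAbsent(term, k -> new ArrayList<>()).add(term);
        for (String d : oneCharDeletes(term)) {
            deletes.computeIfAbsent(d, k -> new ArrayList<>()).add(term);
        }
    }

    // Spelling-correction candidates for an input: one hash lookup.
    List<String> lookup(String input) {
        return deletes.getOrDefault(input, Collections.emptyList());
    }

    public static void main(String[] args) {
        DeleteIndex idx = new DeleteIndex();
        idx.add("hello");
        System.out.println(idx.lookup("hllo")); // prints [hello]
    }
}
```

Because every query is a single exact-key lookup like this, the data structure only needs fast point access, which is what the trie-vs-dictionary comparison below measures.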

wolfgarbe commented 4 years ago

Test results: 500,000 terms from frequency_dictionary_en_500_000.txt resulting in 12,577,734 deletes:

Radix trie<string,long>: 1800 MB
Dictionary<string,long>: 1366 MB
Dictionary<uint hash, long>: 600 MB // instead of key=term, I use key=FNV-1a-Hash(term)

The dictionary is more space efficient than the trie. For auto-complete I use a trie because it allows prefix range search. But for SymSpell spelling correction I just need a simple key lookup.
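The hashed-key variant from the table can be sketched as follows (a Java illustration of the idea, not SymSpell's code; `FnvKeys` and `fnv1a32` are my names). Replacing string keys with their 32-bit FNV-1a hashes drops the per-entry string storage, at the cost of rare hash collisions, which a spelling corrector can tolerate because candidates are verified with an edit-distance check anyway:

```java
import java.util.*;

public class FnvKeys {
    // 32-bit FNV-1a over the UTF-8 bytes of a string.
    static int fnv1a32(String s) {
        final int FNV_PRIME = 0x01000193;
        int hash = 0x811C9DC5; // FNV-1a 32-bit offset basis
        for (byte b : s.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= FNV_PRIME;
        }
        return hash;
    }

    public static void main(String[] args) {
        // Store values under the 4-byte hash instead of the full string key.
        Map<Integer, Long> counts = new HashMap<>();
        counts.put(fnv1a32("hello"), 42L);
        System.out.println(counts.get(fnv1a32("hello"))); // prints 42
    }
}
```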

There are ways to optimize the space efficiency of tries, e.g. succinct tries, but those are slower to query.
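To illustrate the prefix range search that motivates a trie for auto-complete (a simplified sketch in Java, nowhere near PruningRadixTrie's pruned, frequency-ranked traversal):

```java
import java.util.*;

// Minimal character trie: supports the startsWith query that a plain
// hash map cannot answer without scanning every key.
public class PrefixTrie {
    private final Map<Character, PrefixTrie> children = new HashMap<>();
    private boolean isTerm;

    void insert(String term) {
        PrefixTrie node = this;
        for (char c : term.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new PrefixTrie());
        }
        node.isTerm = true;
    }

    // All stored terms that start with the given prefix.
    List<String> startsWith(String prefix) {
        PrefixTrie node = this;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return Collections.emptyList();
        }
        List<String> out = new ArrayList<>();
        node.collect(prefix, out);
        return out;
    }

    private void collect(String path, List<String> out) {
        if (isTerm) out.add(path);
        for (Map.Entry<Character, PrefixTrie> e : children.entrySet()) {
            e.getValue().collect(path + e.getKey(), out);
        }
    }

    public static void main(String[] args) {
        PrefixTrie t = new PrefixTrie();
        t.insert("spell");
        t.insert("spelling");
        t.insert("speed");
        List<String> hits = t.startsWith("spel");
        Collections.sort(hits);
        System.out.println(hits); // prints [spell, spelling]
    }
}
```

SymSpell's lookup never needs this kind of query, which is why the simpler and (per the numbers above) smaller dictionary wins there.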