Closed trans closed 6 years ago
SymSpell with maximum edit distance=2 and input="carpo" will return both "carpool" and "carport". But it will also return "carbon" (edit distance=2). And it would not return "carpooling" (edit distance=5).
For an autocomplete of a given prefix you should use a trie, also called digital tree, radix tree or prefix tree.
Hmm... ok, then I suppose I could write a function that takes "carpool", "carport" and "carbon" and extracts the sixth letter of each, if the prefix matches. That would basically give me what I want even if "carpooling" is not included. It wouldn't be perfect but probably good enough.
Yes, for straight up autocomplete, a binary search on an alphabetically sorted array is fastest (though not as memory efficient).
You should then skip early all candidates which don't share a common prefix with the input term - before they are added to the suggestion list. After line 436 you should add: if (!suggestion.StartsWith(input)) continue; That prevents unnecessary Levenshtein calculations (which are time consuming) for suggestions you would throw away later anyway.
A lookup in a Trie is faster than binary search: Trie lookup: O(L) , L is the average length of a word Binary search: O(log N), N is the number of words in the dictionary
Thank you. Nice tip!!!
Yes the lookup of the prefix in a trie is faster but then you have to traverse all the branches to collect the completions. With two binary searches on an ordered array it's already done -- all the completions are between the two indexes.
Does SymSpell have a way to get the next valid set of letters given a prefix? E.g. if I give it "carpo" it would return "o" and "r" for "carpool" and "carport" (maybe others).