wolfgarbe / SymSpell

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
https://seekstorm.com/blog/1000x-spelling-correction/
MIT License
3.13k stars 286 forks source link

Possible optimization? #140

Open gdetari opened 16 hours ago

gdetari commented 16 hours ago

In the following function:

private HashSet<string> Edits(string word, int editDistance, HashSet<string> deleteWords)
    {
        editDistance++;
        if (word.Length > 1)
        {
            for (int i = 0; i < word.Length; i++)
            {
                string delete = word.Remove(i, 1);
                if (deleteWords.Add(delete))
                {
                    //recursion, if maximum edit distance not yet reached
                    if (editDistance < maxDictionaryEditDistance) Edits(delete, editDistance, deleteWords);
                }
            }
        }
        return deleteWords;
    }

Would it make sense to only add to calculate and add the words to deleteWords if editDistance < maxDictionaryEditDistance is true? That is quite a difference in loading dictionary performance. It does change the functionality, but isn't that the correct approach not to add the words anyhow?

wolfgarbe commented 11 hours ago

I think we already do exactly this. Edits is a recursive method. Only in the first iteration we unconditionally calculate and add deletes. That is because we always have maxDictionaryEditDistance>=1. But the second and following iterations of Edits are done only if editDistance < maxDictionaryEditDistance.