wolfgarbe / SymSpell

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
https://seekstorm.com/blog/1000x-spelling-correction/
MIT License

[Question] How to adhoc setup an own dictionary #95

Closed: CobraCalle closed this issue 4 years ago

CobraCalle commented 4 years ago

Hello,

I'm looking for a way to match a speech recognition result against a list of entries. For example, I have a list of radio stations and the user would like to select one by voice. Often the recognised speech does not exactly match the radio station name (for example "RaumeMusik.fm lounge" vs. "raute music lounge"), so I would like to find the closest defined radio station.

How can I set up my own "dictionary" with the list of defined radio stations so that SymSpell finds the closest one?

Thank you very much.

wolfgarbe commented 4 years ago

//create object
int initialCapacity = 100;
int maxEditDistanceDictionary = 3; //maximum edit distance per dictionary precalculation
var symSpell = new SymSpell(initialCapacity, maxEditDistanceDictionary);

//load dictionary
symSpell.LoadDictionary("station_names.txt", 0, 1, new char[] { ',' });

//lookup suggestions for an input string
string inputTerm="raute music lounge";
int maxEditDistanceLookup = 3; //max edit distance per lookup (maxEditDistanceLookup<=maxEditDistanceDictionary)
var suggestionVerbosity = SymSpell.Verbosity.Closest; //Top, Closest, All
var suggestions = symSpell.Lookup(inputTerm, suggestionVerbosity, maxEditDistanceLookup);

//display suggestions, edit distance and term frequency
foreach (var suggestion in suggestions)
{ 
  Console.WriteLine(suggestion.term +" "+ suggestion.distance.ToString() +" "+ suggestion.count.ToString("N0"));
}

station_names.txt (all dictionary entries are in lower case and terms may contain white space; the word frequency is not important for your use case, but it can be used to rank suggestions whose edit distance is equal):

raumemusik.fm lounge,10
energy berlin,10
star sat radio,10
antenne bayern,10

CobraCalle commented 4 years ago

Thank you very much for your help...

I tried to adapt your sample, because I want to create the dictionary ad hoc and would like to avoid creating and reading a file:

        var symSpell = new SymSpell(100, 3);

        using (var dictionaryStream = new System.IO.MemoryStream())
        using (var dictionaryStreamWriter = new System.IO.StreamWriter(dictionaryStream))
        {
            dictionaryStreamWriter.WriteLine("raumemusik.fm lounge,10");
            dictionaryStreamWriter.WriteLine("energy berlin,10");
            dictionaryStreamWriter.WriteLine("star sat radio,10");
            dictionaryStreamWriter.WriteLine("antenne bayern,10");

            dictionaryStreamWriter.Flush();

            dictionaryStream.Position = 0;

            symSpell.LoadDictionary(dictionaryStream, 0, 1);
        }

        string inputTerm = "raute music lounge";
        int maxEditDistanceLookup = 3; //max edit distance per lookup (maxEditDistanceLookup<=maxEditDistanceDictionary)
        var suggestionVerbosity = SymSpell.Verbosity.Closest; //Top, Closest, All
        var suggestions = symSpell.Lookup(inputTerm, suggestionVerbosity, maxEditDistanceLookup);

        //display suggestions, edit distance and term frequency
        foreach (var suggestion in suggestions)
        {
            Console.WriteLine(suggestion.term + " " + suggestion.distance.ToString() + " " + suggestion.count.ToString("N0"));
        }

But the result is an empty array of suggestions.

I've noticed that the CreateDictionary-method does not have a char-array-parameter (I'm using the latest NuGet package). Could that be the problem?

Loading the dictionary from a file (as in your sample) doesn't work either.

CobraCalle commented 4 years ago

OK, I found the problem: a maximum dictionary edit distance of 3 (maxEditDistanceDictionary in the sample) is not enough for this input; 6 works.
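
To see why 3 is too small, it helps to count the edits between the recognised text and the station name. The sketch below is a standalone illustration, not SymSpell's own code: `EditDistanceSketch` and `Distance` are hypothetical names, and the metric is assumed to be the optimal string alignment variant of Damerau-Levenshtein (insert, delete, substitute, or swap two adjacent characters, each costing one edit), which is understood to be SymSpell's default.

```csharp
using System;

public static class EditDistanceSketch
{
    // Optimal string alignment distance between two strings.
    // Each insertion, deletion, substitution, or swap of two adjacent
    // characters counts as one edit.
    public static int Distance(string source, string target)
    {
        var d = new int[source.Length + 1, target.Length + 1];

        // Turning a prefix into the empty string (or back) costs its length.
        for (int i = 0; i <= source.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= target.Length; j++) d[0, j] = j;

        for (int i = 1; i <= source.Length; i++)
        {
            for (int j = 1; j <= target.Length; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;

                int best = Math.Min(
                    Math.Min(d[i - 1, j] + 1,      // delete from source
                             d[i, j - 1] + 1),     // insert into source
                    d[i - 1, j - 1] + cost);       // match or substitute

                // Swap of two adjacent characters.
                if (i > 1 && j > 1
                    && source[i - 1] == target[j - 2]
                    && source[i - 2] == target[j - 1])
                {
                    best = Math.Min(best, d[i - 2, j - 2] + 1);
                }

                d[i, j] = best;
            }
        }

        return d[source.Length, target.Length];
    }
}
```

`EditDistanceSketch.Distance("raute music lounge", "raumemusik.fm lounge")` returns 6: substitute "t" with "m", delete the space after "raute", substitute "c" with "k", and insert the three characters ".fm". A lookup capped at 3 therefore cannot reach this entry, while a cap of 6 can. Since the dictionary precalculation grows with the maximum edit distance, a larger cap is a trade-off that fits a short station list better than a full word dictionary.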