Closed Amir-Eskandari closed 6 years ago
You can simply read your dictionary/dataset like this:

    int initialCapacity = 82765;
    int maxEditDistanceDictionary = 2;
    var symSpell = new SymSpell(initialCapacity, maxEditDistanceDictionary);
    int termIndex = 0; // column of the term in the dictionary text file
    int countIndex = 1; // column of the term frequency in the dictionary text file
    symSpell.LoadDictionary(dictionaryPath, termIndex, countIndex);

The culture column is simply ignored by SymSpell. The word "car" appears on two lines, but the two lines are combined into a single entry in the internal dictionary: the two values in the count column (2000 and 1000) are added, giving count=3000.
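To illustrate the merging described above, here is a minimal standalone sketch. It does not use SymSpell itself and is not the library's actual implementation; the class and method names (DictionaryMergeSketch, MergeCounts) and the culture values in the usage example are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryMergeSketch
{
    // Hypothetical helper: sums the counts of repeated terms and ignores
    // every column other than the term and count columns.
    public static Dictionary<string, long> MergeCounts(
        IEnumerable<string> lines, int termIndex = 0, int countIndex = 1)
    {
        var counts = new Dictionary<string, long>();
        foreach (string line in lines)
        {
            // A null separator splits on any whitespace; empty parts are dropped.
            string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            // Skip lines that lack the requested columns or have a non-numeric count.
            if (parts.Length <= Math.Max(termIndex, countIndex)) continue;
            if (!long.TryParse(parts[countIndex], out long count)) continue;

            // existing is 0 when the term has not been seen yet.
            counts.TryGetValue(parts[termIndex], out long existing);
            counts[parts[termIndex]] = existing + count;
        }
        return counts;
    }
}
```

Calling MergeCounts on the two lines "car 2000 en-us" and "car 1000 en-gb" yields a single "car" entry with a count of 3000; the third column plays no role.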
Hello,
Firstly, thank you for the fast response. Secondly, if I want to create my own LoadDictionary(), I need access to some fields like deletes in order to check them before CommitStaged(). Could I add a public property named Deletes and push it to the master branch, so that I can keep using your NuGet package in my website?
By the way, there are some properties like EntryCount and WordCount that don't check whether deletes or words are null. I can fix them too.
Best, Amir
I'm not sure what you are trying to achieve, but in order to support both space separated words and custom columns you could just change the dictionary format from space separated columns to comma separated columns: mercedes benz,500,en-us
Line 273, before: string[] lineParts = line.Split(null);
Line 273, after: string[] lineParts = line.Split(',');
Alternatively you could put space separated words inside quotation marks and adapt the parsing. No access to deletes is required.
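As a rough sketch of the comma-separated approach, the helper below parses one line such as "mercedes benz,500,en-us" into a term and a count without touching any SymSpell internals. The names CommaLineSketch and TryParseLine are hypothetical and not part of the library.

```csharp
using System;
using System.Globalization;

public static class CommaLineSketch
{
    // Hypothetical helper: extracts the term and count columns from a
    // comma-separated line; any further columns (e.g. culture) are ignored.
    public static bool TryParseLine(
        string line, int termIndex, int countIndex, out string term, out long count)
    {
        term = string.Empty;
        count = 0;
        if (string.IsNullOrWhiteSpace(line)) return false;

        // Splitting on ',' only keeps the spaces inside a term such as "mercedes benz".
        string[] parts = line.Split(',');
        if (parts.Length <= Math.Max(termIndex, countIndex)) return false;

        term = parts[termIndex].Trim();
        return term.Length > 0
            && long.TryParse(parts[countIndex].Trim(), NumberStyles.Integer,
                             CultureInfo.InvariantCulture, out count);
    }
}
```

For the line above with termIndex 0 and countIndex 1, this returns true with term "mercedes benz" and count 500. Terms that themselves contain commas would still need quoting or a different separator. The parsed pairs could then be added one at a time through a public entry method (for example CreateDictionaryEntry, if your SymSpell version exposes it).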
Of course you can make whatever changes you want in your own fork, but the behaviour and structure of internal fields like deletes can change in the future. That's the reason they are not public, in order to prevent breaking changes of the library. LoadDictionary will support space separated words in the future.
Wolf
I didn't want to change it in my own fork, so that I could keep using updates of your NuGet package, but it seems I have to in order to make it compatible with my needs.
Thank you, Amir
Hi,
1 - I want to add more columns like 'category', 'type' or 'Culture' to the dataset, and in that case I may need to have a word twice in the dataset. You mentioned that adding more columns is possible; should I change the LoadDictionary method to support more than 2 columns?
2 - What can I do for space-separated words, something like Mercedes benz?
Best, Amir