wolfgarbe / SymSpell

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
https://seekstorm.com/blog/1000x-spelling-correction/
MIT License

Predicts garbage for Bengali input #119

Open hafiz031 opened 2 years ago

hafiz031 commented 2 years ago

I am trying the lookup_compound "Keep original casing" example on a Bengali corpus of unigrams and bigrams, using a comma as the separator. It does not seem to work: for any misspelled input, the output is a garbage string. The issue occurred with the Python implementation of this package.
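
Before looking at lookup_compound itself, it may help to rule out a dictionary-format problem by checking that every dictionary line splits cleanly on the comma separator. The following is a minimal sketch using only the Python standard library, not the SymSpell or symspellpy API. The function name parse_frequency_lines, the "term,count" line layout, and the sample Bengali entries and counts are all assumptions made for illustration.

```python
def parse_frequency_lines(lines, separator=","):
    """Parse 'term<separator>count' lines into a {term: count} dict.

    Hypothetical helper: the line layout is an assumption, not the
    documented format of any SymSpell port.
    """
    entries = {}
    for raw in lines:
        # Drop a UTF-8 byte order mark if present, then surrounding whitespace.
        line = raw.lstrip("\ufeff").strip()
        if not line:
            continue
        # Split on the LAST separator so the count is always the final field.
        term, sep, count = line.rpartition(separator)
        if not sep or not term.strip() or not count.strip().isdigit():
            raise ValueError(f"malformed dictionary line: {raw!r}")
        entries[term.strip()] = int(count)
    return entries


# Illustrative sample data (made up for this sketch).
unigrams = parse_frequency_lines(["আমি,120", "ভাত,45"])
# A bigram is assumed to be stored as 'word1 word2,count'.
bigrams = parse_frequency_lines(["আমি ভাত,12"])
```

If a line raises ValueError here, the separator or the file encoding is a more likely culprit than the correction algorithm. When reading from a real file, opening it with encoding="utf-8" explicitly avoids platform-dependent defaults.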

wolfgarbe commented 2 years ago

To look into the issue, I would need the following information:

  1. all SymSpell parameters used: prefixLength, maxEditDistanceDictionary, maxEditDistanceLookup, suggestionVerbosity
  2. the Bengali unigram and bigram frequency dictionaries
  3. some Bengali examples: input text, current output text, expected output text

hafiz031 commented 2 years ago

@wolfgarbe I posted this issue here by mistake. It actually occurs in one of the Python implementations of this package (symspellpy), and I have since re-posted it there: https://github.com/mammothb/symspellpy/issues/110. My comment there also includes the unigram and bigram frequency dictionaries.