Ola Spellchecker
Reference for Word probability
http://www.katrinerk.com/courses/python-worksheets/language-models-in-python
- Need to match word case (DONE)
- Preserve spaces (DONE)
- Words not in the corpus won't be spell-corrected
- Handle multiple spelling mistakes in one sentence, e.g. "Ther is nthing wrng" (DONE)
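The case-matching and space-preserving items above can be illustrated with a minimal sketch. It is not the package's actual code: `match_case`, `correct_preserving_layout`, and the `fix_word` callback are hypothetical names introduced only for this example, and the word pattern assumes plain ASCII text.

```python
import re


def match_case(original, replacement):
    """Copy the casing pattern of `original` onto `replacement`."""
    if original.isupper():
        return replacement.upper()
    if original[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def correct_preserving_layout(text, fix_word):
    """Apply `fix_word` to each alphabetic run; spaces and punctuation stay untouched."""
    def swap(match):
        word = match.group(0)
        # fix_word (hypothetical callback) receives and returns a lowercase word
        return match_case(word, fix_word(word.lower()))
    return re.sub(r"[A-Za-z']+", swap, text)


# Stand-in for a real corrector: a fixed lookup table
fixes = {"wher": "where", "everone": "everyone"}
print(correct_preserving_layout("Wher  is everone?", lambda w: fixes.get(w, w)))
```

Substituting only the matched words, rather than splitting and rejoining the sentence, is what keeps the double space and the question mark in place.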
Questions
Following https://www.microsoft.com/cognitive-services/en-us/bing-spell-check-api
- To handle broken words (microso ft) => Use this https://github.com/grantjenks/wordsegment ?
- Slang => Synonyms
- Names => Our lookup store
- Homonyms => Supported
- Brands => Our lookup store
Installation
pip install -e git+ssh://git@github.com/rmdort/ola_spellchecker.git#egg=ola_spellchecker
Requirements
- Python
- NLTK
Usage
from ola_language_tools import SpellCheck
# Create an instance of the class
spellchecker = SpellCheck(corpus='spellcheck-corpus.txt')
# Correct a sentence; print() works on both Python 2 and Python 3
print(spellchecker.correct('Wher is everone?'))
How it works
- Create a dictionary out of a large corpus of text
- Identify spelling mistakes in each word of the query
- Find the co-occurrence probability of each candidate correction for a misspelled word
- Use the most probable match as the replacement
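The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: it uses only the standard library instead of NLTK, generates candidates at edit distance one, and scores them by how often they follow the previous word in the corpus. All function names here are hypothetical.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def build_model(corpus_text):
    """Count single words and adjacent word pairs in the corpus."""
    words = re.findall(r"[a-z']+", corpus_text.lower())
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    return unigrams, bigrams


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word, previous, unigrams, bigrams):
    """Pick the known candidate that most often follows `previous` in the corpus."""
    if word in unigrams:
        return word  # already a dictionary word
    candidates = [c for c in edits1(word) if c in unigrams]
    if not candidates:
        return word  # nothing close in the corpus: leave unchanged
    # Rank by pair count first, then word frequency, then alphabetically
    return max(candidates, key=lambda c: (bigrams[(previous, c)], unigrams[c], c))


def correct_sentence(sentence, unigrams, bigrams):
    """Correct each word in turn, using the corrected previous word as context."""
    previous = None
    corrected = []
    for word in sentence.lower().split():
        previous = correct_word(word, previous, unigrams, bigrams)
        corrected.append(previous)
    return " ".join(corrected)


unigrams, bigrams = build_model("there is nothing wrong here where is everyone")
print(correct_sentence("ther is nthing wrng", unigrams, bigrams))
```

For brevity the sketch lowercases the input and ignores punctuation. Words with no close match in the corpus are returned unchanged, in line with the limitation noted earlier.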