jxmorris12 / language_tool_python

a free python grammar checker 📝✅
GNU General Public License v3.0

How to add new words to the vocabulary of this tool? #21

Closed kapilg1997 closed 4 years ago

kapilg1997 commented 4 years ago

Hi, I am new to NLP in general. I want to use this tool as a spell checker. How can I add new words to its existing vocabulary?

jxmorris12 commented 4 years ago

Hi @kapilg1997, that's a great question. I don't think you can do this directly through the LanguageTool API (here's a link to the API documentation). LanguageTool has an internal dictionary for each of its languages which cannot be changed.

However, you could implement this yourself. Basically, get the list of corrections (Match objects) from LanguageTool, and filter out the ones that are error corrections for words in your vocabulary.

import language_tool_python

vocab = {'Milinda', 'Samuelli'}  # add words to your vocab
s = "Department of medicine Colombia University clossed on August 1 Milinda Samuelli."

# True for spelling matches whose flagged text is a word in your vocab
is_correctly_spelled = lambda rule: rule.ruleIssueType == 'misspelling' and rule.matchedText in vocab

tool = language_tool_python.LanguageTool('en-US')
matches = tool.check(s)
# Drop the matches for vocabulary words, then apply the remaining corrections
matches = [rule for rule in matches if not is_correctly_spelled(rule)]
language_tool_python.utils.correct(s, matches)
'Department of medicine Colombia University closed on August 1 Milinda Samuelli.'

This way, you won't apply error corrections for words that are in your dictionary (vocab). Let me know if this works. If you want to add this as a feature via pull request I'd be interested as well.
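To try the filtering logic on its own, without starting LanguageTool, here is a minimal sketch. FakeMatch is a hypothetical stand-in that models only the two Match attributes used above (ruleIssueType and matchedText), drop_known_words is an illustrative helper name, and the match list is hand-written rather than real LanguageTool output.

```python
from dataclasses import dataclass


@dataclass
class FakeMatch:
    """Hypothetical stand-in for a LanguageTool Match (assumption: only
    the two attributes used by the filter are modeled)."""
    ruleIssueType: str
    matchedText: str


def drop_known_words(matches, vocab):
    """Return matches minus the misspelling matches whose text is in vocab."""
    return [
        m for m in matches
        if not (m.ruleIssueType == 'misspelling' and m.matchedText in vocab)
    ]


vocab = {'Milinda', 'Samuelli'}
# Hand-written matches for illustration; real ones come from tool.check(s).
matches = [
    FakeMatch('misspelling', 'clossed'),
    FakeMatch('misspelling', 'Milinda'),
    FakeMatch('misspelling', 'Samuelli'),
]
kept = drop_known_words(matches, vocab)
print([m.matchedText for m in kept])  # ['clossed']
```

Only the match for 'clossed' survives, so it is the only correction that would be applied. Matches of any other issue type pass through untouched, even if their text happens to be in vocab.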

kapilg1997 commented 4 years ago

Sorry, this is not working

vocab = {'Milinda', 'Samuelli'}
s = "Department of medicine Colombia University clossed on August 1 Milinda Samuelli."
is_correctly_spelled = lambda rule: rule.ruleIssueType == 'misspelling' and rule.matchedText in vocab
import language_tool_python
tool = language_tool_python.LanguageTool('en-US')
Downloading LanguageTool: 100%|█████████████████████████████████████████████████████| 190M/190M [00:54<00:00, 3.51MB/s]
Unzipping C:\Users\Admin\AppData\Local\Temp\tmpakz_f2l4.zip to C:\Users\Admin/.cache/language_tool_python/.
Downloaded https://www.languagetool.org/download/LanguageTool-5.0.zip to C:\Users\Admin/.cache/language_tool_python/.
self._url: http://127.0.0.1:8081/v2/
matches = tool.check(s)
matches = [rule for rule in matches if not is_correctly_spelled(rule)]
Traceback (most recent call last):
  File "", line 1, in
  File "", line 1, in
  File "", line 1, in
  File "C:\ProgramData\Anaconda3\lib\site-packages\language_tool_python\match.py", line 107, in __getattr__
    .format(self.__class__.__name__, name))
AttributeError: 'Match' object has no attribute 'matchedText'

jxmorris12 commented 4 years ago

@kapilg1997 It seems your version is outdated. Can you try pip install language_tool_python --upgrade please?

kapilg1997 commented 4 years ago

> @kapilg1997 It seems your version is outdated. Can you try pip install language_tool_python --upgrade please?

Works now thanks!!

ierezell commented 3 years ago

Hi @jxmorris12, sorry to bump this thread but I was looking for exactly this.

I tried the newSpellings and new_spellings_persist=True arguments, which call _register_spellings, but that didn't work.

Your solution is perfect, and I guess it could be implemented in the library because the logic is really simple. Moreover, adding a fuzzy matcher like the textdistance package would allow people to get predictions based on their new vocabulary.
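A rough sketch of that fuzzy-matching idea, assuming Python's standard-library difflib as a dependency-free substitute for textdistance. The helper name suggest_from_vocab and the 0.8 cutoff are hypothetical choices for illustration, not part of language_tool_python.

```python
import difflib


def suggest_from_vocab(word, vocab, cutoff=0.8):
    """Return the vocabulary word closest to `word`, or None when no
    entry reaches the similarity cutoff (assumed value: 0.8)."""
    # sorted() gives get_close_matches a deterministic candidate order.
    candidates = difflib.get_close_matches(word, sorted(vocab), n=1, cutoff=cutoff)
    return candidates[0] if candidates else None


vocab = {'Milinda', 'Samuelli'}
print(suggest_from_vocab('Milnda', vocab))   # Milinda
print(suggest_from_vocab('clossed', vocab))  # None
```

A word flagged as a misspelling could first be checked this way: if a close vocabulary entry exists, offer it as the replacement, otherwise fall back to LanguageTool's own suggestions.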

Maybe I will open a PR if I find time...

Thanks a lot for the nice wrapper around this tool.

Have a great day!

jxmorris12 commented 3 years ago

@Ierezell -- a pull request would be wonderful! Please consider it!