Closed kapilg1997 closed 4 years ago
Hi @kapilg1997, that's a great question. I don't think you can do this directly through the LanguageTool API (here's a link to the API documentation). LanguageTool has an internal dictionary for each of its languages which cannot be changed.
However, you could implement this yourself. Basically, get the list of corrections (Match objects) from LanguageTool, and filter out the ones that are error corrections for words in your vocabulary.
vocab = {'Milinda', 'Samuelli'} # add words to your vocab
s = "Department of medicine Colombia University clossed on August 1 Milinda Samuelli."
is_correctly_spelled = lambda rule: rule.ruleIssueType == 'misspelling' and rule.matchedText in vocab
import language_tool_python
tool = language_tool_python.LanguageTool('en-US')
matches = tool.check(s)
matches = [rule for rule in matches if not is_correctly_spelled(rule)]
language_tool_python.utils.correct(s, matches)
'Department of medicine Colombia University closed on August 1 Milinda Samuelli.'
This way, you won't apply error corrections for words that are in your dictionary (vocab). Let me know if this works. If you want to add this as a feature via pull request I'd be interested as well.
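For anyone who wants to reuse the filter, here is a minimal sketch of the same idea wrapped in a function. The name `filter_vocab_matches` is hypothetical and not part of language_tool_python, and the `SimpleNamespace` objects are stand-ins that mimic only the two Match attributes used in the snippet above (`ruleIssueType` and `matchedText`), so the logic can be tried without starting LanguageTool.

```python
from types import SimpleNamespace


def filter_vocab_matches(matches, vocab):
    """Drop misspelling matches whose flagged text is in the custom vocab.

    Hypothetical helper: works on any objects exposing `ruleIssueType`
    and `matchedText`, as the Match objects in the snippet above do.
    """
    return [
        m for m in matches
        if not (m.ruleIssueType == 'misspelling' and m.matchedText in vocab)
    ]


# Stand-in matches (assumption: real ones would come from tool.check(s)).
fake_matches = [
    SimpleNamespace(ruleIssueType='misspelling', matchedText='clossed'),
    SimpleNamespace(ruleIssueType='misspelling', matchedText='Milinda'),
    SimpleNamespace(ruleIssueType='misspelling', matchedText='Samuelli'),
]

kept = filter_vocab_matches(fake_matches, {'Milinda', 'Samuelli'})
print([m.matchedText for m in kept])  # ['clossed']
```

With real matches, the filtered list would be passed to `language_tool_python.utils.correct(s, kept)` exactly as in the snippet above.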
Sorry, this is not working
vocab = {'Milinda', 'Samuelli'}
s = "Department of medicine Colombia University clossed on August 1 Milinda Samuelli."
is_correctly_spelled = lambda rule: rule.ruleIssueType == 'misspelling' and rule.matchedText in vocab
import language_tool_python
tool = language_tool_python.LanguageTool('en-US')
Downloading LanguageTool: 100%|█████████████████████████████████████████████████████| 190M/190M [00:54<00:00, 3.51MB/s]
Unzipping C:\Users\Admin\AppData\Local\Temp\tmpakz_f2l4.zip to C:\Users\Admin/.cache/language_tool_python/.
Downloaded https://www.languagetool.org/download/LanguageTool-5.0.zip to C:\Users\Admin/.cache/language_tool_python/.
self._url: http://127.0.0.1:8081/v2/
matches = tool.check(s)
matches = [rule for rule in matches if not is_correctly_spelled(rule)]
Traceback (most recent call last):
  File " ", line 1, in
  File " ", line 1, in
  File " ", line 1, in
  File "C:\ProgramData\Anaconda3\lib\site-packages\language_tool_python\match.py", line 107, in __getattr__
    .format(self.__class__.__name__, name))
AttributeError: 'Match' object has no attribute 'matchedText'
@kapilg1997 It seems your version is outdated. Can you try pip install language_tool_python --upgrade please?
Works now thanks!!
Hi @jxmorris12, sorry to bump this thread but I was looking for exactly this.
I tried with the newSpellings and new_spellings_persist=True arguments, which call _register_spellings, but it wasn't working.
Your solution is perfect and I guess it could be implemented because the logic is really easy.
Moreover, adding a fuzzy matcher like the textdistance package would allow people to get predictions with their new vocabulary.
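A rough sketch of what that could look like, using the standard library's `difflib` as a stand-in for textdistance so it runs without extra dependencies. The helper name `suggest_from_vocab` and the 0.8 cutoff are assumptions for illustration only, not anything the wrapper provides.

```python
import difflib


def suggest_from_vocab(word, vocab, cutoff=0.8):
    """Return vocab entries that closely resemble `word`, best match first.

    Hypothetical helper: the matching is case-sensitive, and `cutoff`
    is a similarity ratio in [0, 1] (0.8 is an arbitrary choice).
    """
    # sorted() makes the candidate order deterministic for a set input.
    return difflib.get_close_matches(word, sorted(vocab), n=3, cutoff=cutoff)


vocab = {'Milinda', 'Samuelli'}
print(suggest_from_vocab('Samueli', vocab))  # ['Samuelli']
print(suggest_from_vocab('closed', vocab))   # []
```

Suggestions like these could be offered in addition to LanguageTool's own replacements for a flagged word, so that near-misses of custom vocabulary get corrected toward the custom spelling.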
Maybe I will do a PR if I find time...
Thanks a lot for the nice wrapper around this tool.
Have a great day!
@Ierezell -- a pull request would be wonderful! Please consider it!
Hi, I am new to NLP in general. I want to use this tool as a spell checker. How can I add new words to the tool's existing vocabulary?