Chirag-v09 closed this issue 2 years ago.
@Chirag-v09 Yes! This is totally doable. You should be able to follow my guidance in the README under "apply a custom list of matches":
>>> s = "Department of medicine Colombia University closed on August 1 Milinda Samuelli"
>>> is_good_rule = lambda rule: rule.message == 'Possible spelling mistake found.' and len(rule.replacements) and rule.replacements[0][0].isupper()
>>> import language_tool_python
>>> tool = language_tool_python.LanguageTool('en-US')
>>> matches = tool.check(s)
>>> # The following line could filter out the matches to solve your problem
>>> matches = [m for m in matches if is_good_rule(m)]
>>> language_tool_python.utils.correct(s, matches)
'Department of medicine Colombia University closed on August 1 Melinda Sam'
The previous code filters the matches with a function, is_good_rule, which returns True only for the suggestions you want applied to the text. So you could implement is_good_rule to return False for matches on the technical terms that are being wrongly corrected. Does that make sense?
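Here is a minimal sketch of such an is_good_rule for a list of technical terms. It assumes each match exposes `message`, `offset`, and `errorLength` (the attributes on language_tool_python's match objects), so the flagged word can be sliced back out of the checked string, and that spelling matches carry the message used in the example above. `make_is_good_rule` and `technical_terms` are hypothetical names for illustration, not part of the library.

```python
from typing import Callable, Iterable

# Message LanguageTool attaches to spelling matches (as in the example above).
SPELLING_MESSAGE = "Possible spelling mistake found."


def make_is_good_rule(text: str, technical_terms: Iterable[str]) -> Callable[[object], bool]:
    """Build a hypothetical is_good_rule that rejects spelling matches on known terms.

    `text` must be the exact string passed to tool.check(): the flagged word is
    recovered from it using the match's `offset` and `errorLength` attributes.
    """
    terms = {t.lower() for t in technical_terms}  # case-insensitive lookup

    def is_good_rule(rule) -> bool:
        flagged = text[rule.offset : rule.offset + rule.errorLength]
        if rule.message == SPELLING_MESSAGE and flagged.lower() in terms:
            return False  # a listed term: leave it exactly as written
        return True  # apply every other suggestion

    return is_good_rule


# Hypothetical usage with language_tool_python (not executed here):
#   s = "Department of medicine Colombia University closed on August 1 Milinda Samuelli"
#   tool = language_tool_python.LanguageTool('en-US')
#   matches = tool.check(s)
#   is_good_rule = make_is_good_rule(s, ["Milinda", "Samuelli"])
#   matches = [m for m in matches if is_good_rule(m)]
#   language_tool_python.utils.correct(s, matches)
```

Building the predicate from the checked string keeps it independent of how the match stores its context; for a long term list you would load the terms from a file once and reuse the same set.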
Could you show how to define the is_good_rule function? That would help me understand it better.
It's a function you write yourself: it takes a rule (a match) and returns True if you want to apply it to the text, and False otherwise. Here's an example that only accepts spelling mistakes whose top suggestion starts with a capital letter:
>>> s = "Department of medicine Colombia University closed on August 1 Milinda Samuelli"
>>> is_good_rule = lambda rule: rule.message == 'Possible spelling mistake found.' and len(rule.replacements) and rule.replacements[0][0].isupper()
>>> import language_tool_python
>>> tool = language_tool_python.LanguageTool('en-US')
>>> matches = tool.check(s)
>>> matches = [rule for rule in matches if is_good_rule(rule)]
>>> language_tool_python.utils.correct(s, matches)
'Department of medicine Colombia University closed on August 1 Melinda Sam'
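For readability, the same lambda can be written as a regular function, one condition at a time. This is only a restatement of the example above (it assumes the same `message` and `replacements` attributes on each match) and can be used in the same list comprehension.

```python
# Message LanguageTool attaches to spelling matches, as in the lambda above.
SPELLING_MESSAGE = "Possible spelling mistake found."


def is_good_rule(rule) -> bool:
    """Same test as the lambda above, written out step by step."""
    if rule.message != SPELLING_MESSAGE:
        return False  # not a spelling match, so don't apply it
    if not rule.replacements:
        return False  # LanguageTool offered no replacement to apply
    # Apply the match only when the top suggestion starts with an uppercase
    # letter (typically a name or other proper noun).
    return rule.replacements[0][0].isupper()
```

Unlike the lambda, which can return `0` when there are no replacements, this version always returns a `bool`; the filtering result is the same.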
Hey, thanks for the update, but I need more clarification. For example:
s = "Hello! Department of medicine Colombiya Universitii"
Here I know that "Colombiya" and "Universitii" are both misspelled. Even so, I don't want "Colombiya" flagged as a spelling mistake (it should be added to the dictionary, or spelling mistakes for this word should be ignored), but I do still want "Universitii" to be flagged.
This is just an example, @Chirag-v09. is_good_rule can be any function that takes a rule and returns True or False. So you just need to write a function that expresses the filtering you want: which rules should be dropped and which should be applied. If you want more help, you'll need to give me more detail about your problem setup.
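One way to express "which rules should be dropped" is to build is_good_rule from a list of drop conditions, keeping a match only if none of them fires. The sketch below assumes matches expose `message`, `offset`, and `errorLength` as in language_tool_python; `keep_unless` and `spelling_on` are hypothetical helper names, not library functions.

```python
from typing import Callable

Predicate = Callable[[object], bool]


def keep_unless(*drop_conditions: Predicate) -> Predicate:
    """Build an is_good_rule that keeps a match unless any drop condition is True."""
    def is_good_rule(rule) -> bool:
        return not any(condition(rule) for condition in drop_conditions)
    return is_good_rule


def spelling_on(text: str, word: str) -> Predicate:
    """Drop condition: a spelling match whose flagged text is exactly `word`."""
    def condition(rule) -> bool:
        flagged = text[rule.offset : rule.offset + rule.errorLength]
        return rule.message == "Possible spelling mistake found." and flagged == word
    return condition


# Hypothetical usage for the example above (not executed here):
#   s = "Hello! Department of medicine Colombiya Universitii"
#   is_good_rule = keep_unless(spelling_on(s, "Colombiya"))
#   matches = [m for m in tool.check(s) if is_good_rule(m)]
#   # The match on "Colombiya" is dropped; the one on "Universitii" is kept.
```

New conditions (for example, dropping matches from a particular rule) can be added as extra arguments to `keep_unless` without changing the filtering line.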
Is there a way to give LanguageTool a list of words that should NOT be marked as mistakes? My data contains a lot of technical terms that get wrongly corrected when I automatically apply LanguageTool's suggestions.