A simpler solution would be to revert to the original logic:
score = 1 - (number_of_incorrect_words / number_of_correct_words)
and adjust the Words of Estimative Probability table to a stricter scoring:
["Very good", 99, 100],
["Quite good", 95, 99],
["Good", 90, 95],
["Pretty good", 85, 90],
["Bad", 60, 85],
["Pretty bad", 12, 60],
["Quite bad", 2, 12],
["Very bad", 0, 2]
We can tune this logic further with new input from users in the community. Eventually, this table could be made customisable or passed in as a parameter to assist in the scoring.
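As a sketch of how this could look in code (illustrative only: the function and table names below are hypothetical, not nlp_profiler's actual API), the score from the formula above maps to a description via the stricter table:

```python
# Illustrative sketch only: names are hypothetical, not nlp_profiler's actual API.
WORDS_OF_ESTIMATIVE_PROBABILITY = [
    # (description, lower bound %, upper bound %)
    ("Very good", 99, 100),
    ("Quite good", 95, 99),
    ("Good", 90, 95),
    ("Pretty good", 85, 90),
    ("Bad", 60, 85),
    ("Pretty bad", 12, 60),
    ("Quite bad", 2, 12),
    ("Very bad", 0, 2),
]


def spelling_quality_score(number_of_incorrect_words: int,
                           number_of_correct_words: int) -> float:
    # Original logic, clamped to [0, 1] so heavy misspelling can't go negative.
    if number_of_correct_words == 0:
        return 0.0
    score = 1 - (number_of_incorrect_words / number_of_correct_words)
    return max(0.0, min(1.0, score))


def spelling_quality_description(score: float) -> str:
    # Walk the table top-down and return the band the percentage falls into.
    percentage = score * 100
    for description, lower_bound, upper_bound in WORDS_OF_ESTIMATIVE_PROBABILITY:
        if lower_bound <= percentage <= upper_bound:
            return description
    return "Very bad"
```

For example, 3 incorrect words against 97 correct ones gives a score of about 0.97, which the table maps to "Quite good".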
The new logic can be found in https://github.com/neomatrix369/nlp_profiler/blob/master/nlp_profiler/spelling_quality_check.py#L59 and the changes are as per the comment https://github.com/neomatrix369/nlp_profiler/issues/8#issuecomment-704932155.
It may not be the best or most optimal fix, but it's a simple one to start with.
The issue is partially fixed via #16.
TextBlob does a decent job, although the scores returned per misspelt word would then need to be correctly amortised across the whole text, meaning a fair evaluation, on the whole, of how bad the spelling in the text is.
At the moment it's using the logic in nlp_profiler/spelling_quality_check.py (linked above), which can be improved, as there are visible chances of false-positive or false-negative scores.
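A minimal sketch of a TextBlob-based check along those lines might look as follows. This is a reconstruction from the description above, not the actual code from the repo; the function name and the choice to average per-word results are assumptions:

```python
from textblob import TextBlob, Word


def spelling_quality(text: str) -> float:
    # Sketch only: reconstructed from the issue description, not the repo's code.
    words = TextBlob(text).words
    if not words:
        return 1.0  # assumption: an empty text has nothing misspelt
    misspelt = 0
    for word in words:
        # Word.spellcheck() returns (candidate, confidence) pairs, best first;
        # if the top candidate differs from the word itself, treat it as misspelt.
        best_candidate, _confidence = Word(word.lower()).spellcheck()[0]
        if best_candidate != word.lower():
            misspelt += 1
    # Amortise the per-word results across the whole text: the fraction of
    # correctly spelt words becomes the overall score.
    return 1.0 - (misspelt / len(words))
```

Anything along these lines is prone to exactly the false positives/negatives mentioned above, e.g. proper nouns or rare but correct words being flagged as misspelt.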
PS: performance of this feature is being addressed in #2, so this particular issue isn't about improving its speed/performance; that may be addressed via other issues at a later stage. There have already been some significant performance improvements to the spell check and other aspects of NLP Profiler via #2.
The fix to #14 impacts this issue; the two will need to be fixed together.
~~Replace the spellchecker with the pyspellchecker package (on PyPI), which appears to be closer to Peter Norvig's work.~~ Replaced with symspellpy (https://pypi.org/project/symspellpy/).
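For reference, symspellpy's documented quickstart usage looks like the following, using the frequency dictionary bundled with the package; how it's wired into NLP Profiler is in the repo itself:

```python
import pkg_resources
from symspellpy import SymSpell, Verbosity

sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt"
)
# The bundled dictionary has one entry per line: term at column 0, count at column 1.
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)

# Verbosity.CLOSEST returns suggestions at the smallest edit distance found;
# an exact dictionary match comes back with distance 0.
suggestions = sym_spell.lookup("memebers", Verbosity.CLOSEST, max_edit_distance=2)
for suggestion in suggestions:
    print(suggestion.term, suggestion.distance, suggestion.count)
```

A distance of 0 on the top suggestion means the word is in the dictionary, which gives a cheap correct/incorrect signal to feed into the scoring above.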