Closed neomatrix369 closed 3 years ago
Added an Emoji Decoder
If you look at the library, it already has one. What does your Emoji Decoder do? Is it part of the current PR you are working on?
The problem detected isn't specific to emojis; that was just one example instance. Other types of sentences had this issue too.
Fixed by #17, hence closing.
During the presentation, it was observed that sentences with emojis could end up with an incorrect sentence count. This could be due to the punctuation characters that make up emojis.
Note: there may be other edge cases involving the period (.), which is the primary indicator of the end of a sentence in English (and in many other Latin and Germanic languages).
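To make the failure mode concrete, here is a minimal sketch of a naive counter that splits on sentence-ending punctuation. The function name `naive_sentence_count` is hypothetical and is not taken from the library; it only illustrates how punctuation inside an emoticon or a number can inflate the count.

```python
import re


def naive_sentence_count(text: str) -> int:
    """Count sentences by splitting on runs of '.', '!' or '?'.

    Hypothetical helper for illustration only, not the library's code.
    """
    pieces = re.split(r"[.!?]+", text)
    # Ignore empty pieces, e.g. the one after a trailing period.
    return sum(1 for piece in pieces if piece.strip())


print(naive_sentence_count("It works. Thanks!"))    # two sentences, counted as 2
print(naive_sentence_count("Great job o.O"))        # one sentence, counted as 2
print(naive_sentence_count("Version 2.0 is out."))  # one sentence, counted as 2
```

The last two inputs each contain a single sentence, yet the period inside `o.O` and inside `2.0` is treated as a sentence boundary.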
[ ] ~Quick solution~
~Look for emojis in the text, drop them. Then perform sentence counts on the text~
~Cache the respective functions so results can be reused.~
[X] Better/Robust solution
Use a library or existing algorithm that handles edge cases better. Maybe nltk or spaCy could help in this case.
Cache the respective functions so results can be reused.
[X] Verify/validate
Add edge-case tests for sentence count
[X] Apply fix to the dependent functionality
Spell check: it used the sentence count, and the new, less accurate scores do not help it; see issue #8
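The caching and edge-case testing steps above could be sketched roughly as follows. This is an assumption-laden illustration, not the fix that landed in #17: `count_sentences` and `EDGE_CASES` are hypothetical names, and the regex heuristic is only a standard-library stand-in for whichever tokenizer (nltk or spaCy) ends up being used.

```python
import re
from functools import lru_cache

# Hypothetical stand-in for a library tokenizer: a boundary is a run of
# terminators that is followed by whitespace or by the end of the text.
_BOUNDARY = re.compile(r"[.!?]+(?=\s|$)")


@lru_cache(maxsize=1024)
def count_sentences(text: str) -> int:
    """Return a heuristic sentence count; repeated inputs hit the cache."""
    text = text.strip()
    if not text:
        return 0
    boundaries = list(_BOUNDARY.finditer(text))
    count = len(boundaries)
    # Text after the last boundary (or text with no boundary) is one more sentence.
    if not boundaries or boundaries[-1].end() < len(text):
        count += 1
    return count


# Edge cases of the kind discussed in this issue, with their expected counts.
EDGE_CASES = {
    "": 0,
    "No terminator here": 1,
    "Great job o.O": 1,
    "Version 2.0 is out.": 1,
    "It works. Thanks!": 2,
}

for sample, expected in EDGE_CASES.items():
    assert count_sentences(sample) == expected, sample
```

Because the function is wrapped in `functools.lru_cache`, calling it again with the same text returns the stored result instead of recomputing it. The heuristic still miscounts abbreviations such as "Dr. Smith arrived.", which is the kind of case a dedicated tokenizer is meant to handle.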
Related
After checking multiple examples, it seems many sentences were getting an incorrect sentence count.