nminaya / grammar-nazi-bot

Bot that corrects spelling mistakes.
MIT License
31 stars 8 forks

Exclude emojis from being checked #117

Daksh777 closed this issue 3 years ago

Daksh777 commented 3 years ago

image

nminaya commented 3 years ago

@Daksh777 Thanks for reporting the issue

nminaya commented 3 years ago

I was able to reproduce the issue; it only happens with YandexSpellerAPI. We currently exclude emojis in the following lines:

https://github.com/nminaya/grammar-nazi-bot/blob/1058b5a991e251aac63393bfe06e355bdaadca3d/GrammarNazi.App/HostedServices/TelegramBotHostedService.cs#L126-L127

But something interesting is happening.

  1. GetCleanedText("bored") returns "bored".
  2. GetCleanedText("bored🔫😡😆😡❤️😡😡❤️") returns "bored".

They appear to be the same, but they are not. When the second result is encoded with HttpUtility.UrlEncode in https://github.com/nminaya/grammar-nazi-bot/blob/1058b5a991e251aac63393bfe06e355bdaadca3d/GrammarNazi.Core/Clients/YandexSpellerApiClient.cs#L30

HttpUtility.UrlEncode("bored") returns "bored%ef%b8%8f%ef%b8%8f". Each %ef%b8%8f is the UTF-8 encoding of U+FE0F (VARIATION SELECTOR-16), an invisible character left behind after the emojis are removed, and this causes the API to return a correction for this word.
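The behaviour described above can be reproduced outside the bot. The following is a minimal sketch in Python rather than the bot's C#, assuming the cleaned text still carries two U+FE0F code points (one per "❤️" in the original message); `urllib.parse.quote` stands in for HttpUtility.UrlEncode and emits uppercase hex digits instead of lowercase.

```python
from urllib.parse import quote

# Assumption: the "cleaned" text still ends with two U+FE0F
# (VARIATION SELECTOR-16) code points, which render as nothing.
cleaned = "bored" + "\ufe0f" * 2

# The string looks like "bored" when printed, but it is two code points longer.
looks_equal = cleaned == "bored"
length = len(cleaned)

# Percent-encoding exposes the hidden bytes (EF B8 8F per selector).
encoded = quote(cleaned)

print(looks_equal)  # False
print(length)       # 7
print(encoded)      # bored%EF%B8%8F%EF%B8%8F
```

Comparing lengths or dumping code points is usually the quickest way to confirm that two visually identical strings differ.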

I need to work out how to strip these invisible characters in the GetCleanedText or StringUtils.RemoveEmojis method.
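One possible approach, sketched here in Python rather than the bot's C#, is to strip variation selectors (U+FE00 to U+FE0F) and the zero width joiner (U+200D) after emoji removal. The helper name `strip_invisible` is hypothetical, and this is not necessarily the fix that was applied in the repository.

```python
import re
from urllib.parse import quote

# Variation selectors U+FE00..U+FE0F and the zero width joiner U+200D
# are invisible and can survive a filter that only targets emoji glyphs.
_INVISIBLE = re.compile("[\ufe00-\ufe0f\u200d]")


def strip_invisible(text: str) -> str:
    """Remove variation selectors and zero width joiners from text."""
    return _INVISIBLE.sub("", text)


# Usage: the leftover selectors no longer reach the URL encoder.
dirty = "bored\ufe0f\ufe0f"
clean = strip_invisible(dirty)
print(quote(clean))  # bored
```

The character class is kept deliberately narrow: removing every nonspacing mark would also delete combining accents from decomposed text, which would change the words sent to the spell checker.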