scrapinghub / slackbot

A chat bot for Slack (https://slack.com).
MIT License

Run all text through unicode normalization with form NFKD #222

Status: Closed (imduffy15 closed 2 years ago)

imduffy15 commented 3 years ago

In some Slack messages a non-breaking space (u"\xa0") is used instead of a regular space. This becomes very annoying when matching on regexes, because a pattern written with a plain space will not match it.

I believe it would be easier to just normalize the messages as soon as possible to remove the problem.
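A minimal sketch of what the proposed normalization could look like, using only the standard library. The helper name `normalize_text`, the sample message, and the regex are hypothetical and are not taken from the slackbot code base; where such a call would actually be hooked into the bot is left open here.

```python
import re
import unicodedata


def normalize_text(text):
    """Return text in NFKD form (hypothetical helper, not part of slackbot).

    NFKD applies compatibility decomposition, which maps the
    non-breaking space U+00A0 to a regular space U+0020.
    """
    return unicodedata.normalize("NFKD", text)


# Hypothetical message containing a non-breaking space between the words.
raw = "deploy\xa0staging"

# A pattern written with a literal space, as a bot author would type it.
pattern = re.compile(r"^deploy (\w+)$")

# The raw message does not match, the normalized one does.
assert pattern.match(raw) is None
assert pattern.match(normalize_text(raw)).group(1) == "staging"
```

One trade-off worth noting: NFKD also decomposes accented characters (for example "é" becomes "e" followed by a combining accent), so patterns that match precomposed characters would need to account for that.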

lins05 commented 3 years ago

Hey Ian, is there any official document reference for this kind of behavior?

imduffy15 commented 3 years ago

Hi @lins05, none that I could find, I'm afraid. The best I could get is a few mentions in other GitHub projects:

https://github.com/slack-ruby/slack-ruby-client/issues/319

https://github.com/slack-ruby/slack-ruby-bot/pull/256

tehranian commented 3 years ago

+1. I believe we had to write custom code to substitute Unicode characters.

Example: when a user copies and pastes a command that was sent to the bot, Slack (Electron?) prefixes it with a Unicode whitespace character. This is invisible to the user, but it stops slackbot from responding to the copied command.
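The copy-and-paste case could be handled along these lines. This is only a sketch: the thread does not say which whitespace character Slack inserts, so U+00A0 is assumed here, and `clean_command` and the sample command are hypothetical names, not the custom code mentioned above.

```python
import unicodedata


def clean_command(text):
    """Normalize a message and trim surrounding whitespace (hypothetical helper)."""
    # NFKD folds U+00A0 into a regular space; strip() then removes
    # any whitespace left at the start or end of the message.
    return unicodedata.normalize("NFKD", text).strip()


# Assumed shape of a pasted command: an invisible U+00A0 prefix.
pasted = "\xa0restart worker"

# A naive prefix check fails on the raw text but works after cleaning.
assert not pasted.startswith("restart")
assert clean_command(pasted) == "restart worker"
```

If the inserted character turned out to be something that is not classified as whitespace (a zero-width space, for instance), neither NFKD nor `strip()` would remove it, and an explicit substitution would still be needed.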

imduffy15 commented 2 years ago

I'm no longer in a position to benefit from this change, so I'm closing it. It's still valid/needed.