inertia186 / cobblebot

Minecraft Server Automation

Slack Bot Special Characters #5

Open inertia186 opened 8 years ago

inertia186 commented 8 years ago

This is happening specifically in SlackBot#realtime: messages received in data["text"] appear to contain non-ASCII characters, for example a typographic apostrophe (’). Example of a real-time packet received:

{"type"=>"message", "channel"=>"REDACTED", "user"=>"REDACTED", "text"=>"@cb say test itΓÇÖs test", "ts"=>"1447625517.000851", "team"=>"REDACTED"}
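
A first diagnostic step could be to look at what Ruby actually holds in data["text"] before anything is printed. The sketch below uses a hypothetical helper, `describe_text` (not part of cobblebot), that reports the string's encoding, whether its bytes are valid in that encoding, and its codepoints and bytes. The sample text is an assumption: the packet's text as Slack presumably sent it, with U+2019 in place of the garbled characters.

```ruby
# Hypothetical diagnostic helper (not part of cobblebot): reports what Ruby
# actually holds in a string, independent of how a console renders it.
def describe_text(text)
  valid = text.valid_encoding?
  {
    encoding: text.encoding.name, # e.g. "UTF-8" or "US-ASCII"
    valid: valid,                 # false if the bytes don't fit the encoding
    # Codepoints are only meaningful when the bytes are valid.
    codepoints: valid ? text.codepoints.map { |c| format("U+%04X", c) } : [],
    bytes: text.bytes.map { |b| format("%02X", b) }
  }
end

# Assumed sample: the message text with a real right single quotation mark.
sample = "@cb say test it\u2019s test"
info = describe_text(sample)
p info[:encoding]     # "UTF-8"
p info[:bytes][15, 3] # ["E2", "80", "99"], the three bytes of U+2019
```

Calling something like `p describe_text(data["text"])` inside SlackBot#realtime would show whether the string is valid UTF-8 containing U+2019 (so the garbling only happens when it is displayed) or whether it was already mislabeled on arrival.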

Kagamul commented 8 years ago

Unfortunately I don't know enough about Ruby to fix this issue, but maybe I can help you find the solution: the characters ΓÇÖ are actually the UTF-8 encoded bytes of the right single quotation mark (U+2019), but somehow the string seems to be interpreted one byte per character, as if it were ASCII, instead of as UTF-8. There is a potential hacky fix for this on Stack Overflow, but as Steven points out, the string should be read as UTF-8 to begin with. (Because of my lack of Ruby knowledge I don't know where the data is coming from, so I don't know how or where to do that.)
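
To make this concrete: the UTF-8 bytes of U+2019 are E2 80 99, and ΓÇÖ is exactly what those three bytes look like when each one is shown as a code page 437 (IBM437) character, the default on many Windows consoles. The sketch below reproduces the garbling, then shows a "hacky" repair of already-garbled text and the preferred fix of labeling raw bytes as UTF-8 where they are first read. Variable names are illustrative, and the `raw` string is only a stand-in for bytes read from the Slack connection.

```ruby
# U+2019 (right single quotation mark) is three bytes in UTF-8.
quote = "\u2019"
p quote.bytes # [226, 128, 153], i.e. E2 80 99

# Reproduce the garbling: show each byte as a CP437 (IBM437) character.
garbled = quote.b.force_encoding("IBM437").encode(Encoding::UTF_8)
puts garbled # ΓÇÖ

# "Hacky" repair of text that is already garbled: map the glyphs back to
# their CP437 bytes, then relabel those bytes as UTF-8. This raises
# Encoding::UndefinedConversionError for any character CP437 lacks.
repaired = garbled.encode("IBM437").force_encoding(Encoding::UTF_8)
puts repaired # ’

# Preferred fix: label raw bytes as UTF-8 where they first enter the program.
raw = "it\xE2\x80\x99s".b # stand-in for bytes read from a socket
text = raw.dup.force_encoding(Encoding::UTF_8)
puts text.valid_encoding? # true
```

The repair only works when every garbled character maps back to a single CP437 byte, which is one reason reading the data as UTF-8 in the first place is the safer choice.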

Keep in mind that Minecraft fully supports and uses UTF-8 (even though its font only partially supports it). There are restrictions in place, but in the worst case even an account's username could theoretically contain non-ASCII characters, so it's probably a good idea to assume UTF-8 encoding for all data coming out of Minecraft itself and handle it appropriately.
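
A minimal sketch of "assume UTF-8 and handle it appropriately", assuming incoming text arrives as a Ruby string whose bytes are meant to be UTF-8. The helper name `to_utf8` is hypothetical; it relabels the bytes as UTF-8 and, if they are not valid, replaces the bad sequences with U+FFFD via `String#scrub` instead of letting a later operation raise.

```ruby
# Hypothetical helper: treat incoming bytes (e.g. a Minecraft log line or a
# Slack message) as UTF-8, replacing invalid byte sequences with U+FFFD
# rather than failing later in string operations.
def to_utf8(input)
  text = input.dup.force_encoding(Encoding::UTF_8)
  text.valid_encoding? ? text : text.scrub("\uFFFD")
end

# Valid UTF-8, such as a chat message with non-ASCII letters, passes through.
chat = to_utf8("Sp\xC3\xA4tzle".b)
puts chat # Spätzle

# A stray invalid byte is replaced instead of raising an error.
broken = to_utf8("bad\xFFbyte".b)
puts broken.valid_encoding? # true
```

If the data is read from a file such as the server log (how cobblebot reads it is not shown here), opening it with `File.open(path, "r:UTF-8")` sets the external encoding up front, so the strings are already labeled UTF-8.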

Best regards

inertia186 commented 8 years ago

I'm sure it's something simple. For messages going to Minecraft, I just force the encoding to US-ASCII; in doing so, it usually passes the non-ASCII characters along unchanged and raises an error if it can't deal with them. That's probably not the right solution either.
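
The behavior described above follows from how `String#force_encoding` works: it changes only the encoding label, never the bytes, so a UTF-8 apostrophe becomes three invalid "ASCII" bytes that pass through until something actually transcodes them. The sketch below shows that, plus a hypothetical `ascii_for_minecraft` helper (an assumption, not cobblebot code) that converts explicitly if the output truly must be ASCII.

```ruby
message = "it\u2019s"

# force_encoding only changes the label; the bytes stay UTF-8.
ascii = message.dup.force_encoding(Encoding::US_ASCII)
p ascii.bytes == message.bytes # true
p ascii.valid_encoding?        # false: E2 80 99 are not ASCII

# Anything that really transcodes will raise on those bytes.
raised = begin
  ascii.encode(Encoding::UTF_8)
  nil
rescue Encoding::InvalidByteSequenceError => e
  e
end
puts "raised #{raised.class}" if raised

# Hypothetical explicit alternative, if the output truly must be ASCII:
# map typographic quotes to plain ones, then replace anything left over.
def ascii_for_minecraft(text)
  text.tr("\u2018\u2019\u201C\u201D", %q(''""))
      .encode(Encoding::US_ASCII, invalid: :replace, undef: :replace, replace: "?")
end

p ascii_for_minecraft(message) # "it's"
```

Since Minecraft itself handles UTF-8, simply leaving the message as valid UTF-8 may be the better option; the explicit conversion is only a fallback that makes the lossy step visible.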

Slack is open to suggestions; maybe they can avoid UTF-8 for such a common character, or maybe they can encode these characters as HTML entities.
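
Slack already escapes &, < and > in message text as &amp;, &lt; and &gt;, so the bot would need to decode entities in any case. Below is a minimal sketch of a hypothetical `decode_entities` helper, assuming that numeric references such as &#8217; might also appear if Slack ever entity-encoded other characters. It works in a single pass so that text like "&amp;lt;" decodes to "&lt;" rather than "<".

```ruby
# Hypothetical decoder for entity-encoded message text. Slack escapes
# &, < and > this way; numeric references are handled as an assumption.
NAMED_ENTITIES = { "amp" => "&", "lt" => "<", "gt" => ">" }.freeze

def decode_entities(text)
  # One gsub pass, so a decoded "&" is never re-scanned as a new entity.
  text.gsub(/&(amp|lt|gt);|&#(\d+);|&#x(\h+);/i) do
    if $1
      NAMED_ENTITIES[$1.downcase]
    elsif $2
      $2.to_i.chr(Encoding::UTF_8)     # decimal reference, e.g. &#8217;
    else
      $3.to_i(16).chr(Encoding::UTF_8) # hex reference, e.g. &#x2019;
    end
  end
end

puts decode_entities("it&#8217;s &lt;test&gt;") # it’s <test>
```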