nortxort / pinylib

A rewrite of tinylib
MIT License

Let's log some unicode #32

Closed: ghost closed this issue 8 years ago

ghost commented 8 years ago

https://github.com/nortxort/pinylib/blob/master/pinylib.py#L63

fh.file_writer(path, file_name, msg.encode('ascii', 'ignore'))

Currently any unicode is just left out.

Let's log that too, aye?

fh.file_writer(path, file_name, msg.encode(encoding='UTF-8', errors='ignore'))

Confirmed working under Linux; it should be fine on Windows as well.


[aida@programmerfag pinybot-master]$ cat files/logs/awkwardlysocial/logs/2016-08-24_awkwardlysocial.log | grep aida
[14:45:24] [joins] guest-54301:54301 changed nick to: aida
[14:45:31] enzo: sup aida
### Before UTF-8 encoding
[14:45:37] [users] aida: !syncuser aida
[14:45:39] [users] aida: !yt DN "bonnie & clyde
[14:57:32] [joins] Joins Moderator aida:54301:ccpd
### After UTF-8 encoding
[14:57:56] aida: DΞΔN
[14:57:57] [users] aida: !yt DΞΔN
[14:57:59] [users] aida: !stop 
[15:07:14] aida: ¥ ¼  Ñ  ñ
[15:07:18] aida: Ą ą  IJ  ij
[15:07:22] aida: Ə Ɛ  ƕ  ƺ
[15:07:28] aida: ɖ ɞ  ɫ  ɷ
[15:07:33] aida: ʱ ʬ  ˕  ˨
[15:07:37] aida: o̕    o̚ ơ o͡o
[15:07:41] aida: Ύ Δ  δ  Ϡ
[15:07:47] aida: Љ Щ  щ  Ӄ
[15:07:53] aida: ؟ ب  حٍ    ۳
[15:07:57] aida: ߄ ߐ  ߋ  ߹
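
As a rough sketch of the same idea, the log line can also be written through a file handle opened with an explicit encoding, so nothing has to be dropped with errors='ignore'. The helper name write_log_line and its arguments are hypothetical and are not pinylib's fh.file_writer; the sketch assumes msg is already a unicode string.

```python
import io
import os
import tempfile


def write_log_line(path, file_name, msg):
    """Append one message to a UTF-8 encoded log file.

    Hypothetical helper for illustration only; not pinylib's file_writer.
    Assumes msg is a unicode string (unicode on Python 2, str on Python 3).
    """
    if not os.path.exists(path):
        os.makedirs(path)
    # io.open takes an encoding on both Python 2 and 3 and encodes on write.
    with io.open(os.path.join(path, file_name), mode='a', encoding='utf-8') as f:
        f.write(msg + u'\n')


if __name__ == '__main__':
    # Write one line from the log above into a throwaway directory.
    log_dir = tempfile.mkdtemp()
    write_log_line(log_dir, 'example.log', u'[14:57:56] aida: D\u039e\u0394N')
```

Opening the file with encoding='utf-8' keeps the encoding decision in one place, and the same setting can be used to read the log back.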

Technetium1 commented 8 years ago

I want to be able to ban unicode PM spam so very much. Not knowing what the spam contained was a problem.

GoelBiju commented 8 years ago

@Technetium1, I think the essential thing when banning specific unicode is the raw code point number. For example, just writing if '╚' in msg: self.send_ban_msg(self.user.nick, self.user.id) might not be enough. We would want to build a small catalog of code points (if you intend to ban anyone who sends a particular unicode symbol).

So ╚ would be equal to u'\u255A' in Python.
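
A minimal sketch of what such a catalog could look like, assuming messages arrive as unicode strings. The names BANNED_CODE_POINTS, find_banned and should_ban are made up for illustration and are not part of pinylib.

```python
# Hypothetical catalog of code points to treat as spam.
BANNED_CODE_POINTS = {
    0x255A,  # the box-drawing character from the example above
}


def find_banned(msg, banned=BANNED_CODE_POINTS):
    """Return a sorted list of the banned code points that occur in msg."""
    return sorted({ord(ch) for ch in msg if ord(ch) in banned})


def should_ban(msg, banned=BANNED_CODE_POINTS):
    """True if msg contains at least one code point from the catalog."""
    return bool(find_banned(msg, banned))
```

In the bot, a check such as should_ban(msg) could then guard the existing self.send_ban_msg(self.user.nick, self.user.id) call. Storing plain integers keeps the catalog readable even in an editor or terminal that cannot render the symbols.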

Technetium1 commented 8 years ago

@GoelBiju is there a site in particular you use for finding the python equivalents?

GoelBiju commented 8 years ago

@Technetium1, most of the time I have used the unicode section of FileFormat.Info to find the Python equivalents; HTML codes and equivalents for other languages are also listed there. I would try the result in a Python IDE before implementing it.

NOTE: It's worth copying the actual unicode symbol from the site and pasting it into the client in any room to see if it renders properly. Some of the popular unicode symbols render, while others may display a placeholder.

GoelBiju commented 8 years ago

All in all, as @Autotonic said, it is worthwhile for @nortxort to think about rendering unicode and writing it to the log file for reference.

ghost commented 8 years ago

@Technetium1, try http://graphemica.com/. You can copy and paste the character into the search, scroll down, and there is a "Python: " bit.
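
The same lookup can be approximated locally with the standard library. This is only a sketch: python_escape and describe are hypothetical helper names, and the character name comes from whatever Unicode database ships with the installed Python.

```python
import unicodedata


def python_escape(ch):
    """Return the Python escape sequence for a single character, as text."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return '\\u{:04X}'.format(cp)
    # Characters outside the Basic Multilingual Plane need the 8-digit form.
    return '\\U{:08X}'.format(cp)


def describe(ch):
    """Return the escape sequence followed by the Unicode character name."""
    # Fall back to a placeholder for characters that have no name.
    name = unicodedata.name(ch, '<unnamed>')
    return '{} {}'.format(python_escape(ch), name)
```

Calling describe on the character from the earlier example gives the \u255A escape along with its Unicode name, which is handy when pasting the symbol into a terminal is not reliable.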

Technetium1 commented 8 years ago

Thanks @GoelBiju and @Autotonic!

nortxort commented 8 years ago

This has been added now. Thanks @Autotonic