stenyak / breakbot

WhatsApp<->IRC gateway bot

Need to strip ASCII Color/Underline/Reverse codes from IRC text. #8

Open thinko opened 10 years ago

thinko commented 10 years ago

Need to scan the incoming IRC messages and strip the control codes used for color and other formatting before relaying them to WhatsApp.

I'm guessing it should be some simple regex like:

    import re
    regex = re.compile(r"\x0f|\x1f|\x02|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
    msg = regex.sub('', msg)
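A minimal sketch of that regex idea as a reusable helper, assuming the usual mIRC-style control codes (bold `\x02`, color `\x03`, reset `\x0f`, reverse `\x16`, italic `\x1d`, underline `\x1f`). The names `IRC_FORMAT_RE` and `strip_irc_formatting` are hypothetical and not part of BreakBot:

```python
import re

# mIRC-style formatting codes (assumed set): color (\x03, optionally followed
# by "fg" or "fg,bg" digits), plus bold, reset, reverse, italic and underline.
IRC_FORMAT_RE = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")


def strip_irc_formatting(msg):
    """Return msg with IRC formatting control codes removed."""
    return IRC_FORMAT_RE.sub("", msg)
```

Note that a color code directly followed by real digits (for example `\x035 apples`) will also swallow those digits; that ambiguity is inherent to the IRC color convention.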

Right now, the codes show up in WhatsApp like this:

[0mHostname [33m: [0mserver [33m - [0mOS [33m: [0mLinux 3.0.0-1-amd64/x86_64 [33m - [0mDistro [33m: [0mDebian 7.1 [33m - [0mCPU [33m: [0m4 x Intel Xeon (1995.065 MHz) [33m - [0mProcesses [33m: [0m187 [33m - [0mUptime [33m: [0m97d 8h 35m [33m - [0mUsers [33m: [0m23 [33m - [0mLoad Average [33m: [0m0.73 [33m - [0mMemory Usage [33m: [0m1834.04MB/3950.38MB (46.43%) [33m - [0mDisk Usage [33m: [0m35.36GB/49.16GB (71.93%)

stenyak commented 10 years ago

As far as I know, WhatsApp does not support ANSI escape codes, so yes: ideally they should be stripped before sending to WA.

After some quick googling, this Python snippet should be able to remove any ANSI escape code from a string: http://stackoverflow.com/a/2187024

The exact point in BreakBot where that snippet should be inserted and adapted is: https://github.com/stenyak/breakbot/blob/master/wa_bot.py#L130

(specifically, the "text" variable is what needs to be un-escaped)
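For illustration, a rough sketch of stripping ANSI escape codes from such a string. This is not necessarily the same regex as in the linked answer. It assumes the codes are ESC-prefixed CSI sequences such as `\x1b[33m`, and the names `ANSI_CSI_RE` and `strip_ansi` are hypothetical:

```python
import re

# ANSI CSI sequence: ESC, "[", optional parameter bytes (0x30-0x3F),
# optional intermediate bytes (0x20-0x2F), then one final byte (0x40-0x7E).
ANSI_CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Return text with ANSI CSI escape sequences (colors, resets) removed."""
    return ANSI_CSI_RE.sub("", text)
```

Something like `text = strip_ansi(text)` applied before the message is sent to WhatsApp would be the intended use, though the exact integration point would need checking against the real code.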

Could you please test if that works for your use cases?

Thanks for the report.