pteichman / cobe

A Markov-chain-based text generation library and MegaHAL-style chatbot
http://teichman.org/blog/
MIT License

irc client crash on bad utf8 #5

Status: Open. pteichman opened this issue 12 years ago.

pteichman commented 12 years ago

The input was invalid UTF-8. Make sure this case is handled properly both in the IRC client's wrapper (where this trace comes from) and in cobe.Brain itself.

```
Traceback (most recent call last):
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/bin/cobe", line 8, in <module>
    load_entry_point('cobe==2.0.4', 'console_scripts', 'cobe')()
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/cobe/control.py", line 42, in main
    args.run(args)
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/cobe/commands.py", line 244, in run
    Runner().run(b, args)
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/cobe/irc.py", line 116, in run
    bot.start()
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/irclib.py", line 1114, in start
    self.ircobj.process_forever()
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/irclib.py", line 229, in process_forever
    self.process_once(timeout)
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/irclib.py", line 214, in process_once
    self.process_data(i)
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/irclib.py", line 183, in process_data
    c.process_data()
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/irclib.py", line 581, in process_data
    self._handle_event(Event(command, prefix, target, [m]))
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/irclib.py", line 604, in _handle_event
    self.irclibobj._handle_event(self, event)
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/irclib.py", line 325, in _handle_event
    if handler[1](connection, event) == "NO MORE":
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/cobe/irc.py", line 32, in _dispatcher
    irclib.SimpleIRCClient._dispatcher(self, c, e)
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/irclib.py", line 1049, in _dispatcher
    getattr(self, m)(c, e)
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/site-packages/cobe/irc.py", line 99, in on_pubmsg
    text = text.decode("utf-8").strip()
  File "/home/peter/lib/cobe/cracklinhal/virtualenv/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 8-10: invalid data
```
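The crash comes from the strict `text.decode("utf-8")` call in `on_pubmsg` (cobe/irc.py, line 99). One way to keep the bot alive, sketched here in Python 3 as an assumption and not as cobe's actual fix, is to decode with the `"replace"` error handler, which substitutes U+FFFD for undecodable bytes instead of raising. `decode_message` is a hypothetical helper name. The error handler is passed positionally because Python 2.6's `str.decode` does not accept keyword arguments (they were added in 2.7).

```python
def decode_message(raw: bytes) -> str:
    """Hypothetical helper: decode an IRC message without ever raising.

    The "replace" error handler turns each invalid UTF-8 sequence into
    U+FFFD (the Unicode replacement character) instead of raising
    UnicodeDecodeError. The handler is passed positionally so the same
    call also works on Python 2.6.
    """
    return raw.decode("utf-8", "replace").strip()


print(repr(decode_message(b"\xfcnicode")))  # '\ufffdnicode'
```

The tradeoff is that this is lossy: the original "ü" is gone, and the replacement character would end up being learned by the brain.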
pteichman commented 12 years ago

To reproduce, try to decode ISO-8859-1 text as UTF-8 (Python 2): `"\xfcnicode".decode("utf-8")`
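The same reproduction in Python 3, where the literal has to be a `bytes` object, is sketched below. The byte 0xFC is "ü" in ISO-8859-1, but it can never start a valid UTF-8 sequence, so decoding fails at the very first byte. `utf8_failure_position` is a hypothetical helper used only for illustration.

```python
from typing import Optional


def utf8_failure_position(raw: bytes) -> Optional[int]:
    """Hypothetical helper: return the byte offset where UTF-8 decoding
    fails, or None if the bytes are valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded
        return exc.start
    return None


print(utf8_failure_position(b"\xfcnicode"))  # 0
```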

Maybe fall back to ISO-8859-15 on UnicodeDecodeError: `"\xfcnicode".decode("iso-8859-15")`
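A minimal Python 3 sketch of that fallback, assuming a single shared helper (the name `decode_with_fallback` is hypothetical) that both the IRC wrapper and the brain could call. ISO-8859-15 assigns a character to every byte value 0x00 to 0xFF, so the fallback decode cannot itself raise. It differs from ISO-8859-1 in eight positions; for example, 0xA4 decodes to "€" rather than "¤".

```python
def decode_with_fallback(raw: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, falling back to ISO-8859-15."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte is defined in ISO-8859-15, so this never raises
        return raw.decode("iso-8859-15")


print(decode_with_fallback(b"\xfcnicode"))             # ünicode
print(decode_with_fallback("ünicode".encode("utf-8")))  # ünicode
```

Valid UTF-8 is always tried first, so correctly encoded input is unaffected. Only byte strings that fail UTF-8 decoding are reinterpreted as Latin-9.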