jaraco / irc

Full-featured IRC library for Python.
MIT License

buffer.py UTF-8 Decoding error #68

Closed: jaraco closed this 8 years ago

jaraco commented 8 years ago

Hey,

Got another stack trace:

Traceback (most recent call last):
  File "./ratbot.py", line 293, in <module>
    main()
  File "./ratbot.py", line 290, in main
    bot.start()
  File "/home/luke/pipsqueak2-deploy/lib/python3.4/site-packages/irc/bot.py", line 265, in start
    super(SingleServerIRCBot, self).start()
  File "/home/luke/pipsqueak2-deploy/lib/python3.4/site-packages/irc/client.py", line 1246, in start
    self.reactor.process_forever()
  File "/home/luke/pipsqueak2-deploy/lib/python3.4/site-packages/irc/client.py", line 278, in process_forever
    self.process_once(timeout)
  File "/home/luke/pipsqueak2-deploy/lib/python3.4/site-packages/irc/client.py", line 259, in process_once
    self.process_data(i)
  File "/home/luke/pipsqueak2-deploy/lib/python3.4/site-packages/irc/client.py", line 216, in process_data
    c.process_data()
  File "/home/luke/pipsqueak2-deploy/lib/python3.4/site-packages/irc/client.py", line 579, in process_data
    for line in self.buffer:
  File "/home/luke/pipsqueak2-deploy/lib/python3.4/site-packages/irc/buffer.py", line 102, in lines
    self.handle_exception()
  File "/home/luke/pipsqueak2-deploy/lib/python3.4/site-packages/irc/buffer.py", line 100, in lines
    yield line.decode(self.encoding, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 109: invalid start byte

There doesn't seem to be any way for the application layer (the library consumer) to catch this decoding error and handle it. IMHO, LenientDecodingLineBuffer should be used by default, or the choice of DecodingLineBuffer class should be made configurable. On IRC, clients will send any kind of data, and a client shouldn't crash just because another client sent non-UTF-8 data.
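To make the two behaviours concrete, here is a minimal, standard-library-only sketch. `StrictLineBuffer`, `LenientLineBuffer`, `Connection` and `LenientConnection` are hypothetical names invented for illustration, not the classes in `irc.buffer` or `irc.client`. The lenient buffer assumes the same strategy `LenientDecodingLineBuffer` is meant to provide (try UTF-8, fall back to latin-1), and the `buffer_class` attribute models the "make the buffer class configurable" suggestion. Byte 0xfc is "ü" in latin-1 but can never start a UTF-8 sequence, which is why the strict decode in the traceback fails.

```python
import re
from typing import Iterator, List


class StrictLineBuffer:
    """Hypothetical line buffer: split on LF or CRLF, decode strictly as UTF-8."""

    line_sep_exp = re.compile(rb"\r?\n")
    encoding = "utf-8"
    errors = "strict"

    def __init__(self) -> None:
        self.buffer = b""

    def feed(self, data: bytes) -> None:
        self.buffer += data

    def raw_lines(self) -> List[bytes]:
        # Every piece before the last separator is a complete line;
        # the trailing piece (possibly b"") stays buffered for the next feed.
        *complete, self.buffer = self.line_sep_exp.split(self.buffer)
        return complete

    def decode(self, line: bytes) -> str:
        # With errors="strict" this raises UnicodeDecodeError on bad bytes.
        return line.decode(self.encoding, self.errors)

    def lines(self) -> Iterator[str]:
        for line in self.raw_lines():
            yield self.decode(line)


class LenientLineBuffer(StrictLineBuffer):
    """Hypothetical lenient buffer: try UTF-8 first, fall back to latin-1."""

    def decode(self, line: bytes) -> str:
        try:
            return line.decode("utf-8")
        except UnicodeDecodeError:
            # latin-1 assigns a character to every byte 0x00-0xFF,
            # so this fallback cannot raise.
            return line.decode("latin-1")


class Connection:
    """Hypothetical consumer whose buffer class is a configurable attribute."""

    buffer_class = StrictLineBuffer

    def __init__(self) -> None:
        self.buffer = self.buffer_class()


class LenientConnection(Connection):
    # Opting in to lenient decoding is a one-line override.
    buffer_class = LenientLineBuffer


data = b"PRIVMSG #chan :gr\xfc\xdfe\r\n"  # latin-1 bytes for "grüße"

strict = Connection()
strict.buffer.feed(data)
try:
    list(strict.buffer.lines())
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)  # prints: strict: invalid start byte

lenient = LenientConnection()
lenient.buffer.feed(data)
print("lenient:", list(lenient.buffer.lines()))  # prints: lenient: ['PRIVMSG #chan :grüße']
```

Keeping the buffer class as an overridable class attribute means the strict behaviour can stay the default while a consumer opts in to leniency without patching the library.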


jaraco commented 8 years ago

Duplicate of #40.


Original comment by: Jason R. Coombs

jaraco commented 8 years ago

This IRC library is opinionated about the default encoding: it holds that it's better to use a rich, preferred encoding and to fail when that assumption is violated than to provide an imperfect decoding and silently carry on. I'm confident in this opinion, as this code is used with the default decoder in many environments with great success. I recognize that others prefer a more lenient default decoding, which is why the lenient decoder is supplied and documented in the README.

Thanks for registering your concerns and sorry for any inconvenience.


Original comment by: Jason R. Coombs
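The trade-off in the maintainer's reply can be seen with the standard library alone. The three functions below are hypothetical illustrations, not part of the `irc` package. A strict UTF-8 decode raises; `errors="replace"` silently substitutes U+FFFD; a UTF-8-then-latin-1 fallback never raises but can silently produce the wrong text when the sender actually used another encoding. Windows-1251 (`cp1251`) is chosen here only as an example of such a sender.

```python
from typing import Callable, Dict


def decode_strict(raw: bytes) -> str:
    # Fail loudly when the bytes are not valid UTF-8.
    return raw.decode("utf-8")


def decode_replace(raw: bytes) -> str:
    # Undecodable bytes become U+FFFD; no error is reported to the caller.
    return raw.decode("utf-8", errors="replace")


def decode_fallback(raw: bytes) -> str:
    # UTF-8 first, then latin-1, which never fails but may guess wrong.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


POLICIES: Dict[str, Callable[[bytes], str]] = {
    "strict": decode_strict,
    "replace": decode_replace,
    "fallback": decode_fallback,
}

raw = "привет".encode("cp1251")  # a hypothetical client sending Windows-1251 text

for name, policy in POLICIES.items():
    try:
        print(f"{name}: {policy(raw)!r}")
    except UnicodeDecodeError as exc:
        print(f"{name}: raised ({exc.reason})")
```

Only the strict policy tells the application that something went wrong; the other two return a string either way, and the fallback's output is plausible-looking but incorrect text.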