jaraco / irc

Full-featured Python IRC library for Python.
MIT License

IRC client should not crash on failed decoding. #34

Closed: jaraco closed this issue 8 years ago

jaraco commented 8 years ago

In the default configuration, if non-UTF-8 data is transmitted to the client, the client crashes with an error:

#!python

Traceback (most recent call last):
  File "irc_bot.py", line 144, in <module>
    i.run()
  File "irc_bot.py", line 120, in run
    self.client.process_forever()
  File "/usr/local/lib/python3.2/dist-packages/irc/client.py", line 267, in process_forever
    self.process_once(timeout)
  File "/usr/local/lib/python3.2/dist-packages/irc/client.py", line 248, in process_once
    self.process_data(i)
  File "/usr/local/lib/python3.2/dist-packages/irc/client.py", line 213, in process_data
    c.process_data()
  File "/usr/local/lib/python3.2/dist-packages/irc/client.py", line 558, in process_data
    for line in self.buffer:
  File "/usr/local/lib/python3.2/dist-packages/irc/buffer.py", line 84, in <genexpr>
    for line in super(DecodingLineBuffer, self).lines())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 96: invalid start byte

jaraco commented 8 years ago

Line 80 of buffer.py should set errors = 'replace'.
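
For reference, a minimal sketch of how the standard `errors` modes of `bytes.decode` differ on a byte like the `0x95` in the traceback. The sample bytes and the `decode_line` helper are made up for illustration and are not part of the irc library.

```python
# Hypothetical sample: 0x95 is not a valid UTF-8 start byte.
raw = b"item \x95 one"


def decode_line(data, errors):
    """Decode one line as UTF-8 using the given error handler."""
    return data.decode("utf-8", errors)


# 'strict' raises, which is what crashes the client in the traceback.
try:
    decode_line(raw, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for the bad byte; 'ignore' drops it.
replaced = decode_line(raw, "replace")
ignored = decode_line(raw, "ignore")
```

With `'replace'` the reader can still see that something was lost, whereas `'ignore'` silently removes the byte.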


Original comment by: Tim L

jaraco commented 8 years ago

This IRC library takes the (somewhat progressive) approach of assuming UTF-8 for all input, but it also provides a straightforward mechanism for clients to customize the decoding behavior. See the section Decoding Input in the Overview. The last example there is a one-liner showing the supported mechanism for disabling strict decoding. If you add that to your client startup code, you'll have the behavior you seek.


Original comment by: Jason R. Coombs

jaraco commented 8 years ago

Nice to see this is already addressed in the documentation, but crashing on common input is not a sane default for the library. Would you consider wrapping the decode in a try block and printing a warning that some input was omitted because of the strict UTF-8 setting?
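
A rough sketch of what that suggestion could look like, written as a standalone generator rather than as a patch to buffer.py. The `decode_lines` name and the choice of `UnicodeWarning` are assumptions for illustration only.

```python
import warnings


def decode_lines(raw_lines, encoding="utf-8"):
    """Yield decoded lines, warning about (and skipping) undecodable ones."""
    for raw in raw_lines:
        try:
            yield raw.decode(encoding, "strict")
        except UnicodeDecodeError as exc:
            # Keep the strict setting, but report the loss instead of crashing.
            warnings.warn(
                "Dropped a line that is not valid %s: %s" % (encoding, exc),
                UnicodeWarning,
            )
```

The trade-off is that a whole line is lost whenever a single byte is invalid, so a client that needs the rest of the message may prefer a different error handler.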


Original comment by: Tim L

jaraco commented 8 years ago

Good idea.


Original comment by: Jason R. Coombs

jaraco commented 8 years ago
#!python

irc.client.ServerConnection.buffer_class.errors = 'ignore'

This fixes it and lets the client connect much faster.


Original comment by: Yamaii

jaraco commented 8 years ago

Updated changelog and readme reflecting new LenientDecodingLineBuffer. Fixes #34.
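
This thread does not show how LenientDecodingLineBuffer works, so the following is only a sketch of one plausible lenient strategy: try UTF-8 first and fall back to Latin-1, which can decode any byte sequence. The `lenient_decode` function is hypothetical and is not the library's code.

```python
def lenient_decode(raw):
    """Decode as UTF-8 when possible, otherwise fall back to Latin-1."""
    try:
        return raw.decode("utf-8", "strict")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never raises.
        return raw.decode("latin-1")
```

Unlike `'ignore'` or `'replace'`, a fallback of this kind loses no bytes, although text sent in some other encoding may display as the wrong characters.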



Original comment by: Jason R. Coombs