jaraco / irc

Full-featured Python IRC library for Python.
MIT License

'utf8' codec can't decode byte 0x95 in position 134: invalid start byte #109

Closed igel-kun closed 8 years ago

igel-kun commented 8 years ago

Hi, I frequently encounter the following error when performing DCC transactions:

  File "/usr/lib/python2.7/dist-packages/irc/client.py", line 1223, in start
    self.ircobj.process_forever()
  File "/usr/lib/python2.7/dist-packages/irc/client.py", line 268, in process_forever
    self.process_once(timeout)
  File "/usr/lib/python2.7/dist-packages/irc/client.py", line 249, in process_once
    self.process_data(i)
  File "/usr/lib/python2.7/dist-packages/irc/client.py", line 214, in process_data
    c.process_data()
  File "/usr/lib/python2.7/dist-packages/irc/client.py", line 558, in process_data
    for line in self.buffer:
  File "/usr/lib/python2.7/dist-packages/irc/buffer.py", line 84, in <genexpr>
    for line in super(DecodingLineBuffer, self).lines())
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x95 in position 134: invalid start byte

I wonder whether the DCC transfer should be treated as binary data rather than decoded as UTF-8?

jaraco commented 8 years ago

The DCCConnection uses a LineBuffer that shouldn't be decoding anything... it seems like something is being sent over the primary ServerConnection that's not UTF-8. Perhaps that's the byte count being sent in binary?
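To illustrate the distinction being drawn here, the following is a simplified sketch of a byte-oriented line buffer and a decoding subclass. The class names `SketchLineBuffer` and `SketchDecodingLineBuffer` are hypothetical and this is not the library's actual code; it only mirrors the shape suggested by the traceback (a decoding buffer wrapping the parent's `lines()` in a generator expression).

```python
import re


class SketchLineBuffer:
    """Hypothetical byte buffer: collects raw bytes, yields complete lines as bytes."""

    line_sep = re.compile(b"\r?\n")

    def __init__(self):
        self.buffer = b""

    def feed(self, data):
        self.buffer += data

    def lines(self):
        parts = self.line_sep.split(self.buffer)
        # The last element is the trailing incomplete line (possibly empty);
        # keep it for the next call.
        self.buffer = parts.pop()
        return iter(parts)


class SketchDecodingLineBuffer(SketchLineBuffer):
    """Hypothetical decoding buffer: same splitting, but each line is decoded."""

    encoding = "utf-8"
    errors = "strict"  # strict decoding raises on invalid bytes such as 0x95

    def lines(self):
        return (
            line.decode(self.encoding, self.errors)
            for line in super().lines()
        )
```

With a buffer like the first one, a stray `0x95` byte passes through untouched; with the second, iterating over the lines raises `UnicodeDecodeError`, which matches the frame in `buffer.py` shown in the traceback.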

igel-kun commented 8 years ago

It appears you're right: the peer isn't sending proper UTF-8, and the message that crashes the script is a status notice from the peer. Since I care little about this status, I set the decoder's error handling to "replace" instead of "strict", which keeps it from crashing. It also seems that my version (the Debian jessie default) is a bit outdated, as the current version appears to use a separate buffer implementation.
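For anyone hitting the same error, here is a minimal standard-library sketch of the difference between "strict" and "replace" error handling. The sample message and the helper `decode_line` are made up for illustration; how the error mode is actually configured depends on the installed version of the library.

```python
# Made-up status notice containing the offending byte 0x95, which is not a
# valid UTF-8 start byte.
raw = b":peer NOTICE me :transfer status \x95 done"


def decode_line(data, errors="strict"):
    """Hypothetical helper: decode one received line as UTF-8."""
    return data.decode("utf-8", errors)


# Strict decoding raises, as in the traceback above.
try:
    decode_line(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "replace" substitutes U+FFFD for the invalid byte instead of raising.
lenient = decode_line(raw, errors="replace")
```

The trade-off is that the replaced byte is lost, which is acceptable for a status notice that is going to be ignored but not for data that needs to round-trip.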