jaraco / irc

Full-featured Python IRC library for Python.
MIT License

dcc file transfer (receiving) decode error #24

Closed by jaraco 8 years ago

jaraco commented 8 years ago

I used the script "dccreceive.py", and it fails when receiving a file larger than 1151 bytes (or once the number of received bytes exceeds this).

I tested this simply by trial and error with struct.pack("!I", bytes).encode("utf-8"). It fails on:

#!python
>>> struct.pack("!I", 1152).encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3: invalid start byte

The private message that is sent back with the number of received bytes cannot be encoded as UTF-8.

The message that is supposed to be sent is a string from struct.pack(), e.g.:

#!python
struct.pack("!I", bytesReceived)

That output is passed to the DCC privatemsg function, where it is encoded to UTF-8:

#!python
string = struct.pack("!I", bytesReceived)
bytes = string.encode('utf-8')

Now, with a small bytesReceived, this works:

#!python
>>> struct.pack("!I", 14).encode("utf-8")
'\x00\x00\x00\x0e'

But this does not:

#!python
>>> struct.pack("!I", 1440).encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 3: invalid start byte

I checked what "0xa0" is, and it's a non-breaking space; apparently a UTF-8 string is not allowed to contain that byte on its own!
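For reference, bytes in the range 0x80 to 0xbf are only valid in UTF-8 as continuation bytes inside a multi-byte sequence, so a lone 0x80 or 0xa0 following 0x04 or 0x05 is rejected. The non-breaking space is code point U+00A0, which UTF-8 encodes as the two bytes 0xc2 0xa0. The following is a minimal sketch in Python 3 syntax (the session above is Python 2); the helper names `ack_payload` and `is_valid_utf8` are hypothetical and only illustrate which packed counts happen to be valid UTF-8.

```python
import struct


def ack_payload(bytes_received):
    """Pack a byte count as a 4-byte big-endian unsigned integer."""
    return struct.pack("!I", bytes_received)


def is_valid_utf8(data):
    """Return True if the raw bytes happen to form valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Counts whose packed form only uses bytes below 0x80 pass by accident;
# a lone byte in the 0x80-0xbf range does not.
for count in (14, 128, 1151, 1152, 1440):
    payload = ack_payload(count)
    print(count, payload, is_valid_utf8(payload))

# U+00A0 (non-breaking space) is two bytes in UTF-8, not a bare 0xa0.
nbsp_utf8 = "\u00a0".encode("utf-8")
```

Note that 128 also fails in this sketch, so the boundary is not specific to 1151; it depends on whether any packed byte is 0x80 or higher without forming a valid sequence.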

I don't understand why the output of struct.pack() is not UTF-8 encodable, since it should be a string representing bytes, not a byte string:

#!python
>>> type(struct.pack("!I", 14).encode("utf-8"))
<type 'str'>

So in conclusion, I don't know whether this is an error in struct.pack() or somewhere else. Besides, is it necessary to encode the message at all?
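On the question of whether encoding is needed: DCC SEND acknowledgements are conventionally exactly four raw bytes in network byte order, so a text-encoding step can only fail or alter them. A minimal sketch in Python 3 syntax follows. The Latin-1 round trip is an illustrative assumption standing in for "treat the bytes as text, then encode"; it is not what the library does.

```python
import struct

received = 1152

# The acknowledgement: 4 bytes, big-endian unsigned int.
ack = struct.pack("!I", received)

# The peer can recover the count directly from the raw bytes.
(decoded,) = struct.unpack("!I", ack)

# Treating the same bytes as text and encoding to UTF-8 changes them:
# the byte 0x80 becomes the two-byte sequence 0xc2 0x80.
as_text = ack.decode("latin-1")
reencoded = as_text.encode("utf-8")

print(len(ack), len(reencoded))
```

Because the re-encoded form is no longer four bytes, a peer that reads a fixed-size acknowledgement would misinterpret it, which suggests sending the packed value unchanged.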


jaraco commented 8 years ago

Okay, I just wrapped the

#!python
bytes = string.encode('utf-8')

from client.py (around line 1129) in a try block like so:

#!python
try:
    bytes = string.encode('utf-8')
except UnicodeDecodeError:
    bytes = string

and this seemed to work, at least in my first test.


Original comment by: Dennis Lutter

jaraco commented 8 years ago

Hi Dennis. Yes, that is a problem. Thanks for reporting it.

I looked into the code and found no reason why the bytes should be encoded, especially for content like files. As a result, I've committed 372975319a63 which I believe addresses the issue (by providing a separate method for transmitting raw bytes).

Try it out, let me know how it works for you.
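The idea of keeping text and raw bytes on separate paths can be sketched roughly as follows. This is Python 3 syntax, and the class and method names (`SketchConnection`, `send_text`, `send_bytes`) are hypothetical; they are not taken from commit 372975319a63 or from the irc library's API, and an in-memory stream stands in for the socket.

```python
import io
import struct


class SketchConnection:
    """Hypothetical stand-in for a DCC connection (not the irc library's class)."""

    def __init__(self, stream):
        # Any object with a write(bytes) method; a socket file would also do.
        self.stream = stream

    def send_bytes(self, data):
        """Write raw bytes unchanged, e.g. file chunks or acknowledgements."""
        self.stream.write(data)

    def send_text(self, text):
        """Encode human-readable text as UTF-8, then send it as bytes."""
        self.send_bytes(text.encode("utf-8"))


stream = io.BytesIO()
conn = SketchConnection(stream)
conn.send_text("hello")                   # chat text goes through the encoder
conn.send_bytes(struct.pack("!I", 1440))  # the ack bypasses it entirely
```

With this split, only the text path ever touches a codec, so binary payloads cannot trigger a Unicode error.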


Original comment by: Jason R. Coombs

jaraco commented 8 years ago

I've released the fix as part of irc 8.5.


Original comment by: Jason R. Coombs