Closed by jaraco 8 years ago.
By default, the IRC library attempts to decode all incoming streams as UTF-8, but I acknowledge that there are cases where decoding is undesirable or a custom decoding is needed. To support these cases, since irc 3.4.2 the ServerConnection class may be customized. The 'buffer_class' attribute on ServerConnection determines which class is used to buffer lines from the input stream. By default it is DecodingLineBuffer, but it may be reassigned to another class, such as irc.client.LineBuffer, which does not decode the lines and passes them through as byte strings. The 'buffer_class' attribute may be assigned for all instances of ServerConnection by overriding the class attribute::
    import irc.client
    irc.client.ServerConnection.buffer_class = irc.client.LineBuffer
or it may be overridden on a per-instance basis (as long as it's overridden before the connection is established)::
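    server = irc.client.IRC().server()
    server.buffer_class = irc.client.LineBuffer
    # connect() needs a server, port and nickname; these values are placeholders
    server.connect("irc.example.net", 6667, "mynick")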
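For the custom-decoding case, a replacement buffer class might look roughly like the following. This is a standalone sketch, not the library's code: it assumes the buffer interface is a feed() method that accepts raw bytes plus a lines() generator that yields complete lines, which is how LineBuffer appears to work, so check it against your installed version before assigning it to buffer_class. The class names and encoding defaults are hypothetical.

```python
import re


class CustomDecodingLineBuffer:
    """Hypothetical replacement buffer; not part of the irc package.

    Assumes the interface described above: feed() receives raw bytes and
    lines() yields each complete line, here decoded with a chosen codec.
    """

    # IRC lines end in CRLF, but a bare LF is tolerated too
    line_sep_exp = re.compile(b"\r?\n")
    # Hypothetical defaults; override them in a subclass
    encoding = "utf-8"
    errors = "replace"

    def __init__(self):
        self.buffer = b""

    def feed(self, data):
        # Accumulate raw bytes exactly as they arrive from the socket
        self.buffer += data

    def lines(self):
        # Split off every complete line; the last piece (possibly an
        # incomplete line) stays in the buffer for the next feed()
        *complete, self.buffer = self.line_sep_exp.split(self.buffer)
        for line in complete:
            yield line.decode(self.encoding, self.errors)


class Cp949LineBuffer(CustomDecodingLineBuffer):
    # Example of a legacy local encoding (Korean); purely illustrative
    encoding = "cp949"


buf = CustomDecodingLineBuffer()
buf.feed(b"PING :irc.example.net\r\nPRIVMSG #chan :caf\xc3")
print(list(buf.lines()))  # ['PING :irc.example.net']
buf.feed(b"\xa9\r\n")
print(list(buf.lines()))  # ['PRIVMSG #chan :café']
```

Because only complete lines are decoded, a multi-byte character split across two socket reads (the \xc3 \xa9 pair above) is reassembled before decoding. If the interface matches, such a class would be assigned the same way as irc.client.LineBuffer, e.g. server.buffer_class = Cp949LineBuffer.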
I've added a section to the README that documents these options.
Does this interface provide the option you seek? If not, please re-open.
Original comment by: Jason R. Coombs
I've updated the README in https://bitbucket.org/jaraco/irc/changeset/807ab45d31fe to describe the option available to disable or customize decoding.
Original comment by: Jason R. Coombs
Thank you for the reply. It helped me a lot, but I've run into another problem, mainly because I'm using Python 3.
The library mixes bytes and str somewhat inconsistently, and when bytes are implicitly converted to str, the result is "b'this'" rather than "this".
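In Python 3 the implicit conversion goes through str(), which returns the repr of a bytes object, while an explicit decode gives the text. A minimal illustration (the sample value is arbitrary):

```python
data = b"this"

# str() on a bytes object returns its repr, not the decoded text
implicit = str(data)
print(implicit)   # b'this'

# Formatting bytes into a str (f-strings, %, format) goes through the same repr
formatted = f"{data}"
print(formatted)  # b'this'

# Comparisons between str and bytes do not raise; they are simply False
same = "this" == data
print(same)       # False

# Decoding explicitly, with a chosen codec, produces the expected text
explicit = data.decode("utf-8")
print(explicit)   # this
```

Running the interpreter with -b turns these silent conversions and comparisons into BytesWarning warnings (-bb makes them errors), which can help locate the mixed uses.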
We should explicitly choose one of the two kinds of strings, and I would recommend bytes. For example, RFC 1459 allows channel names to contain almost any sequence of bytes, so bytes seem suitable. But if we do that, each of the following lines becomes problematic:
- irc.client.is_channel(): string[0] in "#&+!"
- irc.client.ServerConnection.join(): "JOIN %s%s" % (channel, (key and (" " + key)))
- NickMask(prefix), when a privmsg event has occurred

So I'm trying to convert all the internal strings to bytes on my fork, in the same way I did for irclib: https://github.com/puzzlet/python-irclib
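To make the first two items concrete, here is what happens in Python 3 when a channel name is bytes, together with possible bytes-only replacements. is_channel_bytes and join_command are hypothetical names used for illustration, not functions from irc or irclib, and bytes %-formatting requires Python 3.5 or later (PEP 461).

```python
channel = b"#python"

# Indexing bytes yields an int in Python 3, so the original check fails
print(channel[0])  # 35
try:
    channel[0] in "#&+!"
except TypeError as exc:
    print("is_channel:", exc)

# %-formatting bytes into a str template embeds the repr
print("JOIN %s%s" % (channel, ""))  # JOIN b'#python'


def is_channel_bytes(name):
    """Hypothetical bytes-only variant of is_channel()."""
    # Slicing keeps the result as bytes; comparing against a tuple also
    # makes an empty name return False instead of raising IndexError
    return name[:1] in (b"#", b"&", b"+", b"!")


def join_command(channel, key=b""):
    """Hypothetical bytes-only JOIN builder using bytes %-formatting."""
    return b"JOIN %s%s" % (channel, b" " + key if key else b"")


print(is_channel_bytes(channel))         # True
print(join_command(channel, b"secret"))  # b'JOIN #python secret'
```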
Original comment by: puzzlet
I see that irc/client.py assumes all packets are encoded in UTF-8. But in reality, non-UTF-8 text is around: servers truncate privmsgs by byte count, so they are sometimes broken in the middle of a character, and some servers and channels still use local encodings other than UTF-8.
So I think the library should have an option for non-UTF-8 modes.
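The truncation problem can be reproduced without a server: cutting a UTF-8 message at a byte limit can split a multi-byte character, and a strict decode then fails. One possible non-UTF-8 mode is a fallback chain, sketched below; the function name decode_line and the choice of cp949 as the legacy encoding are assumptions for illustration, not something the library provides.

```python
def decode_line(raw, encodings=("utf-8", "cp949"), last_resort="latin-1"):
    """Hypothetical fallback decoder: try each encoding strictly, in order."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this step never fails
    return raw.decode(last_resort)


message = "안녕".encode("utf-8")  # 6 bytes: 3 per Hangul syllable
truncated = message[:4]          # cut in the middle of the second syllable

try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict UTF-8 fails:", exc.reason)

# errors="replace" keeps the intact prefix and marks the broken tail;
# prints '안' followed by U+FFFD (the replacement character)
print(truncated.decode("utf-8", errors="replace"))

# A line sent in a legacy local encoding is not valid UTF-8 here,
# so the chain falls through to cp949
print(decode_line("안녕".encode("cp949")))  # 안녕
```

A fallback chain is only a heuristic: a short legacy-encoded line can happen to be valid UTF-8, and a truncated UTF-8 line may decode as garbage in the legacy codec, so a per-server or per-channel encoding setting is the more predictable design.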