tnm / qr

Queues, stacks, deques, and priority queues with Redis in Python

Unicode Key Causes Encoding Error with Log Statement #10

Open · jacinda opened 11 years ago

jacinda commented 11 years ago

I noticed this while using qr (which is great, btw) with Django, which uses unicode for everything. I ended up with something like q = Queue(u'my_key') without realizing it at first, because my_key came from a variable rather than a string I had hard-coded. It also only broke when the popped value's pickled form contained non-ASCII bytes.

This error is caused by combining cPickle protocol 1 with a unicode key. There are a couple of possible fixes; let me know which you prefer and I'll submit a patch.

Here is a detailed description.

Because _pack uses protocol 1, cPickle serializes values in a binary format:

def _pack(self, val):
    """Prepares a message to go into Redis"""
    return self.serializer.dumps(val, 1)

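When a value is popped, a log statement runs. If the key used for the lookup is a unicode string and the popped value (the pickled string) contains any byte greater than 127, that statement raises a UnicodeDecodeError. To combine the byte string with the unicode key, Python 2 implicitly decodes it with the ASCII codec, and bytes above 127 are outside the ASCII range.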

log.debug('Popped ** %s ** from key ** %s **' % (popped, self.key))

Here is an example:

>>> import cPickle
>>> x = cPickle.dumps(128, 1)
>>> x
'K\x80.'
>>> u = u'unicode string'
>>> 'Popped ** %s ** from key ** %s **' % (x, u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 11: ordinal not in range(128)
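On Python 3, %-formatting that mixes bytes and str no longer triggers an implicit ASCII decode, so the traceback above cannot be reproduced directly. A minimal sketch of the same failure performs that decode explicitly with the standard `pickle` module. `decodes_as_ascii` is a hypothetical helper name, not part of qr:

```python
import pickle


def decodes_as_ascii(data: bytes) -> bool:
    """Hypothetical helper: would data survive Python 2's implicit ASCII decode?"""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# Protocol 1 is binary: 128 is written as the BININT1 opcode b"K",
# the raw byte 0x80, and the STOP opcode b".".
binary = pickle.dumps(128, 1)
print(binary)                    # b'K\x80.'
print(decodes_as_ascii(binary))  # False
```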

This does not fail with protocol 0, whose output for this value is plain ASCII:

>>> x = cPickle.dumps(128)
>>> 'Popped ** %s ** from key ** %s **' % (x, u)
u'Popped ** I128\n. ** from key ** unicode string **'
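Protocol 0 avoids the error for this integer, but it does not guarantee ASCII output. Unicode text is written with the `raw-unicode-escape` codec, which stores characters in the Latin-1 range as single raw bytes. Because Django hands qr unicode values, switching protocols alone may not be enough. The following is a quick check with Python 3's `pickle`; to the best of my knowledge, Python 2's protocol 0 encodes unicode the same way:

```python
import pickle

# List any bytes above 127 that protocol 0 emits for a few sample values.
for value in (128, "unicode string", "café"):
    data = pickle.dumps(value, 0)
    non_ascii = [hex(b) for b in data if b > 127]
    print(repr(value), data, non_ascii)
```

The integer and the pure-ASCII string come out clean, while `"café"` carries a raw `0xe9` byte.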

It also does not fail if the unicode key is explicitly encoded as ASCII first:

>>> x = cPickle.dumps(128, 1)
>>> 'Popped ** %s ** from key ** %s **' % (x, u.encode('ascii'))
'Popped ** K\x80. ** from key ** unicode string **'
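The explicit-encoding approach carries over to Python 3 if the whole message is built as bytes using PEP 461 %-formatting. In that setting, an unencoded str key fails with a TypeError instead of a UnicodeDecodeError, and encoding the key removes the mix. The following is a minimal sketch. `format_pop_message` is a hypothetical helper, not qr's code. It encodes with UTF-8 rather than ASCII, because `.encode('ascii')` would itself raise UnicodeEncodeError on a non-ASCII key:

```python
import pickle


def format_pop_message(popped: bytes, key: str) -> bytes:
    """Hypothetical helper: build the debug message entirely from bytes."""
    # Encode the key explicitly so the pickled payload is never decoded.
    # UTF-8 gives the same bytes as ASCII for ASCII keys and also accepts
    # non-ASCII ones.
    return b"Popped ** %s ** from key ** %s **" % (popped, key.encode("utf-8"))


popped = pickle.dumps(128, 1)  # binary payload containing byte 0x80
print(format_pop_message(popped, "unicode string"))
# b'Popped ** K\x80. ** from key ** unicode string **'

try:
    # Without the explicit encode, bytes formatting rejects the str key.
    b"from key ** %s **" % ("unicode string",)
except TypeError as exc:
    print("TypeError:", exc)
```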

Changing the pickling protocol and encoding the key explicitly are both options, and I can submit either as a patch (or something else you suggest, if neither seems ideal). Let me know which solution you prefer.

tnm commented 10 years ago

I'd be cool with the explicit encoding.