pytries / datrie

Fast, efficiently stored Trie for Python. Uses libdatrie.
http://pypi.python.org/pypi/datrie/
GNU Lesser General Public License v2.1

Possible bug with range initialization #30

Closed: superbobry closed this issue 8 years ago

superbobry commented 8 years ago

Reported in this Stack Overflow question:

>>> import datrie
>>> trie = datrie.Trie(ranges=[(u'\u0000', u'\u9FFF')])
>>> trie["颖礼仿古烟盒折扣"] = 42
>>> "颖礼仿古烟盒折扣" in trie
False
>>> trie.keys()
['\x96<']

All characters in the string are within the specified range:

>>> ord('\u9FFF')
40959
>>> list(map(ord, "颖礼仿古烟盒折扣"))
[39062, 31036, 20223, 21476, 28895, 30418, 25240, 25187]

Overflow bug somewhere?
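One quick way to probe the overflow guess is to look at the low 8 bits of each code point. The sketch below is plain Python 3 and does not call datrie; it only shows that the stored key `'\x96<'` matches the low bytes of the first two characters. Treating the symptom as 8-bit truncation is an assumption here, and the sketch does not explain why the stored key stops after two characters.

```python
# Code points of the key, copied from the output above.
code_points = [39062, 31036, 20223, 21476, 28895, 30418, 25240, 25187]

# Keep only the low 8 bits of each code point, as an 8-bit
# character type would (assumption: this is what happens internally).
low_bytes = [cp & 0xFF for cp in code_points]

# The first two low bytes are 0x96 and 0x3C ('<').
first_two = "".join(chr(b) for b in low_bytes[:2])
print(repr(first_two))  # '\x96<'
```

The match with `trie.keys()` is consistent with the characters being squeezed into a single byte somewhere along the way.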

tanghuang commented 8 years ago

Thanks, superbobry. I am the one who asked the question. We haven't fixed it yet.

superbobry commented 8 years ago

Apologies, I haven't had a chance to look into this, but I will at some point.

superbobry commented 8 years ago

The answer is in #10: libdatrie, the underlying library, doesn't support alphabet ranges larger than 256 characters.
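To put that limit next to the range from the original report, here is a minimal sketch. `alphabet_size` is a hypothetical helper, not part of datrie, and it assumes the ranges do not overlap.

```python
def alphabet_size(ranges):
    """Count the code points covered by (start, end) ranges.

    Hypothetical helper for illustration; assumes the ranges
    are inclusive and do not overlap.
    """
    return sum(ord(end) - ord(start) + 1 for start, end in ranges)

# The range passed to datrie.Trie in the original report.
requested = [(u'\u0000', u'\u9FFF')]

size = alphabet_size(requested)
print(size)        # 40960
print(size > 256)  # True
```

The requested alphabet is 160 times larger than the 256-character limit mentioned above.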

yao62995 commented 7 years ago

@superbobry You can work around it as follows (Python 2, where `key` is a UTF-8 byte string):

>>> import datrie
>>> trie = datrie.Trie(ranges=[(u'\x00', u'\xff')])
>>> key = "颖礼仿古烟盒折扣"
>>> trie[key.decode('latin1')] = 42
>>> trie[key.decode('latin1')]
42
>>> trie.keys()
[u'\xe9\xa2\x96\xe7\xa4\xbc\xe4\xbb\xbf\xe5\x8f\xa4\xe7\x83\x9f\xe7\x9b\x92\xe6\x8a\x98\xe6\x89\xa3']
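In Python 3, `str` has no `decode` method, so the same trick needs an explicit UTF-8 encode first. The sketch below shows only the key mapping and its inverse; `to_trie_key` and `from_trie_key` are hypothetical helper names, and the code does not call datrie itself.

```python
def to_trie_key(text):
    """Map a str to a str whose code points are all in 0x00-0xFF.

    Hypothetical helper: each UTF-8 byte becomes one Latin-1 character.
    """
    return text.encode("utf-8").decode("latin-1")


def from_trie_key(stored):
    """Invert to_trie_key: recover the original str."""
    return stored.encode("latin-1").decode("utf-8")


key = "颖礼仿古烟盒折扣"
stored = to_trie_key(key)

# Every character now fits in the (u'\x00', u'\xff') range.
fits_in_byte_range = all(ord(ch) <= 0xFF for ch in stored)
print(fits_in_byte_range)            # True
print(from_trie_key(stored) == key)  # True
```

The mapped string would be used as the trie key, and keys read back from the trie would go through `from_trie_key`. One consequence of this design is that prefix lookups then operate on UTF-8 bytes rather than on whole characters.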