I might be missing something out here, but I believe there is a consistency issue with the Trie implementation and keys that have null characters.
Take a look at the following code snippet:
key = 'Random\x00Key'
python_dict = { key : 'random_value' }
key in python_dict # prints True
python_dict.get(key) # returns 'random_value'
std_trie = marisa_trie.Trie(python_dict)
key in std_trie # prints True
std_trie.keys() # prints ['Random\x00Key']
std_trie.get(key) # should return 'random_value'
std_trie.key_id(key) # should return the key id
What happens is that std_trie.get(key) actually returns None, and std_trie.key_id(key) throws a KeyError exception with the following trace:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "src/marisa_trie.pyx", line 409, in marisa_trie.Trie.key_id
File "src/marisa_trie.pyx", line 417, in marisa_trie.Trie.key_id
KeyError: 'Random\x00Key'
Apparently, the RecordTrie implementation is immune to this consistency issue.
r_trie = marisa_trie.RecordTrie('<H', zip([key], [(1,)]))
key in r_trie # prints True
r_trie.keys() # prints ['Random\x00Key']
r_trie.get(key) # prints [(1,)]
By the way, if you confirm this issue as an actual bug, also check DAWG. I haven't used it extensively, but calling dawg.DAWG(python_dict) should throw the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "dawg.pyx", line 45, in dawg.DAWG.__init__ (src\dawg.cpp:2147)
File "dawg.pyx", line 70, in dawg.DAWG._build_from_iterable (src\dawg.cpp:2570)
dawg.Error: Can't insert key b'Random\x00Key' (with value 0)
I'm running Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32 and using marisa-trie 0.7.5 and DAWG 0.7.8.
I might be missing something out here, but I believe there is a consistency issue with the
Trie
implementation and keys that have null characters.Take a look at the following code snippet:
What happens is that
std_trie.get(key)
actually returnsNone
, andstd_trie.key_id(key)
throws aKeyError
exception with the following trace:Apparently, the
RecordTrie
implementation is immune to this consistency issue.By the way, if you confirm this issue as an actual bug, also check DAWG. I haven't used it extensively, but calling
dawg.DAWG(python_dict)
should throw the following error:I'm running
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32
and usingmarisa-trie 0.7.5
andDAWG 0.7.8
.