Closed mjwillson closed 10 years ago
No, this is not a part of contract AFAIK, this looks like a bug.
Cheers. Actually, the thing with getting a key_id beforehand may be a red herring: the bug seems to disappear with the following innocuous change too:
In [81]: marisa_trie.Trie([u'foo', u'bar']).restore_key(0)
Out[81]: u'bar\x02'
In [82]: t = marisa_trie.Trie([u'foo', u'bar']); t.restore_key(0)
Out[82]: u'bar'
I'm guessing perhaps the Trie is getting garbage-collected in the first instance, but is returning a string whose memory is backed by that freed up space?
It seems to be a slightly weird, intermittent bug anyway (or at least it's hard to pin down what triggers it).
I'm not sure it's GC-related either, as it still happens if I call gc.disable().
Sometimes it happens on all runs after the first run:
In [3]: t = marisa_trie.Trie([u'foo', u'bar']); t.restore_key(0)
Out[3]: u'bar'
In [4]: t = marisa_trie.Trie([u'foo', u'bar']); t.restore_key(0)
Out[4]: u'bar\x02'
Sometimes I'm getting this error too:
In [2]: t = marisa_trie.Trie([u'foo', u'bar']); t.restore_key(0)
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-2-663a54335fc8> in <module>()
----> 1 t = marisa_trie.Trie([u'foo', u'bar']); t.restore_key(0)
/usr/local/lib/python2.7/dist-packages/marisa_trie.so in marisa_trie.Trie.restore_key (src/marisa_trie.cpp:4794)()
/usr/local/lib/python2.7/dist-packages/marisa_trie.so in marisa_trie.Trie.restore_key (src/marisa_trie.cpp:4728)()
UnicodeDecodeError: 'utf8' codec can't decode byte 0x85 in position 3: invalid start byte
This is reproducible - a weird bug! I'll try to get to it this weekend.
Hi, I had the same problem just now and ended up at this page. I have a somewhat large trie (2G), and found that it failed in a running IPython session but worked fine from the command line:
The following two work fine:
(1) echo 0 | marisa-reverse-lookup -r TRIEFILE.marisa
(2) python -c "from marisa_trie import Trie; print Trie().load('TRIEFILE.marisa').restore_key(0)"
The third one (in a running instance of IPython) fails:
(3) marisa_trie.Trie().load('TRIEFILE.marisa').restore_key(0)
However, it works in a new IPython instance.
Nothing very informative, but one more data point.
Thanks for the extra info.
It is interesting that this issue can be reproduced in an IPython shell, but doesn't manifest itself in a regular Python shell. A test case for it also doesn't fail.
I tried different IPython versions; @mjwillson's example works fine in IPython 0.10 but fails in 0.11+.
Also, it works fine in IPython 1.1 under Python 3.3.
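Since the failure is intermittent, one way to pin it down outside IPython is to repeat the failing one-liner many times and tally the distinct outcomes. The helper below is only a sketch: collect_restore_results and make_trie are hypothetical names, and the only library call it relies on is restore_key, as used earlier in this thread.

```python
from collections import Counter


def collect_restore_results(make_trie, key_id, runs=1000):
    """Build a fresh trie `runs` times and tally what restore_key returns.

    `make_trie` is a hypothetical zero-argument factory supplied by the caller.
    """
    results = Counter()
    for _ in range(runs):
        try:
            # Call restore_key on a temporary, as in the failing one-liner.
            results[make_trie().restore_key(key_id)] += 1
        except UnicodeDecodeError as exc:
            # The thread also reports decode errors, so count them separately.
            results["UnicodeDecodeError: %s" % exc.reason] += 1
    return results
```

With marisa_trie installed, this could be called as collect_restore_results(lambda: marisa_trie.Trie([u'foo', u'bar']), 0). More than one entry in the returned Counter would mean the result varies between runs.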
@mjwillson @sisukapalli thanks for the info! This bug should be fixed in 0.5.2. It turned out IPython vs python was a red herring: the restore_key method was building the result incorrectly.
Maybe when code is executed in IPython the memory layout is different and there are more non-zero bytes in memory; that could be why the problem shows up only in the IPython shell. When the byte after the end of the string is zero, the restore_key method returned a proper result.
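To illustrate why a zero byte after the string would hide the problem, here is a small pure-Python simulation. It is not the marisa-trie code: it assumes the result was read like a C string (up to the first zero byte) rather than by its known length, and the function names and byte buffers are made up for the example.

```python
def read_until_nul(memory, start):
    """Read like a C string: stop at the first zero byte at or after `start`."""
    end = memory.index(b"\x00", start)
    return memory[start:end]


def read_with_length(memory, start, length):
    """Read exactly `length` bytes, ignoring whatever follows the key."""
    return memory[start:start + length]


# Hypothetical memory contents: the key "bar" followed by whatever was there.
clean = b"bar" + b"\x00\x00\x00\x00"   # byte after the key happens to be zero
dirty = b"bar" + b"\x02\x00\x00\x00"   # a stray non-zero byte follows the key
broken = b"bar" + b"\x85\x00\x00\x00"  # a stray byte that is not valid UTF-8

print(repr(read_until_nul(clean, 0).decode("utf8")))       # 'bar'
print(repr(read_until_nul(dirty, 0).decode("utf8")))       # 'bar\x02'
print(repr(read_with_length(dirty, 0, 3).decode("utf8")))  # 'bar'

try:
    read_until_nul(broken, 0).decode("utf8")
except UnicodeDecodeError as exc:
    print(exc.reason)  # invalid start byte
```

Under that assumption, the same read gives 'bar', 'bar\x02', or a UnicodeDecodeError depending only on the bytes that happen to follow the key, which matches the three outcomes reported above.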
Like so:
This doesn't happen if I first get the key_id for that key:
If it's part of the contract that key_id is needed before restore_key, then it should probably be documented, and ideally raise some kind of exception if the contract is violated rather than silently return an incorrect result.
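Until a fixed release is available, a caller could guard against a silently wrong result by checking that the restored key maps back to the same id. This is a user-side sketch, not part of the library: checked_restore_key is a hypothetical name, and it assumes key_id raises KeyError for a key that is not in the trie.

```python
def checked_restore_key(trie, key_id):
    """Return trie.restore_key(key_id), raising if the result does not round-trip."""
    key = trie.restore_key(key_id)
    try:
        round_trip = trie.key_id(key)
    except KeyError:
        # Assumption: an unknown key makes key_id raise KeyError.
        round_trip = None
    if round_trip != key_id:
        raise ValueError(
            "restore_key(%r) returned %r, which does not map back to that id"
            % (key_id, key)
        )
    return key
```

The extra lookup costs one more trie query per call, so it is better suited to debugging than to hot paths.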