Closed Honghe closed 9 years ago
Same bug here. Calling __contains__
and len
on an empty Trie causes a segfault.
Here's gdb where
output:
#0 0x00007ffff6e48681 in memcpy () from /lib64/libc.so.6
#1 0x00007ffff090cdf8 in dstring_append_char (ds=0xf95d80, data=0x7fffffffd8cc) at libdatrie/datrie/dstring.c:157
#2 0x00007ffff090cf33 in trie_string_append_char (ts=<value optimized out>, tc=4 '\004') at libdatrie/datrie/trie-string.c:97
#3 0x00007ffff090a850 in da_first_separate (d=0xf39200, root=<value optimized out>, keybuff=0xf95d80) at libdatrie/datrie/darray.c:780
#4 0x00007ffff090be7b in trie_iterator_next (iter=0xf48ba0) at libdatrie/datrie/trie.c:998
#5 0x00007ffff08ffe8e in __pyx_pf_6datrie_8BaseTrie_24__len__ (__pyx_v_self=<value optimized out>) at src/datrie.c:3736
#6 __pyx_pw_6datrie_8BaseTrie_25__len__ (__pyx_v_self=<value optimized out>) at src/datrie.c:3662
#7 0x00007ffff7aea18c in builtin_len (self=<value optimized out>, v=<value optimized out>) at Python/bltinmodule.c:1317
#8 0x00007ffff7af42a2 in call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4021
#9 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#10 0x00007ffff7af4c6e in PyEval_EvalCodeEx (co=0x7ffff7eddf30, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7ffff7eeba98, kwcount=0,
defs=0x7fffe3810f98, defcount=2, closure=0x0) at Python/ceval.c:3265
#11 0x00007ffff7af32aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#12 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#13 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#14 0x00007ffff7af4c6e in PyEval_EvalCodeEx (co=0x7ffff7e90130, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0,
defcount=0, closure=0x0) at Python/ceval.c:3265
#15 0x00007ffff7af4d82 in PyEval_EvalCode (co=<value optimized out>, globals=<value optimized out>, locals=<value optimized out>) at Python/ceval.c:667
#16 0x00007ffff7b14ba0 in run_mod (fp=0x6b9400, filename=<value optimized out>, start=<value optimized out>, globals=0x7ffff7f7e168, locals=0x7ffff7f7e168, closeit=1, flags=0x7fffffffdf80)
at Python/pythonrun.c:1371
#17 PyRun_FileExFlags (fp=0x6b9400, filename=<value optimized out>, start=<value optimized out>, globals=0x7ffff7f7e168, locals=0x7ffff7f7e168, closeit=1, flags=0x7fffffffdf80)
at Python/pythonrun.c:1357
#18 0x00007ffff7b14d7f in PyRun_SimpleFileExFlags (fp=0x6b9400, filename=0x7fffffffe390 "/tmp/grappa.py", closeit=1, flags=0x7fffffffdf80) at Python/pythonrun.c:949
#19 0x00007ffff7b2a664 in Py_Main (argc=<value optimized out>, argv=<value optimized out>) at Modules/main.c:645
#20 0x00007ffff6dddd5d in __libc_start_main () from /lib64/libc.so.6
#21 0x0000000000400649 in _start ()
Update: turns out there's an unbounded loop in darray.c:770
. Investigating further ...
I'm not 100% sure this is correct, but it seems like max_c
is 0 if the trie was just created. But when max_c
is 0, the loop does a single iteration, thus c
is never 0
and the exit condition fails.
diff --git a/libdatrie/datrie/darray.c b/libdatrie/datrie/darray.c
index c07712a..ea57ff3 100644
--- a/libdatrie/datrie/darray.c
+++ b/libdatrie/datrie/darray.c
@@ -774,7 +774,7 @@ da_first_separate (DArray *d, TrieIndex root, TrieString *keybuff)
break;
}
- if (c == max_c)
+ if (max_c == 0 || c == max_c)
return TRIE_INDEX_ERROR;
trie_string_append_char (keybuff, c);
I can submit a PR if you're okay with the "solution".
Why is max_c
0? If trie only consists of a root node, then root
points at the last cell in the DA array, which is initialized as {base = DA_POOL_BEGIN, check = 0}
. Substituting this into the if condition we get
da_get_check (d, base + c) == root
// => da_get_check(d, base) == root
// => 0 == 2 (see da_get_root() for the 2)
Hi @superbobry,
I think it is better to change the upstream code (http://linux.thai.net/~thep/datrie/datrie.html) if there is a bug.
Yup, submitting the bug to the libdatrie
author was my first thought, but the project page lacks a link to the bug tracker or repo. Should I email the author directly?
Yes, I've emailed him in past; we discussed some libdatrie changes needed for this wrapper and he made them in an original repo. A couple of bugs was also fixed this way.
There is SVN repo: http://linux.thai.net/svn/software/datrie/?q=svn/software/datrie - it seems our copy is a bit outdated. There is also one incompatibility between SVN version and our version - here we have one extra method (https://github.com/kmike/datrie/commit/dbe366b498fe13abdc944b5e4eeff331d9063c1a). In past I've just made sure it is kept when libdatrie is updated.
The bug has been fixed in the SVN version. Should we wait for the next release or pull the SVN version into the repo instead?
:+1: we should pull the SVN version.
There is no notice while pending. And even may cause Segment Fault.