visig9 closed this 4 years ago.
I think this is caused by our not accounting for duplicate entries in the dict.
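One way to check this hypothesis against dict.txt.big is to scan the file for words that appear on more than one line. The sketch below assumes the jieba dict format of one `word freq [tag]` entry per line, separated by whitespace. The function name `find_duplicate_words` and the sample input are hypothetical and are not part of jieba-rs.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns (word, first_line, repeat_line) for every
/// repeated word, with 1-based line numbers. Assumes "word freq [tag]" lines.
fn find_duplicate_words(dict: &str) -> Vec<(String, usize, usize)> {
    let mut first_seen: HashMap<&str, usize> = HashMap::new();
    let mut dups = Vec::new();
    for (idx, line) in dict.lines().enumerate() {
        let line_no = idx + 1;
        // The word is the first whitespace-separated field; skip blank lines.
        let word = match line.split_whitespace().next() {
            Some(w) => w,
            None => continue,
        };
        match first_seen.get(word).copied() {
            Some(first) => dups.push((word.to_string(), first, line_no)),
            None => {
                first_seen.insert(word, line_no);
            }
        }
    }
    dups
}

fn main() {
    // Made-up sample input, not taken from dict.txt.big.
    let dict = "apple 3 n\nbanana 5 n\napple 7 n\n";
    for (word, first, dup) in find_duplicate_words(dict) {
        println!("{}: line {} repeated at line {}", word, first, dup);
    }
}
```

Running this over the real dict file (for example after `std::fs::read_to_string`) would show whether duplicates occur at or before the line mentioned in the Clue section.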
cc @MnO2
OK, I can take a look.
@MnO2
For the with_dict method, this line of code in the load_dict method:
https://github.com/messense/jieba-rs/blob/4fd483c0580528734e01328862d3475f98fb834e/src/lib.rs#L313
will always return None, since self.cedar isn't built yet.
It comes down to how we expect load_dict to be called, and what its semantics should be. If you call load_dict and then call it again, exact_match_search shouldn't always return None.
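To make the semantics question concrete, here is a toy model in which lookups only consult an index that is refreshed by a separate batch build step, the way exact_match_search only sees a trie that has already been built. Everything here is hypothetical: `LazyDict`, its methods, and the "later freq overwrites" policy are illustrative assumptions, and a `HashMap` stands in for the double-array trie.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary whose lookup index (standing in for the
/// double-array trie) is only refreshed by an explicit batch `build` step.
/// None of these names come from jieba-rs or cedarwood.
struct LazyDict {
    records: Vec<(String, usize)>, // (word, freq); the position is the word_id
    index: HashMap<String, usize>, // word -> word_id; stale until build() runs
}

impl LazyDict {
    fn new() -> Self {
        LazyDict { records: Vec::new(), index: HashMap::new() }
    }

    /// Consults only the built index, like exact_match_search on the trie.
    fn exact_match(&self, word: &str) -> Option<usize> {
        self.index.get(word).copied()
    }

    /// Loads entries, overwriting the freq of words the index already knows.
    /// Returns how many entries were recognised as existing words.
    fn load(&mut self, entries: &[(&str, usize)]) -> usize {
        let mut hits = 0;
        for &(word, freq) in entries {
            match self.exact_match(word) {
                Some(id) => {
                    self.records[id].1 = freq;
                    hits += 1;
                }
                // Words added earlier in this same call are invisible here.
                None => self.records.push((word.to_string(), freq)),
            }
        }
        hits
    }

    /// Rebuilds the index from every record in one pass (a batch build).
    fn build(&mut self) {
        self.index.clear();
        for (id, (word, _)) in self.records.iter().enumerate() {
            // A duplicated word maps to its last record; earlier copies are orphaned.
            self.index.insert(word.clone(), id);
        }
    }
}

fn main() {
    let mut dict = LazyDict::new();
    // First call: the index is empty, so every lookup misses, even for "a" twice.
    let hits = dict.load(&[("a", 1), ("b", 2), ("a", 3)]);
    println!("first load: {} hits, {} records", hits, dict.records.len());
    dict.build();
    // Second call: lookups now succeed against the built index.
    let hits = dict.load(&[("a", 5)]);
    println!("second load: {} hits, {} records", hits, dict.records.len());
}
```

In this model the first call can never detect a duplicate, so two records for "a" end up in the table, while a second call after `build` does find existing words. That mirrors the distinction between calling load_dict once and calling it again.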
Since the method is public, it could be called any number of times, so there are a few approaches we could choose from.
The code we have right now follows approach 1. However, from a brief look I couldn't figure out why it causes the panic, or why your PR fixed the issue by calling self.cedar.update(word, word_id); inside the for loop, since update should already be covered by build in cedarwood, which is basically a for loop over update:
```rust
/// Build the double array trie from the given key value pairs
#[allow(dead_code)]
pub fn build(&mut self, key_values: &[(&str, i32)]) {
    for (key, value) in key_values {
        self.update(key, *value);
    }
}
```
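The difference between the two orderings is when lookups can see earlier words: with a per-entry update inside the loading loop, a lookup during the same pass already sees every word inserted before it, whereas a single build after the loop leaves the index empty until the end. The sketch below models only that ordering; it does not explain the panic itself, which the thread later attributes to a cedarwood fix. `IncrementalDict` is hypothetical, a `HashMap` stands in for the trie, and "the later freq wins" is an assumed duplicate policy, not necessarily what jieba-rs does.

```rust
use std::collections::HashMap;

/// Hypothetical loader that updates its index per entry inside the loop,
/// analogous to calling self.cedar.update(word, word_id) for each word,
/// instead of one batch build after the loop. A HashMap stands in for the
/// double-array trie; the names are illustrative, not jieba-rs API.
#[derive(Default)]
struct IncrementalDict {
    records: Vec<(String, usize)>, // (word, freq); the position is the word_id
    index: HashMap<String, usize>, // word -> word_id; always in sync with records
}

impl IncrementalDict {
    fn load(&mut self, entries: &[(&str, usize)]) {
        for &(word, freq) in entries {
            match self.index.get(word).copied() {
                // Seen before (earlier in this call or a previous call):
                // reuse the word_id; the later freq wins (an assumed policy).
                Some(id) => self.records[id].1 = freq,
                None => {
                    let id = self.records.len();
                    self.records.push((word.to_string(), freq));
                    // Per-entry update, so the next lookup already sees this word.
                    self.index.insert(word.to_string(), id);
                }
            }
        }
    }
}

fn main() {
    let mut dict = IncrementalDict::default();
    dict.load(&[("a", 1), ("b", 2), ("a", 3)]);
    dict.load(&[("a", 4)]);
    for (id, (word, freq)) in dict.records.iter().enumerate() {
        println!("{} {} {}", id, word, freq);
    }
}
```

Here the duplicate "a" reuses word_id 0 both within the first call and across the second call, so the records table never holds two entries for the same word.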
I suspect there may be hidden bugs in cedarwood, and I need to spend more time on it. In the meantime I am fine with the PR, even though I'm not exactly sure why it fixed the panic.
https://github.com/MnO2/cedarwood/commit/858e9033395ad0cb5aa86347958cb7626c4446d4 This commit should fix the issue.
Test Case
The dict.txt.big file can be fetched from https://github.com/fxsjy/jieba/blob/master/extra_dict/dict.txt.big
Result
Clue
Looking into dict.txt.big... when I delete all the lines after 亞 5789 j (line 33908), the panic disappears. The result looks like the following:
Env