Closed: sanikolaev closed this issue 1 month ago
The expected results are basically the same, and you can choose according to your preference.
Thanks for your response.
> you can choose according to your preference
The issue is that I can't really prefer one over the other because I don't speak Chinese and haven't used either of these dictionaries before. I just noticed that one is about 5MB, while the other one is 8MB. That seems like a significant size difference for dictionaries. Does this mean that the larger dictionary will result in better segmentation?
Yes. Generally speaking, a larger dictionary does lead to better word segmentation results.
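For anyone who wants to look past file size when comparing two dictionaries, here is a minimal sketch that counts entries and overlap. It assumes each line has the form `word frequency tag`, separated by whitespace, which is how the files in both repositories appear to be laid out. `load_dict` and `compare_dicts` are hypothetical helper names, not part of cppjieba or Jieba, and the sample entries and frequencies are made up for illustration.

```python
from typing import Dict, Iterable


def load_dict(lines: Iterable[str]) -> Dict[str, int]:
    """Parse lines of the assumed form `word [freq] [tag]` into {word: freq}."""
    entries: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        # Assumption: the second field is the frequency; default to 0 if absent.
        freq = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 0
        entries[word] = freq
    return entries


def compare_dicts(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Return entry counts and overlap between two parsed dictionaries."""
    words_a, words_b = set(a), set(b)
    return {
        "size_a": len(words_a),
        "size_b": len(words_b),
        "shared": len(words_a & words_b),
        "only_a": len(words_a - words_b),
        "only_b": len(words_b - words_a),
    }


if __name__ == "__main__":
    # Made-up sample entries; real dictionaries hold hundreds of thousands of lines.
    small = load_dict(["北京 100 ns", "大学 50 n"])
    large = load_dict(["北京 100 ns", "大学 50 n", "北京大学 10 nt"])
    print(compare_dicts(small, large))
```

To run it on real files, pass an open file handle instead of a list, for example `load_dict(open(path, encoding="utf-8"))`. A large `only_b` count would suggest the second dictionary covers words the first one cannot recognize as single tokens, though this says nothing about the quality of those entries.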
> larger dictionary does lead to better word segmentation results
Thank you. Closing the issue.
I want to thank the maintainers of this library for their hard work. We are currently integrating it into Manticore Search (related issue: https://github.com/manticoresoftware/manticoresearch/issues/931), and I have a question about the dictionaries. How do the dictionaries in this repo (https://github.com/yanyiwu/cppjieba/tree/master/dict) differ from the ones in the Jieba repository (https://github.com/fxsjy/jieba/tree/master/extra_dict)?
The formats seem to be the same. How should one decide which dictionary to use?
Translation
Sorry, I don't speak Chinese, but I see that people in this repository usually communicate in Chinese, so the original post also included an automatic Chinese translation of the question above.