tsroten / pynlpir

A Python wrapper around the NLPIR/ICTCLAS Chinese segmentation software.
MIT License

ImportUserDict gets stuck when loading a big dictionary. #60

Closed. huntzhan closed this issue 8 years ago

huntzhan commented 8 years ago

OS: CentOS 6.7

Test code I've run:

import pynlpir
pynlpir.open()
pynlpir.nlpir.ImportUserDict(b'path/to/dict')

Example format of the dictionary:

中文 3 n

It takes approximately 1 second to load a dictionary with 10k words, but ImportUserDict gets stuck when I try to load a dictionary with 100k words, and there is no way to interrupt it from the keyboard.
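
Python only runs its Ctrl-C handler between bytecode instructions, so a call that is blocked inside native code cannot be interrupted until it returns. One possible workaround, sketched here rather than taken from PyNLPIR, is to run the load in a child process that can be killed after a timeout. The helper name `run_with_timeout` and the `LOAD_SNIPPET` constant are hypothetical; the snippet itself is the test code from this report.

```python
import subprocess
import sys


def run_with_timeout(code, timeout_s):
    """Run a Python snippet in a child process.

    Returns the child's exit code, or None if it was killed after timeout_s seconds.
    """
    try:
        completed = subprocess.run([sys.executable, "-c", code], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising.
        return None
    return completed.returncode


# The test code from this report, wrapped in a string so a child process can run it.
# The dictionary path is the same placeholder used in the report.
LOAD_SNIPPET = (
    "import pynlpir\n"
    "pynlpir.open()\n"
    "pynlpir.nlpir.ImportUserDict(b'path/to/dict')\n"
)
```

Calling `run_with_timeout(LOAD_SNIPPET, 60)` would return `None` if the load is still running after 60 seconds (an arbitrary limit chosen for illustration). The dictionary loaded in the child is not available to the parent process, so this only helps to diagnose the hang, not to work around it.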

I suspect that the time complexity of loading a user dictionary is worse than O(n^2).
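
A rough way to check this suspicion is to time the load on synthetic dictionaries of increasing size and estimate the exponent k in t ~ n^k. The sketch below is only an illustration: `write_dict`, `time_loader`, and `growth_exponent` are hypothetical helpers, the synthetic words are arbitrary character pairs, and UTF-8 is assumed as the file encoding. The loader is passed in as a callable so the timing logic does not depend on PyNLPIR.

```python
import math
import os
import tempfile
import time


def write_dict(path, n_words):
    """Write n_words synthetic entries in the 'word frequency POS' format shown above."""
    with open(path, "w", encoding="utf-8") as f:  # UTF-8 is an assumption
        for i in range(n_words):
            # Build a distinct two-character word from the CJK Unified Ideographs block.
            word = chr(0x4E00 + i // 1000) + chr(0x4E00 + i % 1000)
            f.write(f"{word} 3 n\n")


def time_loader(load, sizes):
    """Return [(n, seconds)] for calling load(path_as_bytes) on a dictionary of each size."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        for n in sizes:
            path = os.path.join(tmp, f"dict_{n}.txt")
            write_dict(path, n)
            start = time.perf_counter()
            load(path.encode("utf-8"))
            results.append((n, time.perf_counter() - start))
    return results


def growth_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ n**k from two (size, seconds) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)
```

After `pynlpir.open()`, something like `time_loader(pynlpir.nlpir.ImportUserDict, [1000, 2000, 4000, 8000])` would give the measurements. An exponent near 1 suggests linear behavior and one near 2 suggests quadratic. Entries loaded by earlier calls may affect later timings, so running each size in a fresh process is probably cleaner.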

tsroten commented 8 years ago

@huntzhan The pynlpir.nlpir.ImportUserDict() function calls the NLPIR function directly. PyNLPIR itself does not open the user dict file or read its data. You might try asking around at NLPIR: