fayeshine closed this issue 8 years ago.
@fayeshine Thank you for reporting this bug. This seems to be a problem with NLPIR itself, not PyNLPIR. Here are some tests:
>>> nlpir.ParagraphProcess('我们\n我们'.encode('utf8'), False)
b'\xe6\x88\x91\xe4\xbb\xac \n\xe6\x88\x91\xe4\xbb\xac '
>>> nlpir.ParagraphProcess('我们\n'.encode('utf8'), False)
b'\xe6\x88\x91\xe4\xbb\xac \n'
>>> nlpir.ParagraphProcess('我们\ntest'.encode('utf8'), False)
b'\xe6\x88\x91\xe4\xbb\xac \ntest '
>>> nlpir.ParagraphProcess('test\n我们'.encode('utf8'), False)
b'test \n\xe6\x88\x91\xe4\xbb\xac '
>>> nlpir.ParagraphProcess('test\n我们\n'.encode('utf8'), False)
b'test \n\xe6\x88\x91\xe4\xbb\xac \n'
>>> nlpir.ParagraphProcess('test\n'.encode('utf8'), False)
[...NLPIR hangs...]
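To repeat probes like these without freezing the interactive session, each call can run in a daemon thread with a timeout. This is a sketch under stated assumptions: `finishes_within` is a hypothetical helper, and it only works if the foreign call releases the GIL while it runs. Functions loaded through `ctypes.CDLL` do release it; that the NLPIR binding is loaded that way is an assumption here. A hung thread cannot be killed, only abandoned, so restart the interpreter after a hang is detected.

```python
import threading
from typing import Any, Callable, Tuple


def finishes_within(func: Callable[..., Any], args: Tuple[Any, ...], timeout: float) -> bool:
    """Hypothetical helper: run func(*args) in a daemon thread.

    Returns True if the call finished within `timeout` seconds, False if it
    is still running (i.e. it looks hung). The daemon flag lets the
    interpreter exit even if the worker thread never returns.
    """
    worker = threading.Thread(target=func, args=args, daemon=True)
    worker.start()
    worker.join(timeout)
    # A thread that is still alive after join(timeout) is treated as hung.
    return not worker.is_alive()
```

With PyNLPIR installed and opened via `pynlpir.open()`, a call such as `finishes_within(nlpir.ParagraphProcess, ('test\n'.encode('utf8'), False), timeout=5.0)` would be expected to return `False` if NLPIR hangs as in the last case above, while the other inputs should return `True`.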
So, an easy solution is to strip any newlines that appear at the end of the input string before calling `nlpir.ParagraphProcess`. The problem does not seem to affect `nlpir.GetKeyWords`.
This would be a simple addition to `pynlpir.segment`. We'll leave `nlpir.ParagraphProcess` alone. Anyone willing to submit a pull request for this?
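A minimal sketch of that workaround, assuming the wrapped function behaves like `nlpir.ParagraphProcess` in the session above (UTF-8 bytes plus a POS-tagging flag in, UTF-8 bytes out). The name `process_without_trailing_newlines` is hypothetical, and this is not the code that actually went into `pynlpir.segment`. Only newlines at the very end are removed; newlines inside the text are kept, since the session above shows those are handled fine.

```python
from typing import Callable

# Stand-in type for a ParagraphProcess-like function:
# (UTF-8 encoded input, POS-tagging flag) -> UTF-8 encoded output.
ParagraphProcessor = Callable[[bytes, bool], bytes]


def process_without_trailing_newlines(
    text: str, process: ParagraphProcessor, pos_tagging: bool = False
) -> str:
    """Hypothetical wrapper: strip trailing newlines, then call `process`."""
    # Remove newline characters from the end of the string only.
    cleaned = text.rstrip('\n')
    # Assumption: input that was nothing but newlines is skipped entirely
    # rather than sending empty bytes to the library.
    if not cleaned:
        return ''
    result = process(cleaned.encode('utf-8'), pos_tagging)
    return result.decode('utf-8')
```

With PyNLPIR opened via `pynlpir.open()`, one could pass `nlpir.ParagraphProcess` as `process`; `'test\n'` would then be sent as `b'test'`, which, if the diagnosis above is right, avoids the hang.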
@fayeshine Okay, I've fixed this in the latest `develop` branch. I'll publish a release to PyPI shortly. Thanks again!
If you run `pynlpir.segment('E\n')`, the program gets stuck. This happens if the input contains a `\n` and an English word. The bug is easy to trigger; please fix it, thanks.