Christiannov closed this issue 5 years ago
Hello @Christiannov! PyNLPIR doesn't handle the segmentation itself; it uses NLPIR behind the scenes. I've found that NLPIR is pretty picky about grammar. In many cases it appears to leave out words if the sentence is missing punctuation marks.
Try adding a period (full stop) at the end of your string and see if that helps.
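A minimal sketch of that workaround, in case it helps. The helper names `ensure_terminal_punctuation` and `segment_with_full_stop` are hypothetical and not part of PyNLPIR, and the choice of the Chinese full stop "。" as the default is an assumption; pass `full_stop="."` if that works better for your input. Only `pynlpir.open()`, `pynlpir.segment()`, and `pynlpir.close()` are actual PyNLPIR calls.

```python
# Hypothetical helpers, not part of PyNLPIR.
# Characters treated as sentence-final punctuation (an assumption; extend as needed).
TERMINAL_PUNCTUATION = "。！？.!?…"


def ensure_terminal_punctuation(text, full_stop="。"):
    """Append a full stop unless the text already ends in terminal punctuation."""
    stripped = text.rstrip()
    if not stripped or stripped[-1] in TERMINAL_PUNCTUATION:
        return stripped
    return stripped + full_stop


def segment_with_full_stop(text):
    """Segment text with PyNLPIR after padding it with a full stop."""
    import pynlpir  # imported lazily so the helper above works without it

    pynlpir.open()
    try:
        return pynlpir.segment(ensure_terminal_punctuation(text), pos_tagging=False)
    finally:
        pynlpir.close()
```

Note that the added full stop will probably show up as its own token at the end of the result, so you may want to drop the last item if you only padded the string to work around this behavior.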
I'd also suggest reporting this issue to NLPIR. You could try their website/forum or their GitHub page.
I've tried adding a period at the end of the string and the bug no longer occurs.
In addition, I looked at the NLPIR repository, and similar issues have already been reported there. But it seems that the contributor has not offered a better solution than adding a period. Fortunately, this bug is not a big problem for me at present.
Anyway, thank you very much @tsroten!
Hello, I found that pynlpir has a bug when dealing with the sentence "本报编辑部评出2000年国内十大新闻". The following code block shows that the word "新闻" is missing from the segmentation result for this sentence.
I don't understand why this bug occurs. Do you have any idea? Thanks!