tsroten / pynlpir

A Python wrapper around the NLPIR/ICTCLAS Chinese segmentation software.
MIT License

Having problem when trying to segment a sentence #97

Closed stickjitb closed 6 years ago

stickjitb commented 6 years ago

When I try to segment the sentence "〜refrain〜 The songs were inspired by "EVANGELION"", I get the output "^|^|銆淾|^|refrain|^|^|銆淾|^|The|songs|were|inspired|by|&|quot|;|EVANGELION|" (| marks the boundaries between words).

If I delete the last word, "EVANGELION", I get the correct output: "〜|refrain|〜| |The| |songs| |were| |inspired| |by| |&|quot|;|"

I know this program is mainly used for Chinese word segmentation, but I still wonder what causes this problem. It is quite interesting, isn't it?

tsroten commented 6 years ago

Yes, it is! PyNLPIR doesn't do the segmentation itself, however; it is only a wrapper. You'd have to check with the ICTCLAS/NLPIR project, which performs the actual segmentation.
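
For anyone who wants to investigate further: the "銆" in the reported output is what the first two UTF-8 bytes of "〜" look like when they are decoded as GBK. That hints at an encoding mismatch somewhere along the way, but this is only a guess and is not confirmed anywhere in this thread. The sketch below uses only the Python standard library and does not call PyNLPIR or NLPIR; the trailing "^" byte is a hypothetical stand-in for whatever byte follows the character.

```python
# Illustration only: this does not call PyNLPIR or NLPIR.
original = "〜"                          # U+301C WAVE DASH
utf8_bytes = original.encode("utf-8")    # three bytes: e3 80 9c

# Hypothetical scenario: the UTF-8 bytes, followed by one ASCII byte,
# are decoded as GBK instead of UTF-8. GBK pairs the bytes up differently,
# so the third byte of the wave dash swallows the ASCII byte after it.
misread = (utf8_bytes + b"^").decode("gbk", errors="replace")

print(utf8_bytes.hex())  # raw bytes of the original character
print(misread)           # mojibake produced by the wrong decoder
```

If the same garbling shows up when the input bytes are decoded this way, the encoding handed to the segmenter would be the first thing to check with the ICTCLAS/NLPIR project.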