tsroten / pynlpir

A Python wrapper around the NLPIR/ICTCLAS Chinese segmentation software.
MIT License

fixed issue #52 #53

Closed kensk8er closed 8 years ago

kensk8er commented 8 years ago

Fixed issue #52, which I reported, and added a unit test.

The root cause seems to be that NLPIR sometimes returns POS tags that are not defined in the pynlpir.pos_map.POS_MAP dict. For example, NLPIR returns Mg for the word '甲', which isn't defined in POS_MAP. In this case None should be returned for the word, but an error was raised instead.

I simply added an if statement that avoids adding None to a tuple.
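
A minimal sketch of the kind of guard described above. It does not reproduce pynlpir's actual code: TOY_POS_MAP and pos_names are hypothetical names, and the nested (name, sub-map) layout is an assumption made for illustration. The point is only that a failed sub-tag lookup yields None, and that None must not be concatenated onto the tuple of names.

```python
# Hypothetical, heavily trimmed stand-in for a nested POS map.
# Each entry is (name, sub_map_or_None); this layout is an assumption.
TOY_POS_MAP = {
    "n": ("noun", {"nr": ("personal name", None)}),
    "m": ("numeral", {"mq": ("numeral-plus-classifier compound", None)}),
}


def pos_names(code, pos_map=TOY_POS_MAP):
    """Return a tuple of names, general to specific, or None if unknown."""
    code = code.lower()
    # Find the shortest prefix of the code that the map defines.
    for i in range(1, len(code) + 1):
        key = code[:i]
        entry = pos_map.get(key)
        if entry is not None:
            break
    else:
        return None  # no prefix of the code is defined at this level

    name, sub_map = entry
    names = (name,)
    if sub_map and key != code:
        sub_names = pos_names(code, sub_map)
        # The guard: an undefined sub-tag gives None, and
        # `names + None` would raise TypeError.
        if sub_names:
            names = names + sub_names
    return names


print(pos_names("mq"))  # ('numeral', 'numeral-plus-classifier compound')
print(pos_names("Mg"))  # ('numeral',)
```

Without the `if sub_names:` check, the "Mg" call would fail with a TypeError, because the toy map defines "m" but has no "mg" sub-entry.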

tsroten commented 8 years ago

@kensk8er Thanks for the pull request! It looks good. I appreciate you fixing the issue you found.

It's too bad that NLPIR sometimes returns undocumented part-of-speech codes. This isn't the first time we've seen this. Their documentation doesn't include mg: https://github.com/NLPIR-team/NLPIR/blob/master/NLPIR%20SDK/NLPIR-ICTCLAS/doc/ICTPOS3.0.doc

You don't need the u in front of the unicode strings in the test file, since it already has from __future__ import unicode_literals, but the prefix doesn't harm anything, so I'll leave them.
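
A small illustration of why the prefix is redundant, assuming a Python version that accepts the u prefix (Python 2, or Python 3.3 and later). The variable names are made up for the example.

```python
from __future__ import unicode_literals

# With unicode_literals in effect, a bare string literal is already a
# unicode string, so the u prefix changes nothing.
plain = "甲"
prefixed = u"甲"

print(plain == prefixed)              # True
print(type(plain) is type(prefixed))  # True
```

On Python 3 the import is a no-op, since all string literals are unicode there anyway.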

kensk8er commented 8 years ago

@tsroten Thanks for merging the PR. Do you have any idea when the fix will be included in a release and available via pip install?

tsroten commented 8 years ago

@kensk8er I plan to release it next week. I'd like to include a license auto-updater I've been working on.

Thanks again!