Closed kensk8er closed 8 years ago
@kensk8er Thanks for the pull request! It looks good. I appreciate you fixing the issue you found.
It's too bad that NLPIR sometimes returns undocumented part-of-speech codes. This isn't the first time we've seen this. Their documentation doesn't include mg: https://github.com/NLPIR-team/NLPIR/blob/master/NLPIR%20SDK/NLPIR-ICTCLAS/doc/ICTPOS3.0.doc
You don't need the u prefix in front of the unicode strings in the test file since we already have from __future__ import unicode_literals, but it doesn't harm anything, so I'll leave them.
@tsroten Thanks for merging the PR. Do you have any idea when the fix will be included in a release and available from pip install?
@kensk8er I plan to release it next week. I'd like to include a license auto-updater I've been working on.
Thanks again!
Fixed issue #52, which I created, and wrote a unit test.
It seems the root cause of the issue is that NLPIR sometimes returns POS tags that are not defined in the dict pynlpir.pos_map.POS_MAP. For example, NLPIR returns Mg for the word '甲', which isn't defined in POS_MAP. In this case None needs to be returned for the word, but instead it caused an error. I simply added an if statement that avoids adding None to a tuple.
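For anyone reading later, here is a rough sketch of the kind of guard described above. It is not the actual pynlpir code: TOY_POS_MAP and pos_names are hypothetical names made up for illustration, and the real POS_MAP is larger and structured differently. The idea is only that an undefined code such as Mg should be skipped instead of putting None into the collected names.

```python
# Hypothetical, simplified stand-in for pynlpir.pos_map.POS_MAP.
# The real map is larger and structured differently.
TOY_POS_MAP = {
    "n": "noun",
    "nr": "personal name",
    "v": "verb",
}


def pos_names(pos_code, pos_map=TOY_POS_MAP):
    """Return a ':'-joined name for pos_code, or None if nothing is defined.

    Hypothetical helper: walks from the broadest code ("n") to the full
    code ("nr") and collects the names that exist in pos_map.
    """
    names = []
    for end in range(1, len(pos_code) + 1):
        name = pos_map.get(pos_code[:end])
        # The guard: skip undefined codes instead of collecting None,
        # which would make the join below raise a TypeError.
        if name is not None:
            names.append(name)
    if not names:
        return None
    return ":".join(tuple(names))


if __name__ == "__main__":
    print(pos_names("nr"))  # a code defined in the toy map
    print(pos_names("Mg"))  # an undocumented code, not in the toy map
```

With the guard in place, an undocumented code simply yields None for that word, and the caller can decide how to handle it.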