tsroten / pynlpir

A Python wrapper around the NLPIR/ICTCLAS Chinese segmentation software.
MIT License

Need to add more POS tags #82

Closed: KayShen closed this issue 5 years ago

KayShen commented 7 years ago

NLPIR has added new POS tags; see: https://github.com/NLPIR-team/NLPIR/blob/5e4f0a6a35906472a8ddd4f8457c13eb03174204/NLPIR%20SDK/DocExtractor/Data/UserDefinedDict.lst

POS tags such as 'gtw', 'gwheb', and 'grjyy' are not recognized in this version.

For example: pynlpir.segment(u"接受党和国家领导人接见接受央视北京卫视北京日报新京报世纪英语报等媒体的采访")

tsroten commented 7 years ago

@KayShen Thanks for pointing this out.

For anyone who is interested, I'd be happy to accept a pull request. Here is the file that needs to be updated: https://github.com/tsroten/pynlpir/blob/develop/pynlpir/pos_map.py

tsroten commented 5 years ago

Some of the new tags were recently added to PyNLPIR (the release is on PyPI). There is also a new value for the pos_names option: raw. It simply returns whatever NLPIR provides as the part-of-speech tag, which makes it a workaround for any other tags that might still be missing.

Also, as before, you can still pass in your own part-of-speech mapping that covers any missing tags.
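
A rough sketch of how the raw workaround and a custom mapping could be combined, in case it helps. This does not use PyNLPIR's own mapping API. It post-processes (token, tag) pairs of the kind you would get back when asking for raw tags. The helper name_pos_tags, the EXTRA_POS_NAMES dict, and the sample pairs are all hypothetical, and the names given for the new tags are placeholders, not NLPIR's official definitions.

```python
# Hypothetical supplementary mapping for tags missing from the built-in map.
# The values are placeholders; look up the real meanings in NLPIR's own docs.
EXTRA_POS_NAMES = {
    "gtw": "placeholder name for gtw",
    "gwheb": "placeholder name for gwheb",
}


def name_pos_tags(pairs, extra_names):
    """Replace each raw tag with a name from extra_names when one exists.

    Tags that are not in the mapping are passed through unchanged, so
    nothing is lost if NLPIR introduces yet another tag.
    """
    return [(token, extra_names.get(tag, tag)) for token, tag in pairs]


# Hand-written sample, NOT real NLPIR output. In practice the pairs would
# come from segmenting text with the raw pos_names setting described above.
raw_pairs = [("央视", "gtw"), ("采访", "v"), ("媒体", "grjyy")]

named_pairs = name_pos_tags(raw_pairs, EXTRA_POS_NAMES)
for token, pos in named_pairs:
    print(token, pos)
```

Falling back to the raw tag means an unmapped code such as 'grjyy' still shows up in the output instead of being dropped or turned into None.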