tsroten / pynlpir

A Python wrapper around the NLPIR/ICTCLAS Chinese segmentation software.
MIT License

add user defined pos tag #87

Closed: lomizandtyd closed this pull request 7 years ago

lomizandtyd commented 7 years ago

Add additional POS tags, based on UserDefinedDict.lst.

tsroten commented 7 years ago

@lomizandtyd Thanks for the pull request! I appreciate it!

Unfortunately, those do not appear to be standard NLPIR part-of-speech tags, so I don't think it would be good to add them to PyNLPIR. They come from a user-specific file, and not all users have these tags in their files. Plus, for some reason it's not putting nouns under n (名词).

Instead, how about we make PyNLPIR support user-defined part-of-speech tags? That way, those who want to use extra part-of-speech tags can. Then you can do something like:

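import pynlpir

pynlpir.open()  # NLPIR must be initialized before segmenting

# Copy the default map so the shared POS_MAP is not modified in place.
my_pos_tags = dict(pynlpir.pos_map.POS_MAP)
my_pos_tags['g'] = ('专有名词', 'proper noun')

s = '欢迎使用PyNLPIR。'  # example input
pynlpir.segment(s, pos_tags=my_pos_tags)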
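
To illustrate the idea without depending on NLPIR itself, here is a minimal standalone sketch of how a caller-supplied tag map could be consulted. It is not PyNLPIR's implementation: `DEFAULT_POS_TAGS` and `lookup_pos_name` are hypothetical names, and the map is assumed to be a flat dict of code to (Chinese name, English name) tuples.

```python
# Hypothetical stand-in for a part-of-speech map; not PyNLPIR's actual data.
DEFAULT_POS_TAGS = {
    'n': ('名词', 'noun'),
    'v': ('动词', 'verb'),
}


def lookup_pos_name(code, pos_tags=DEFAULT_POS_TAGS, english=True):
    """Return the name for a POS code, or None if the code is unknown.

    `pos_tags` lets the caller supply an extended map instead of the default.
    """
    entry = pos_tags.get(code)
    if entry is None:
        return None
    chinese_name, english_name = entry
    return english_name if english else chinese_name


# Copy the defaults so the shared map is left untouched, then extend the copy.
my_pos_tags = dict(DEFAULT_POS_TAGS)
my_pos_tags['g'] = ('专有名词', 'proper noun')

print(lookup_pos_name('g', pos_tags=my_pos_tags))  # custom tag is resolved
print(lookup_pos_name('g'))                        # unknown in the default map
```

Passing the map as a parameter keeps the default tag set unchanged for users who do not have the extra tags in their files.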

Here is the new feature: https://github.com/tsroten/pynlpir/pull/88

If you get a chance to check it out, let me know how it works for you. Thanks!

lomizandtyd commented 7 years ago

It works fine. Thanks a lot!