tsroten / pynlpir

A Python wrapper around the NLPIR/ICTCLAS Chinese segmentation software.
MIT License

question about "pynlpir.get_key_words" #78

Closed flywithyu closed 7 years ago

flywithyu commented 7 years ago

When I use pynlpir.get_key_words, can I get the parts of speech of the key words? I tried using pynlpir.segment, but found that the words returned by pynlpir.get_key_words and pynlpir.segment may differ. Thank you and best wishes.

tsroten commented 7 years ago

@flywithyu, hello! I don't think that NLPIR supports that. PyNLPIR just returns whatever NLPIR does behind the scenes.

You can see what NLPIR supports in its documentation: https://github.com/NLPIR-team/NLPIR/blob/master/NLPIR%20SDK/NLPIR-ICTCLAS/doc/NLPIR-ICTCLAS%E5%88%86%E8%AF%8D%E7%B3%BB%E7%BB%9F%E5%BC%80%E5%8F%91%E6%89%8B%E5%86%8C2016%E7%89%88.pdf

flywithyu commented 7 years ago

Thanks. I accomplished this by searching the segments for the same word.

Search the segments, find the matching string, and decide based on its part of speech:

    # Chinese part-of-speech names, as returned by pynlpir.segment, that
    # disqualify a key word. Plain '名词' (noun) is deliberately not listed.
    # A set gives exact matching instead of substring matching.
    excluded_pos = set((
        '名词:人名 名词:人名:汉语姓氏 名词:人名:汉语名字 名词:人名:日语人名 名词:人名:音译人名 '
        '名词:地名 名词:地名:音译地名 '
        '名词:机构团体名 名词:其它专名 '
        '动词 动词:名动词 动词:副动词 动词:不及物动词 动词:趋向动词 动词:行事动词 动词:动词性惯用语 '
        '形容词 形容词:副形词 形容词:形容词性惯用语 '
        '数词 数词:数量词 '
        '量词 量词:动量词 量词:时量词 '
        '副词 介词 连词 助词 叹词 语气词 拟声词 '
        '区别词 区别词:区别词性惯用语'
    ).split())

    for word, pos in segments:
        if word != key_word:
            continue
        # If the key word's part of speech is not in the set above,
        # write the key word to the sheet.
        if pos not in excluded_pos:
            sheet1.write(i, 5, word)
            bWrite = False  # set the boolean flag to False
        break  # only the first matching segment is checked
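
The lookup described above can also be sketched as a small standalone helper. This is only an illustration, not part of PyNLPIR: `tag_key_words` is a hypothetical name, and the sample data is made up to resemble what `pynlpir.segment` (a list of `(word, pos)` tuples) and `pynlpir.get_key_words` (a list of strings) return. Key words that do not appear as a single segment, which is the mismatch mentioned in the question, simply map to `None`.

```python
from typing import Dict, Iterable, Optional, Tuple

Segment = Tuple[str, Optional[str]]


def tag_key_words(
    key_words: Iterable[str], segments: Iterable[Segment]
) -> Dict[str, Optional[str]]:
    """Map each key word to the POS of its first exact match in segments.

    Key words with no exactly matching segment map to None.
    """
    first_pos: Dict[str, Optional[str]] = {}
    for word, pos in segments:
        # setdefault keeps the tag of the first occurrence of a repeated word.
        first_pos.setdefault(word, pos)
    return {kw: first_pos.get(kw) for kw in key_words}


# Hypothetical sample data, shaped like the output of
# pynlpir.segment(text) and pynlpir.get_key_words(text).
segments = [
    ("我", "pronoun"),
    ("喜欢", "verb"),
    ("自然", "adjective"),
    ("语言", "noun"),
    ("处理", "verb"),
]
key_words = ["语言", "自然语言"]

tagged = tag_key_words(key_words, segments)
print(tagged)  # {'语言': 'noun', '自然语言': None}
```

Building the dictionary once avoids rescanning the segment list for every key word. A `None` result signals that the key word was segmented differently and needs separate handling.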

flywithyu commented 7 years ago

Another question: I now want to use a user-defined dictionary. But even after adding a word such as "鸡精" to the user-defined dictionary, get_key_words and segment still return the word "鸡". How can I solve this problem? I have tried nlpir.ImportUserDict('D:/user.txt', True) and nlpir.AddUserWord('鸡精 n').

tsroten commented 7 years ago

You might want to check your dict file format and file encoding. See this issue for more information: https://github.com/tsroten/pynlpir/issues/41
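
As a rough sketch of what checking the format and encoding could look like, the snippet below writes a dictionary file and verifies how it was saved. It rests on assumptions that are not confirmed in this thread: one "word POS" entry per line (the same shape as the `'鸡精 n'` string above) and UTF-8 without a byte order mark. `write_user_dict` and `has_utf8_bom` are hypothetical helper names, and the PyNLPIR calls are left as comments because passing a bytes path is also an assumption.

```python
import codecs
import os
import tempfile
from typing import Iterable, Tuple


def write_user_dict(path: str, entries: Iterable[Tuple[str, str]]) -> None:
    """Write one "word POS" entry per line as UTF-8 without a BOM."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for word, pos in entries:
            f.write(f"{word} {pos}\n")


def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


# Write the dictionary to a temporary directory for this demonstration.
dict_path = os.path.join(tempfile.mkdtemp(), "user.txt")
write_user_dict(dict_path, [("鸡精", "n")])
print(has_utf8_bom(dict_path))  # False

# With PyNLPIR installed, the file might then be loaded roughly like this
# (assumption: the ctypes binding expects the path as bytes):
#
#   import pynlpir
#   pynlpir.open()
#   pynlpir.nlpir.ImportUserDict(dict_path.encode("utf-8"), True)
#   print(pynlpir.segment("鸡精"))
#   pynlpir.close()
```

Some Windows editors add a BOM or use a legacy encoding when saving as "UTF-8" or "ANSI", so generating the file from code is one way to rule that out.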