dengjl closed this issue 8 years ago
@dengjl Thanks!
NLPIR has an example you can look at: https://github.com/NLPIR-team/NLPIR/blob/a632f29b2452195d338e8e5e69a49be31dd69604/NLPIR-ICTCLAS/Data/UserDefinedDict.lst
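The NLPIR manual says:
Other things to note about the user dictionary:
1. If a user word contains a space, it must be enclosed in square brackets, e.g.: [Bill Clinton] nrf
2. If the user word should be output as a keyword of the document, its part-of-speech tag must be key, e.g.: 科学发展观 key
3. If a word is a person name and should also be output as a keyword, tag it key_nr, e.g.: 钟南山 key_nr
4. If a word is a place name and should also be output as a keyword, tag it key_ns, e.g.: 钓鱼岛 key_ns
5. If a word is an organization name and should also be output as a keyword, tag it key_nt, e.g.: 国安委 key_nt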
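For anyone building the file by script, here is a minimal Python sketch of the rules quoted above. The helper names `format_entry` and `write_user_dict` are my own and are not part of NLPIR or pynlpir. I am also assuming one `word tag` entry per line, as in the linked UserDefinedDict.lst, and UTF-8 output.

```python
from pathlib import Path


def format_entry(word: str, tag: str) -> str:
    """Return one user-dictionary line in the form '<word> <tag>'."""
    word = word.strip()
    # Rule 1 from the manual: a word containing a space goes in square brackets.
    if " " in word:
        word = f"[{word}]"
    return f"{word} {tag}"


def write_user_dict(entries, path):
    """Write (word, tag) pairs to `path`, one entry per line, encoded as UTF-8."""
    lines = [format_entry(word, tag) for word, tag in entries]
    # Write bytes directly so the line endings are the same on every platform.
    Path(path).write_bytes(("\n".join(lines) + "\n").encode("utf-8"))
    return lines


# Example entries taken from the manual excerpt above.
entries = [
    ("Bill Clinton", "nrf"),   # contains a space, so it gets bracketed
    ("科学发展观", "key"),       # plain keyword
    ("钟南山", "key_nr"),        # person name + keyword
    ("钓鱼岛", "key_ns"),        # place name + keyword
    ("国安委", "key_nt"),        # organization name + keyword
]
```

Calling `write_user_dict(entries, "userdict.txt")` (the file name is just an example) should give you a file you can then import as a user dictionary.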
The manual also has some more information.
You could also check out this: https://github.com/NLPIR-team/NLPIR/tree/a632f29b2452195d338e8e5e69a49be31dd69604/NLPIR-ICTCLAS/importuserdict
Thank you very much!
By the way, does pynlpir.segment use the user dictionary automatically? I think nlpir.ParagraphProcess does, but pynlpir.segment doesn't seem to.
Oh, I found my mistake: the dictionary file was not UTF-8 encoded.
Thanks for your work. What's the format of the user dictionary? I can't import it correctly.
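In case someone else hits the same encoding problem, here is a rough sketch for checking a dictionary file and re-saving it as UTF-8. The helper names are hypothetical. The default source encoding of GB18030 is only my assumption (it is common for files saved on Chinese Windows systems), so pass the real encoding if yours differs.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(path, source_encoding="gb18030"):
    """Re-save `path` as UTF-8 if needed. Returns True if the file was rewritten."""
    p = Path(path)
    data = p.read_bytes()
    if is_utf8(data):
        return False  # already UTF-8, leave the file untouched
    # Assumption: the file is in `source_encoding`; decoding raises if it is not.
    text = data.decode(source_encoding)
    # Write bytes directly so existing line endings are preserved as-is.
    p.write_bytes(text.encode("utf-8"))
    return True
```

The function overwrites the file in place, so keep a backup of the original dictionary before running it.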