Could a custom dictionary feature be added?

ZengJianxin commented 7 years ago

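Hello! This library is very useful, but I ran into a problem while using it: there is no way to supply a custom dictionary. Could you add support for jieba.load_userdict?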

libchaos commented 7 years ago

Strongly recommend adding this.

letiantian commented 7 years ago

@ZengJianxin @libchaos

textrank4zh does not need to be modified to support a custom dictionary. See the example below:

#-*- encoding:utf-8 -*-
from __future__ import print_function

import sys
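try:
    # Python 2 only: make UTF-8 the default string encoding.
    reload(sys)
    sys.setdefaultencoding('utf-8')
except NameError:
    # Python 3 has no builtin reload() and already defaults to UTF-8.
    pass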

# import jieba
# jieba.add_word('端午节你好', 100, 'n')

import codecs
from textrank4zh import TextRank4Keyword

text = '''
端午节你好端午节你好端午节
'''

tr4w = TextRank4Keyword()
tr4w.analyze(text=text,lower=True, window=3, pagerank_config={'alpha':0.85})

for item in tr4w.get_keywords(30, word_min_len=2):
    print(item.word, item.weight, type(item.word))

Output:

端午节 0.5 <type 'unicode'>
你好 0.5 <type 'unicode'>

After uncommenting the jieba lines:

#-*- encoding:utf-8 -*-
from __future__ import print_function

import sys
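try:
    # Python 2 only: make UTF-8 the default string encoding.
    reload(sys)
    sys.setdefaultencoding('utf-8')
except NameError:
    # Python 3 has no builtin reload() and already defaults to UTF-8.
    pass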

import jieba
jieba.add_word('端午节你好', 100, 'n')  # add a custom dictionary entry

import codecs
from textrank4zh import TextRank4Keyword

text = '''
端午节你好端午节你好端午节
'''

tr4w = TextRank4Keyword()
tr4w.analyze(text=text,lower=True, window=3, pagerank_config={'alpha':0.85})

for item in tr4w.get_keywords(30, word_min_len=2):
    print(item.word, item.weight, type(item.word))

Output:

端午节你好 0.649122638064 <type 'unicode'>
端午节 0.350877361936 <type 'unicode'>

hscspring commented 7 years ago

Right. TextRank itself seems to be essentially independent of the corpus.
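To illustrate the point, here is a simplified sketch of TextRank keyword scoring in plain Python. It is not textrank4zh's actual code: the function name textrank and its parameters are hypothetical, and the graph construction (an unweighted, undirected co-occurrence graph over a sliding window) is an assumption. The scoring step only ever sees the token list, so whatever the segmenter and its dictionary produce determines the candidate keywords.

```python
# -*- coding: utf-8 -*-

def textrank(tokens, window=3, alpha=0.85, iterations=100):
    """Score tokens with a toy TextRank over an undirected co-occurrence graph."""
    # Link every pair of tokens that appear within `window` positions of each other.
    neighbors = {tok: set() for tok in tokens}
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + window]:
            neighbors[left].add(right)
            neighbors[right].add(left)

    # Power iteration for PageRank with damping factor `alpha`.
    n = len(neighbors)
    scores = {tok: 1.0 / n for tok in neighbors}
    for _ in range(iterations):
        scores = {
            tok: (1 - alpha) / n
            + alpha * sum(scores[nb] / len(neighbors[nb]) for nb in neighbors[tok])
            for tok in neighbors
        }
    return scores


# Token lists as a segmenter might produce them without and with the custom word.
default_scores = textrank([u"端午节", u"你好", u"端午节", u"你好", u"端午节"])
custom_scores = textrank([u"端午节你好", u"端午节你好", u"端午节"])
print(default_scores)
print(custom_scores)
```

With these two token lists the sketch gives roughly 0.5 / 0.5 and 0.649 / 0.351, in line with the results reported above; the only thing that changed between the two runs is the segmentation.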

letiantian / TextRank4ZH

Could a custom dictionary feature be added? #16