Open AirFin opened 2 years ago
Your `text` data may contain purely numeric fields or missing values.
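A minimal sketch of how such inputs could be screened before they reach `ct.sentiment`. The helper name `as_text` is hypothetical and not part of cntext, and the sample rows are made up for illustration. It assumes each value is a str, a number, `None`, or a float NaN.

```python
import math


def as_text(value):
    """Return value as a str, or None if it is missing (None or NaN).

    Hypothetical helper for illustration; not part of cntext.
    """
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, str):
        return value
    # Purely numeric fields (int, float, ...) are converted to their string form.
    return str(value)


# Made-up rows: one normal text, one numeric field, two missing values.
rows = ['我今天得奖了,很高兴。', 2022, None, float('nan')]
cleaned = [t for t in (as_text(r) for r in rows) if t is not None]
print(cleaned)
```

Only values that survive this screening would then be passed as `text`.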
---Original message--- From: @.> Sent: Wednesday, July 6, 2022, 12:47 PM To: @.>; Cc: @.***>; Subject: [hiDaDeng/cntext] Error when using the DUTIR dictionary (Issue #5)
Code:
import cntext as ct
text = '我今天得奖了,很高兴,我要将快乐分享大家。'
ct.sentiment(text=text,
             diction=ct.load_pkl_dict('DUTIR.pkl')['DUTIR'],
             lang='chinese')
Error:
Traceback (most recent call last):
  File "d:\PythonProject\test\test_cntext.py", line 5, in <module>
    ct.sentiment(text=text,
  File "D:\Miniconda3\envs\py38\lib\site-packages\cntext\stats.py", line 159, in sentiment
    jieba.add_word(w)
  File "D:\Miniconda3\envs\py38\lib\site-packages\jieba\__init__.py", line 426, in add_word
    word = strdecode(word)
  File "D:\Miniconda3\envs\py38\lib\site-packages\jieba\_compat.py", line 79, in strdecode
    sentence = sentence.decode('utf-8')
AttributeError: 'int' object has no attribute 'decode'
With another dictionary instead of DUTIR, the same code runs normally, for example:
import cntext as ct
text = '我今天得奖了,很高兴,我要将快乐分享大家。'
ct.sentiment(text=text,
             diction=ct.load_pkl_dict('HOWNET.pkl')['HOWNET'],
             lang='chinese')
Result:
{'deny_num': 0, 'ish_num': 0, 'more_num': 0, 'neg_num': 0, 'pos_num': 3, 'very_num': 1, 'stopword_num': 8, 'word_num': 14, 'sentence_num': 1}
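Sorry, I didn't read the issue carefully. If swapping DUTIR for HowNet makes it work, the problem is most likely in the dictionary.
If so, first make sure the imported dictionary follows the standard dictionary format. The cntext repository includes small examples of dictionaries in that format.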
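The traceback in the report shows `jieba.add_word` receiving an `int`, which suggests that at least one entry in the dictionary's word lists is not a string. Below is a minimal sketch for locating and coercing such entries. It assumes the dictionary maps category names to lists of words, as in the `load_pkl_dict` output posted in this thread. `find_non_str` and `coerce_to_str` are hypothetical helpers, not cntext functions, and the sample data is made up.

```python
def find_non_str(diction):
    """Return (category, index, value) for every entry that is not a str."""
    return [
        (category, i, w)
        for category, words in diction.items()
        for i, w in enumerate(words)
        if not isinstance(w, str)
    ]


def coerce_to_str(diction):
    """Return a copy with every word converted to str; None entries are dropped."""
    return {
        category: [str(w) for w in words if w is not None]
        for category, words in diction.items()
    }


# Toy data for illustration only; not taken from DUTIR.pkl.
sample = {'乐': ['高兴', '快乐', 666], '哀': ['难过', None]}
print(find_non_str(sample))

fixed = coerce_to_str(sample)
print(find_non_str(fixed))
```

With the real dictionary, the same check could presumably be run on `ct.load_pkl_dict('DUTIR.pkl')['DUTIR']` before passing it as `diction`.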
Hello, and thank you for your reply. I made the changes you suggested, but the error persists.
Python version: 3.8.5
My full code is as follows:
import cntext as ct
d:\Miniconda3\envs\py38\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
d:\Miniconda3\envs\py38\lib\site-packages\numpy\.libs\libopenblas.4SP5SUA7CBGXUEOC35YP2ASOICYYEQZZ.gfortran-win_amd64.dll
d:\Miniconda3\envs\py38\lib\site-packages\numpy\.libs\libopenblas.EL2C6PLE4ZYW3ECEVIV3OXXGRN2NRFM2.gfortran-win_amd64.dll
warnings.warn("loaded more than 1 DLL from .libs:"
d:\Miniconda3\envs\py38\lib\site-packages\gensim\similarities\__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
warnings.warn(msg)
print(ct.__version__)
# Load the pkl dictionary file
ct.load_pkl_dict('DUTIR.pkl')
1.7.4
Output exceeds the size limit. Open the full output data in a text editor
{'DUTIR': {'乐': ['急若流星',
'最后一根稻草',
'慌乱',
'张皇',
'心如悬旌',
'鞋里长草-慌了脚',
'紧急',
'五色无主',
'脚忙手乱',
'仓卒应战',
'缓不济急',
'忡忡',
'风声鹤唳',
'心慌意乱',
'心虚',
'体力不支',
'窘急',
'惊慌失措',
'惊慌',
'发急',
'心急火燎',
'芒刺在背',
'着慌',
'心切',
'手忙脚乱',
...
'恰巧',
'意出望外',
'怨不得']},
'Desc': '大连理工大学情感本体库,细粒度情感词典。含七大类情绪,依次是哀, 好, 惊, 惧, 乐, 怒, 恶',
'Referer': '徐琳宏,林鸿飞,潘宇,等.情感词汇本体的构造[J]. 情报学报, 2008, 27(2): 180-185.'}
text = '我今天得奖了,很高兴,我要将快乐分享大家。'
ct.sentiment(text=text,
diction=ct.load_pkl_dict('DUTIR.pkl')['DUTIR'],
lang='chinese')
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9260/3488132061.py in <module>
1 text = '我今天得奖了,很高兴,我要将快乐分享大家。'
2
----> 3 ct.sentiment(text=text,
4 diction=ct.load_pkl_dict('DUTIR.pkl')['DUTIR'],
5 lang='chinese')
d:\Miniconda3\envs\py38\lib\site-packages\cntext\stats.py in sentiment(text, diction, lang)
157 senti_category_words = diction[senti_category]
158 for w in senti_category_words:
--> 159 jieba.add_word(w)
160
161 sentence_num = len(cn_seg_sent(text))
d:\Miniconda3\envs\py38\lib\site-packages\jieba\__init__.py in add_word(self, word, freq, tag)
424 """
425 self.check_initialized()
--> 426 word = strdecode(word)
427 freq = int(freq) if freq is not None else self.suggest_freq(word, False)
428 self.FREQ[word] = freq
d:\Miniconda3\envs\py38\lib\site-packages\jieba\_compat.py in strdecode(sentence)
77 if not isinstance(sentence, text_type):
...
---> 79 sentence = sentence.decode('utf-8')
80 except UnicodeDecodeError:
81 sentence = sentence.decode('gbk', 'ignore')
AttributeError: 'int' object has no attribute 'decode'
I also created a new Python 3.7.9 environment and ran the same code, and I still get the same error.
Updated to 1.7.5
pip3 install cntext==1.7.6
That solved it. Thank you very much!