shibing624 / pycorrector

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error-correction scenarios and works out of the box.
https://www.mulanai.com/product/corrector/
Apache License 2.0
5.61k stars 1.1k forks

Why do I get the error 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte when compiling? #522

Closed QiuRoy closed 1 month ago

QiuRoy commented 1 month ago

screenshot-1728760389319

QiuRoy commented 1 month ago

import os
from pycorrector import Corrector

pwd_path = os.path.abspath(os.path.dirname(__file__))
lm_path = os.path.join(pwd_path, './zh_giga.no_cna_cmn.prune01244.klm')
m = Corrector(lm_path)
print(m.correct_batch(['少先队员因该为老人让坐', '你找到你最喜欢的工作,我也很高心。']))

shibing624 commented 1 month ago

Try running it again.

QiuRoy commented 1 month ago

That doesn't help; I still get the same error.

shibing624 commented 1 month ago

Use the macbert4csc model.

QiuRoy commented 1 month ago

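It works now:
pwd_path = os.path.abspath(os.path.dirname(__file__))
lm_path = os.path.join(pwd_path, 'zh_giga.no_cna_cmn.prune01244.klm')
model = Corrector(language_model_path=lm_path)
The error was caused by how the argument was passed: the model path has to be given as the language_model_path keyword argument rather than positionally.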
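For readers who hit the same message, here is a minimal sketch of how a positional argument can plausibly lead to it. The error itself only says that some bytes whose first byte is 0x80 were decoded as UTF-8 text. The sketch assumes a constructor whose first positional parameter is a text resource path; `ToyCorrector`, `common_char_path`, and `toy_model.klm` are hypothetical stand-ins for illustration, not pycorrector's actual class, signature, or model file.

```python
import os
import tempfile


class ToyCorrector:
    """Hypothetical stand-in for illustration; NOT pycorrector's real class."""

    def __init__(self, common_char_path="", language_model_path=""):
        self.chars = set()
        if common_char_path:
            # Text resources are read and decoded as UTF-8.
            with open(common_char_path, encoding="utf-8") as f:
                self.chars = set(f.read())
        # The binary model path is only stored here, never decoded as text.
        self.language_model_path = language_model_path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Fake binary file whose first byte is 0x80 (an assumption for the demo).
        lm_path = os.path.join(tmp, "toy_model.klm")
        with open(lm_path, "wb") as f:
            f.write(b"\x80\x01\x02 binary model data")

        # Positional call: the path lands in common_char_path and is read as text.
        try:
            ToyCorrector(lm_path)
            positional_error = None
        except UnicodeDecodeError as exc:
            positional_error = str(exc)

        # Keyword call: the path reaches the intended parameter.
        model = ToyCorrector(language_model_path=lm_path)
        return positional_error, model.language_model_path == lm_path


if __name__ == "__main__":
    error, keyword_ok = demo()
    print("positional call failed with:", error)
    print("keyword call stored the model path:", keyword_ok)
```

Passing paths by keyword avoids depending on the order of constructor parameters, which is the change that resolved the error in this thread.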

QiuRoy commented 1 month ago

Is it not possible to set a custom model path when using ErnieCscCorrector?
from pycorrector import ErnieCscCorrector
import os

pwd_path = os.path.abspath(os.path.dirname(__file__))
lm_path = os.path.join(pwd_path, 'csc-ernie-1.0.pdparams')
print(lm_path)
m = ErnieCscCorrector(model_name_or_path=lm_path)
print(m.correct_batch(['今天新情很好', '你找到你最喜欢的工作,我也很高心。']))
screenshot-1729002145784

shibing624 commented 1 month ago

No, a custom model path is not supported.