sgccnlp / ecws

Chinese word segmentation model for the electric power domain, R3.0
http://sgccnlp.com
MIT License

Running the demo fails #3

Open jmzhoulab opened 3 years ago

jmzhoulab commented 3 years ago
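from ecws.segment import Segmenter

model_path = 'ecws.model'
sent = '...'  # input sentence to segment; its value is not shown in the report

# Raises TypeError with the current release: the constructor also requires vocab_path
predict = Segmenter(model_path)

d = predict.seg(sent)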

The error is:

Traceback (most recent call last):
  File "/Users/zhoujm/workspace/python/kbqa4power/test/test.py", line 20, in <module>
    predict = Segmenter(model_path)
TypeError: __init__() missing 1 required positional argument: 'vocab_path'

vocab_path is missing. Has the interface changed? Also, what should the contents of vocab_path look like?
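The traceback already answers the first question in part: the constructor now has a second required positional parameter named vocab_path, and because the single argument model_path was accepted while only vocab_path is reported missing, vocab_path is most likely the second parameter. The minimal sketch below uses a hypothetical stand-in class (not the real ecws.segment.Segmenter, whose body is not shown in this thread) that mirrors only that inferred constructor shape, and reproduces the same TypeError; try_construct is likewise a hypothetical helper.

```python
class Segmenter:
    """Hypothetical stand-in: mirrors only the constructor shape implied by the traceback."""

    def __init__(self, model_path, vocab_path):
        self.model_path = model_path
        self.vocab_path = vocab_path


def try_construct(*args):
    """Return None if construction succeeds, else the TypeError message."""
    try:
        Segmenter(*args)
    except TypeError as exc:
        return str(exc)
    return None


# Demo-style call: only model_path is passed, so Python reports the missing vocab_path.
print(try_construct('ecws.model'))
# Passing both paths satisfies the two required positional parameters.
print(try_construct('ecws.model', 'path_to_save'))
```

The exact message prefix differs across Python versions (newer versions print Segmenter.__init__(), older ones just __init__()), but the "missing 1 required positional argument: 'vocab_path'" part is the same.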

liefficient commented 2 years ago

Same here: __init__() missing 1 required positional argument: 'vocab_path'

campper commented 2 years ago

Give me a moment; I'll take a look and reply as soon as possible.

ctrl-zzzzz commented 2 years ago

@liefficient @jmzhoulab Hello,

Currently, vocab_path must point to the files saved by the official BertTokenizer. Here is how to produce them:

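from transformers import BertTokenizer

path = 'path_to_save'  # directory the tokenizer files will be written to

# Download the official Chinese BERT tokenizer and save its files (including vocab.txt) to path
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
tokenizer.save_pretrained(path)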

Then, when calling the interface, set vocab_path to path.
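This also covers the format question: save_pretrained writes several files into path, and the vocabulary itself is vocab.txt, a plain-text WordPiece list with one token per line whose 0-based line number is the token id. Assuming ecws loads the vocabulary from these standard BertTokenizer files (the thread does not show how it reads them), a minimal stdlib sketch can sanity-check the directory before passing it as vocab_path. The helpers parse_vocab_lines, load_vocab_dir and missing_special_tokens are hypothetical and not part of ecws or transformers.

```python
import os


def parse_vocab_lines(lines):
    """Map each token to its id, where the id is the token's 0-based line number.

    This mirrors the layout of a BERT WordPiece vocab.txt: one token per line.
    """
    return {line.rstrip('\n'): index for index, line in enumerate(lines)}


def load_vocab_dir(path):
    """Hypothetical helper: load vocab.txt from a BertTokenizer save directory."""
    vocab_file = os.path.join(path, 'vocab.txt')
    if not os.path.isfile(vocab_file):
        raise FileNotFoundError(
            f'{vocab_file} not found; run tokenizer.save_pretrained(path) first')
    with open(vocab_file, encoding='utf-8') as f:
        return parse_vocab_lines(f)


def missing_special_tokens(vocab):
    """Return the standard BERT special tokens that are absent from the vocabulary."""
    required = ('[PAD]', '[UNK]', '[CLS]', '[SEP]', '[MASK]')
    return [token for token in required if token not in vocab]
```

Under the same assumptions, after tokenizer.save_pretrained(path), load_vocab_dir(path) should return a non-empty mapping and missing_special_tokens on it an empty list; the segmenter is then built with Segmenter(model_path, vocab_path=path) and used with predict.seg(sent) as in the original demo.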

We will release an updated version of the code soon, with an improved calling interface and a model trained on a larger corpus.