shibing624 / text2vec

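text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.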
https://pypi.org/project/text2vec/
Apache License 2.0
4.39k stars, 392 forks

AutoTokenizer fails to save ernie3.0 #110

Closed yuankunW closed 1 year ago

yuankunW commented 1 year ago

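While training text2vec-base-chinese-paraphrase, the model trains, but saving the tokenizer afterwards fails. It seems that saving only works when the tokenizer is loaded with BertTokenizer.from_pretrained(). Is this because the ernie3.0 version is not supported by AutoTokenizer?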

shibing624 commented 1 year ago

ERNIE simply uses BertTokenizer.
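
For readers who land here with the same problem, the reply above can be sketched roughly as follows: load the tokenizer through `BertTokenizer` directly, save it, and reload it from the saved directory. This is a minimal sketch, not the maintainer's code. The helper name `save_and_reload_bert_tokenizer` and the toy vocabulary are made up for illustration so the example runs offline; in a real run, `source` would be the checkpoint from the question (assumed here to be reachable as a Hugging Face model name or local path).

```python
import os
import tempfile

from transformers import BertTokenizer


def save_and_reload_bert_tokenizer(source, output_dir):
    """Load a BERT-style tokenizer from `source`, save it, and reload it.

    `source` is a model name or a local directory (hypothetical helper,
    written only for this example).
    """
    tokenizer = BertTokenizer.from_pretrained(source)
    tokenizer.save_pretrained(output_dir)
    # Reloading from the saved directory confirms the save round-trips.
    return BertTokenizer.from_pretrained(output_dir)


# Offline demo: a toy vocabulary stands in for the real checkpoint.
# (Assumption: the real checkpoint ships a BERT-style vocab.txt.)
workdir = tempfile.mkdtemp()
source_dir = os.path.join(workdir, "toy_checkpoint")
os.makedirs(source_dir)
toy_vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "你", "好"]
with open(os.path.join(source_dir, "vocab.txt"), "w", encoding="utf-8") as f:
    f.write("\n".join(toy_vocab) + "\n")

output_dir = os.path.join(workdir, "saved_tokenizer")
reloaded = save_and_reload_bert_tokenizer(source_dir, output_dir)

saved_files = sorted(os.listdir(output_dir))  # files written by save_pretrained
tokens = reloaded.tokenize("你好")
print(saved_files)
print(tokens)
```

With the real model, only the `source` argument should need to change. Whether `AutoTokenizer` resolves to a class that saves correctly for this checkpoint depends on the installed transformers version and the checkpoint's config, so that part is left untested here.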