ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
https://ieeexplore.ieee.org/document/9599397
Apache License 2.0
9.66k stars 1.39k forks

pad_token_id error #214

Closed CaoYiwei closed 2 years ago

CaoYiwei commented 2 years ago

Hello, the config.json of the chinese-roberta-wwm-ext model on Hugging Face sets pad_token_id to 1, but it should be 0.
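One way to check for this kind of mismatch is to compare the `pad_token_id` in config.json with the line index of `[PAD]` in the BERT-style vocab.txt, where a token's id is its zero-based line number. The following is only a minimal sketch: the helper name `find_pad_mismatch` is hypothetical, and the config string and vocabulary below are toy stand-ins, not the actual files from the model repository.

```python
import json
from typing import List, Optional, Tuple


def find_pad_mismatch(
    config_json: str, vocab_lines: List[str], pad_token: str = "[PAD]"
) -> Optional[Tuple[Optional[int], int]]:
    """Return (configured_id, vocab_id) if they disagree, otherwise None.

    Hypothetical helper for illustration; not part of any library.
    """
    config = json.loads(config_json)
    configured_id = config.get("pad_token_id")
    # In a BERT-style vocab.txt, the token id is the zero-based line index.
    vocab_id = vocab_lines.index(pad_token)
    if configured_id != vocab_id:
        return configured_id, vocab_id
    return None


# Toy stand-ins (assumptions) for config.json and the first lines of vocab.txt.
toy_config = '{"model_type": "bert", "pad_token_id": 1}'
toy_vocab = ["[PAD]", "[unused1]", "[unused2]"]

print(find_pad_mismatch(toy_config, toy_vocab))  # (1, 0)
```

With real files, one would read config.json as text and pass the lines of vocab.txt (stripped of newlines) instead of the toy values.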

CaoYiwei commented 2 years ago

Uh, this is not the only model with a wrong pad_token_id. Could you please fix them?

ymcui commented 2 years ago

Hello, thanks for letting me know. I have updated chinese-roberta-wwm-ext and chinese-roberta-wwm-ext-large.

CaoYiwei commented 2 years ago

thx~

ustcdane commented 2 years ago

hi, @ymcui With the input "今天[MASK]情很好", I found that both chinese-roberta-wwm-ext-large on Hugging Face and chinese_roberta_wwm_large_ext_pytorch.zip from GitHub give rather strange predictions for the masked token: 篝, 颼

ustcdane commented 2 years ago

This seems to be the cause: https://github.com/ymcui/Chinese-BERT-wwm/issues/76