ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT(中文BERT-wwm系列模型)
https://ieeexplore.ieee.org/document/9599397
Apache License 2.0
9.56k stars · 1.38k forks

无法使用transformers快速加载模型 #105

Closed ParadoxZW closed 4 years ago

ParadoxZW commented 4 years ago

The error message is as follows:

OSError: Model name 'hfl/chinese-roberta-wwm-ext' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1). We assumed 'hfl/chinese-roberta-wwm-ext' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

ymcui commented 4 years ago

Hi, testing with transformers==2.8.0 and torch==1.4.0 works fine:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext-large")
model = BertModel.from_pretrained("hfl/chinese-roberta-wwm-ext-large")
ParadoxZW commented 4 years ago

It turned out to be a network problem on my server.

fathouse commented 4 years ago

Hi, testing with transformers==2.8.0 and torch==1.4.0 works fine:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext-large")
model = BertModel.from_pretrained("hfl/chinese-roberta-wwm-ext-large")

Hello, I used

from transformers import BertModel, BertTokenizer
bert = BertModel.from_pretrained("hfl/chinese-roberta-wwm-ext")
bert_tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")

transformers==2.9.1 torch==1.4

but got the following result:

loading weights file https://cdn.huggingface.co/hfl/chinese-roberta-wwm-ext/pytorch_model.bin from cache at C:\Users\bcc/.cache\torch\transformers\47d2326d47246cef3121d70d592c0391a4ed594b04ce3dea8bd47edd37e20370.6ac27309c356295f0e005c6029fce503ec6a32853911ebf79f8bddd8dd10edad
Model name 'hfl/chinese-roberta-wwm-ext' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). Assuming 'hfl/chinese-roberta-wwm-ext' is a path, a model identifier, or url to a directory containing tokenizer files.
loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/vocab.txt from cache at C:\Users\bcc/.cache\torch\transformers\5593eb652e3fb9a17042385245a61389ce6f0c8a25e167519477d7efbdf2459a.9b42061518a39ca00b8b52059fd2bede8daa613f8a8671500e518a8c29de8c00
loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/added_tokens.json from cache at C:\Users\bcc/.cache\torch\transformers\23740a16768d945f44a24590dc8f5e572773b1b2868c5e58f7ff4fae2a721c49.3889713104075cfee9e96090bcdd0dc753733b3db9da20d1dd8b2cd1030536a2
loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/special_tokens_map.json from cache at C:\Users\bcc/.cache\torch\transformers\6f13f9fe28f96dd7be36b84708332115ef90b3b310918502c13a8f719a225de2.275045728fbf41c11d3dae08b8742c054377e18d92cc7b72b6351152a99b64e4
loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/tokenizer_config.json from cache at C:\Users\bcc/.cache\torch\transformers\5bb5761fdb6c8f42bf7705c27c48cffd8b40afa8278fa035bc81bf288f108af9.1ade4e0ac224a06d83f2cb9821a6656b6b59974d6552e8c728f2657e4ba445d9
Traceback (most recent call last):
  File "D:/pycharm/CW-BERT/CW-ABSA-master/train.py", line 290, in <module>
    single_train()
  File "D:/pycharm/CW-BERT/CW-ABSA-master/train.py", line 284, in single_train
    ins = Instructor(opt)
  File "D:/pycharm/CW-BERT/CW-ABSA-master/train.py", line 33, in __init__
    tokenizer = Tokenizer4Bert(self.bert_tokenizer, opt.max_seq_len)
  File "D:\pycharm\CW-BERT\CW-ABSA-master\data_utils.py", line 145, in __init__
    self.tokenizer = BertTokenizer.from_pretrained(pretrained_bert_name)
  File "D:\anaconda\envs\tf-gpu\lib\site-packages\transformers\tokenization_utils.py", line 902, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "D:\anaconda\envs\tf-gpu\lib\site-packages\transformers\tokenization_utils.py", line 933, in _from_pretrained
    if os.path.isfile(pretrained_model_name_or_path) or is_remote_url(pretrained_model_name_or_path):
  File "D:\anaconda\envs\tf-gpu\lib\site-packages\transformers\file_utils.py", line 135, in is_remote_url
    parsed = urlparse(url_or_filename)
  File "D:\anaconda\envs\tf-gpu\lib\urllib\parse.py", line 367, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "D:\anaconda\envs\tf-gpu\lib\urllib\parse.py", line 123, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "D:\anaconda\envs\tf-gpu\lib\urllib\parse.py", line 107, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "D:\anaconda\envs\tf-gpu\lib\urllib\parse.py", line 107, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'int' object has no attribute 'decode'

Model name '80' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). Assuming '80' is a path, a model identifier, or url to a directory containing tokenizer files.

Only the model fails to load. I can see it has already been downloaded, yet it still cannot be found in the model shortcut name list. Do you know why?
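The "Model name '80'" message suggests an integer (possibly opt.max_seq_len, given the Tokenizer4Bert(self.bert_tokenizer, opt.max_seq_len) call in the traceback) is reaching from_pretrained where a model-name string is expected, which matches the final AttributeError: 'int' object has no attribute 'decode'. A hypothetical guard (not part of transformers) that would surface this at the call site:

```python
def check_pretrained_name(name):
    """Fail fast if a non-string would reach from_pretrained().

    Passing an int such as 80 only blows up deep inside urllib
    ("'int' object has no attribute 'decode'"); checking here gives
    a readable error where the mistake actually happens.
    """
    if not isinstance(name, str):
        raise TypeError(
            f"pretrained model name must be a str, "
            f"got {type(name).__name__}: {name!r}"
        )
    return name
```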

fathouse commented 4 years ago

Also, when I use

from transformers import AutoTokenizer, AutoModelWithLMHead
bert = AutoModelWithLMHead.from_pretrained("hfl/chinese-roberta-wwm-ext")
bert_tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")

this happens instead:

Model name 'hfl/chinese-roberta-wwm-ext' not found in model shortcut name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). Assuming 'hfl/chinese-roberta-wwm-ext' is a path, a model identifier, or url to a directory containing tokenizer files.
Traceback (most recent call last):
  File "D:/pycharm/CW-BERT/CW-ABSA-master/train.py", line 290, in <module>
    single_train()
  File "D:/pycharm/CW-BERT/CW-ABSA-master/train.py", line 284, in single_train
    ins = Instructor(opt)
  File "D:/pycharm/CW-BERT/CW-ABSA-master/train.py", line 32, in __init__
    bert_tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
  File "D:\anaconda\envs\tf-gpu\lib\site-packages\transformers\tokenization_auto.py", line 203, in from_pretrained
    return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "D:\anaconda\envs\tf-gpu\lib\site-packages\transformers\tokenization_utils.py", line 902, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "D:\anaconda\envs\tf-gpu\lib\site-packages\transformers\tokenization_utils.py", line 1055, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "D:\anaconda\envs\tf-gpu\lib\site-packages\transformers\tokenization_roberta.py", line 151, in __init__
    **kwargs,
  File "D:\anaconda\envs\tf-gpu\lib\site-packages\transformers\tokenization_gpt2.py", line 151, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/vocab.json from cache at None
loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/merges.txt from cache at None
loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/added_tokens.json from cache at C:\Users\bcc/.cache\torch\transformers\23740a16768d945f44a24590dc8f5e572773b1b2868c5e58f7ff4fae2a721c49.3889713104075cfee9e96090bcdd0dc753733b3db9da20d1dd8b2cd1030536a2
loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/special_tokens_map.json from cache at C:\Users\bcc/.cache\torch\transformers\6f13f9fe28f96dd7be36b84708332115ef90b3b310918502c13a8f719a225de2.275045728fbf41c11d3dae08b8742c054377e18d92cc7b72b6351152a99b64e4
loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/tokenizer_config.json from cache at C:\Users\bcc/.cache\torch\transformers\5bb5761fdb6c8f42bf7705c27c48cffd8b40afa8278fa035bc81bf288f108af9.1ade4e0ac224a06d83f2cb9821a6656b6b59974d6552e8c728f2657e4ba445d9

Process finished with exit code 1
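The AutoTokenizer failure above is expected: this repo's README says to load the RoBERTa-wwm checkpoints with BertTokenizer/BertModel, because they are BERT-architecture models. In transformers 2.x the Auto classes dispatched on substrings of the model name, so "hfl/chinese-roberta-wwm-ext" was routed to RobertaTokenizer, which looks for vocab.json and merges.txt (cached "at None" in the log) instead of the vocab.txt a BERT-style checkpoint ships. A simplified sketch of that dispatch, not the actual library code:

```python
def guess_tokenizer_class(name):
    """Simplified model of transformers 2.x AutoTokenizer dispatch.

    "roberta" is matched before "bert", so "hfl/chinese-roberta-wwm-ext"
    gets RobertaTokenizer (needs vocab.json/merges.txt) rather than
    BertTokenizer (needs vocab.txt) -- hence the NoneType error above.
    """
    if "roberta" in name:
        return "RobertaTokenizer"
    if "bert" in name:
        return "BertTokenizer"
    return None
```

Using BertTokenizer explicitly, as in ymcui's snippet above, sidesteps the substring heuristic entirely.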

menggesun1997 commented 1 year ago

I ran into this problem too.