huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext") #5653

Closed · Single430 closed this issue 4 years ago

Single430 commented 4 years ago

❓ Questions & Help

Details

```python
>>> from transformers import AutoTokenizer, AutoModelWithLMHead
>>> tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
```

I0710 17:52:53.548153 139925919450880 tokenization_utils_base.py:1167] Model name 'hfl/chinese-roberta-wwm-ext' not found in model shortcut name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). Assuming 'hfl/chinese-roberta-wwm-ext' is a path, a model identifier, or url to a directory containing tokenizer files.
I0710 17:52:59.942922 139925919450880 tokenization_utils_base.py:1254] loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/vocab.json from cache at None
I0710 17:52:59.943219 139925919450880 tokenization_utils_base.py:1254] loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/merges.txt from cache at None
I0710 17:52:59.943420 139925919450880 tokenization_utils_base.py:1254] loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/added_tokens.json from cache at /home/ubuntu/.cache/torch/transformers/23740a16768d945f44a24590dc8f5e572773b1b2868c5e58f7ff4fae2a721c49.3889713104075cfee9e96090bcdd0dc753733b3db9da20d1dd8b2cd1030536a2
I0710 17:52:59.943602 139925919450880 tokenization_utils_base.py:1254] loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/special_tokens_map.json from cache at /home/ubuntu/.cache/torch/transformers/6f13f9fe28f96dd7be36b84708332115ef90b3b310918502c13a8f719a225de2.275045728fbf41c11d3dae08b8742c054377e18d92cc7b72b6351152a99b64e4
I0710 17:52:59.943761 139925919450880 tokenization_utils_base.py:1254] loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/tokenizer_config.json from cache at /home/ubuntu/.cache/torch/transformers/5bb5761fdb6c8f42bf7705c27c48cffd8b40afa8278fa035bc81bf288f108af9.1ade4e0ac224a06d83f2cb9821a6656b6b59974d6552e8c728f2657e4ba445d9
I0710 17:52:59.943786 139925919450880 tokenization_utils_base.py:1254] loading file https://s3.amazonaws.com/models.huggingface.co/bert/hfl/chinese-roberta-wwm-ext/tokenizer.json from cache at None

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/transformers/tokenization_auto.py", line 217, in from_pretrained
    return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1140, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1288, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/transformers/tokenization_roberta.py", line 171, in __init__
    **kwargs,
  File "/home/ubuntu/anaconda3/envs/deeplearning/lib/python3.6/site-packages/transformers/tokenization_gpt2.py", line 167, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType

Does it support hfl/chinese-roberta-wwm-ext now? Or what should I do? Hoping for help, thanks! @julien-c


QixinLi commented 4 years ago

I ran into the same issue. Try `BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")` instead; it works for me.
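
For reference, a minimal sketch of that workaround. The checkpoint appears to be BERT-style (it ships a `vocab.txt` rather than the `vocab.json`/`merges.txt` pair that `RobertaTokenizer` looks for, hence the `None` vocab file in the traceback). Loading the model side with `BertModel` is my assumption here, not something stated in this thread:

```python
from transformers import BertTokenizer, BertModel

# Workaround suggested above: load the tokenizer with the BERT class, since the
# checkpoint provides a BERT-style vocab.txt instead of vocab.json/merges.txt.
tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")

# Assumption: the weights are also BERT-architecture, so BertModel (rather than
# RobertaModel) should load them cleanly.
model = BertModel.from_pretrained("hfl/chinese-roberta-wwm-ext")

inputs = tokenizer("今天天气不错", return_tensors="pt")  # any Chinese sentence
outputs = model(**inputs)
print(outputs[0].shape)  # last hidden state, e.g. (1, seq_len, 768) for the base model
```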

Single430 commented 4 years ago

> I ran into the same issue. Try `BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")` instead; it works for me.

Yes!! I succeeded, thank you very much for your help!