Closed. erickrf closed this issue 4 years ago.
Hi. This is not a bug but expected behaviour: since the model works at the character level, a tokenizer is not "required". The model card explains how you can encode/decode your data (a rough sketch follows below).
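For convenience, here is a minimal sketch of what that character-level encoding/decoding looks like. The `+2` offset (which reserves ids 0 and 1 for padding/special use) is taken from the model card; treat this as an illustration rather than the exact code there:

```python
import torch

def encode(list_of_strings, pad_token_id=0):
    # Pad every string to the length of the longest one and build an attention mask.
    max_length = max(len(s) for s in list_of_strings)
    input_ids = torch.full((len(list_of_strings), max_length), pad_token_id, dtype=torch.long)
    attention_masks = torch.zeros((len(list_of_strings), max_length), dtype=torch.long)
    for idx, string in enumerate(list_of_strings):
        if not isinstance(string, bytes):
            string = str.encode(string)  # work on raw bytes
        # Shift byte values by 2 so ids 0 and 1 stay reserved (per the model card).
        input_ids[idx, : len(string)] = torch.tensor([b + 2 for b in string])
        attention_masks[idx, : len(string)] = 1
    return input_ids, attention_masks

def decode(output_ids):
    # Map ids back to characters; reserved ids (< 2) are dropped.
    return ["".join(chr(x - 2) if x > 1 else "" for x in ids) for ids in output_ids.tolist()]
```

`encode` returns padded input ids plus an attention mask, so the tensors can be fed straight to the model without any tokenizer object.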
@erickrf can you share how you managed to train the "reformer" model? I'm trying to use "google/reformer-enwik8" to train a Portuguese model, but I just get the same error:
Model name 'google/reformer-enwik8' was not found in tokenizers
@bratao I answered this in my comment... Open the link that I posted and scroll down. They tell you how to do tokenisation. No need to load a tokenizer as usual.
@BramVanroy
My code is below:
python examples/seq2seq/finetune_trainer.py --model_name_or_path google/reformer-enwik8 --do_train --do_eval --task translation_en_to_de --data_dir /lustre/dataset/wmt17_en_de/ --output_dir /home2/zhenggo1/checkpoint/reformer_translation --per_device_train_batch_size=4 --per_device_eval_batch_size=4 --overwrite_output_dir --predict_with_generate
and the error is below. What could be the reason? Thanks!
Traceback (most recent call last):
File "examples/seq2seq/finetune_trainer.py", line 367, in <module>
main()
File "examples/seq2seq/finetune_trainer.py", line 206, in main
cache_dir=model_args.cache_dir,
File "/home2/zhenggo1/LowPrecisionInferenceTool/examples/pytorch/huggingface_transformers/src/transformers/models/auto/tokenization_auto.py", line 385, in from_pretrained
return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home2/zhenggo1/LowPrecisionInferenceTool/examples/pytorch/huggingface_transformers/src/transformers/tokenization_utils_base.py", line 1760, in from_pretrained
raise EnvironmentError(msg)
OSError: Can't load tokenizer for 'google/reformer-enwik8'. Make sure that:
- 'google/reformer-enwik8' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'google/reformer-enwik8' is the correct path to a directory containing relevant tokenizer files
@LeopoldACC Please post a new issue so that someone can have a look.
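For context: finetune_trainer.py calls AutoTokenizer.from_pretrained (visible in the traceback above), and no tokenizer files are published for google/reformer-enwik8, which is why it fails. The model card instead drives the model directly on characters, roughly as in this sketch (the `+2` offset and the example prompt are taken from the model card, not from this thread):

```python
import torch
from transformers import ReformerModelWithLMHead

# Load the checkpoint without any tokenizer.
model = ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8")

# Encode the prompt at the byte level, shifting by 2 so ids 0 and 1 stay reserved.
prompt = "In 1965, Brooks left IBM to found the Department of"
input_ids = torch.tensor([[b + 2 for b in prompt.encode("utf-8")]])  # shape (1, len(prompt))

# Generate a continuation and map the ids back to characters.
generated = model.generate(input_ids, max_length=150)
print("".join(chr(i - 2) if i > 1 else "" for i in generated[0].tolist()))
```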
🐛 Bug
Information
Model I am using (Bert, XLNet ...): Reformer tokenizer
To reproduce
Steps to reproduce the behavior:
AutoTokenizer.from_pretrained("google/reformer-enwik8")
This is the error I got:
I tried with and without the google/ prefix, same result. However, it did print the download progress bar. Trying to load the crime-and-punishment Reformer tokenizer works.
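For reference, the difference between the two checkpoints can be checked directly (the identifier google/reformer-crime-and-punishment is assumed here; the enwik8 call fails because no tokenizer files are published for that checkpoint):

```python
from transformers import AutoTokenizer

# This checkpoint ships SentencePiece tokenizer files, so loading works:
tok = AutoTokenizer.from_pretrained("google/reformer-crime-and-punishment")
print(tok.tokenize("A sentence to test the tokenizer."))

# The enwik8 checkpoint is character-level and publishes no tokenizer files,
# so this call fails as described above:
AutoTokenizer.from_pretrained("google/reformer-enwik8")
```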
transformers version: 2.9.0