huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

TypeError: __init__() got an unexpected keyword argument 'vocab_file' in transformers/tokenization_gpt2.py, line 380 #7916

Closed · memray closed this issue 3 years ago

memray commented 3 years ago

Environment info

Who can help: @mfuntowicz

Information

Model I am using (Bert, XLNet ...): RoBERTa-base

The problem arises when using:

The task I am working on is:

To reproduce

I use RobertaTokenizerFast and there seems to be an argument-name mismatch. Steps to reproduce the behavior (a self-contained sketch follows):

  1. self.tokenizer = RobertaTokenizerFast.from_pretrained('roberta-base', cache_dir=args.cache_dir)
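
For reference, here is a minimal, self-contained sketch of that step; the cache directory path is a hypothetical stand-in for `args.cache_dir` in the original code:

```python
# Minimal reproduction sketch; "/tmp/hf_cache" is a placeholder for args.cache_dir.
from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained(
    "roberta-base",
    cache_dir="/tmp/hf_cache",
)
# With transformers v3.3.1 installed alongside tokenizers 0.9.x this raises:
# TypeError: __init__() got an unexpected keyword argument 'vocab_file'
```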

In `transformers.tokenization_gpt2.py` (L376) the call is:

```python
ByteLevelBPETokenizer(
    vocab_file=vocab_file,
    merges_file=merges_file,
    add_prefix_space=add_prefix_space,
    trim_offsets=trim_offsets,
)
```

But `tokenizers.implementations.ByteLevelBPETokenizer` expects that argument to be named `vocab`.
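
To make the mismatch concrete, a hedged sketch (assuming tokenizers 0.9.x, where the parameters are `vocab`/`merges` rather than `vocab_file`/`merges_file`); the file names are placeholders and the TypeError is raised before any file is opened:

```python
# Sketch of the mismatch under tokenizers 0.9.x (assumption: vocab_file was renamed to vocab).
from tokenizers.implementations import ByteLevelBPETokenizer

try:
    # Mirrors the call made in transformers v3.3.1's tokenization_gpt2.py:
    ByteLevelBPETokenizer(
        vocab_file="vocab.json",   # not accepted by tokenizers 0.9.x
        merges_file="merges.txt",
    )
except TypeError as err:
    print(err)  # __init__() got an unexpected keyword argument 'vocab_file'

# Under tokenizers 0.9.x the equivalent call would be:
# ByteLevelBPETokenizer(vocab="vocab.json", merges="merges.txt")
```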

Expected behavior

File "/zfs1/hdaqing/rum20/kp/fairseq-kpg/fairseq/data/encoders/hf_bpe.py", line 31, in __init__ self.tokenizer = RobertaTokenizerFast.from_pretrained(args.pretrained_model, cache_dir=args.cache_dir) File "/ihome/hdaqing/rum20/anaconda3/envs/kp/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1428, in from_pretrained return cls._from_pretrained(*inputs, **kwargs) File "/ihome/hdaqing/rum20/anaconda3/envs/kp/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1575, in _from_pretrained tokenizer = cls(*init_inputs, **init_kwargs) File "/ihome/hdaqing/rum20/anaconda3/envs/kp/lib/python3.7/site-packages/transformers/tokenization_roberta.py", line 380, in __init__ **kwargs, File "/ihome/hdaqing/rum20/anaconda3/envs/kp/lib/python3.7/site-packages/transformers/tokenization_gpt2.py", line 380, in __init__ trim_offsets=trim_offsets, TypeError: __init__() got an unexpected keyword argument 'vocab_file'

azamatolegen commented 3 years ago

same issue

LysandreJik commented 3 years ago

Hello! I think this is due to a mismatch between your transformers and tokenizers versions. transformers version v3.3.1 expects tokenizers == 0.8.1.rc2.

If you want to use tokenizers == 0.9.2, you should work on the current master branch or wait for version v3.4.0, which should be released sometime today.
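
For anyone checking whether their environment has this mismatch, a small diagnostic sketch (not part of the original thread):

```python
# Diagnostic sketch: print the installed versions to spot the mismatch.
# transformers v3.3.1 is paired with tokenizers 0.8.1.rc2; tokenizers 0.9.x
# needs transformers master / v3.4.0 or later.
import tokenizers
import transformers

print("transformers:", transformers.__version__)
print("tokenizers:  ", tokenizers.__version__)
```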

memray commented 3 years ago

Thank you! I upgraded both and it works.