Closed: memray closed this issue 4 years ago.
same issue
Hello! I think this is due to a mismatch between your transformers and tokenizers versions. transformers v3.3.1 expects tokenizers == 0.8.1.rc2. If you want to use tokenizers == 0.9.2, you should work on the current master branch or wait for version v3.4.0, which should be released sometime today.
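A quick way to confirm which versions are actually installed (a minimal sketch; both packages expose __version__):

```python
import tokenizers
import transformers

# transformers 3.3.1 pins tokenizers == 0.8.1.rc2; anything newer
# (e.g. 0.9.2) can break the fast tokenizers.
print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)
```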
Thank you! I upgraded both and it works.
Environment info

- transformers version: 3.3.1
- tokenizers version: 0.9.2

Who can help

@mfuntowicz
Information
Model I am using (Bert, XLNet ...): RoBERTa-base
The problem arises when using:
The tasks I am working on are:
To reproduce

I use RobertaTokenizerFast, and there seems to be an argument-name mismatch. Steps to reproduce the behavior:
In transformers.tokenization_gpt2.py, at L376, the call is:

```python
ByteLevelBPETokenizer(
    vocab_file=vocab_file,
    merges_file=merges_file,
    add_prefix_space=add_prefix_space,
    trim_offsets=trim_offsets,
)
```
But tokenizers.implementations.ByteLevelBPETokenizer expects the argument to be named vocab, so the keyword passed by transformers no longer matches; see the sketch below.
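A minimal sketch of the mismatch, assuming tokenizers 0.9.2 and local vocab.json / merges.txt files (the file names are placeholders):

```python
from tokenizers import ByteLevelBPETokenizer

# The tokenizers 0.9.x signature names the first two parameters
# `vocab` and `merges`, so this works:
tokenizer = ByteLevelBPETokenizer(vocab="vocab.json", merges="merges.txt")

# But transformers 3.3.1 still passes `vocab_file=` / `merges_file=`,
# so its call fails:
ByteLevelBPETokenizer(vocab_file="vocab.json", merges_file="merges.txt")
# TypeError: __init__() got an unexpected keyword argument 'vocab_file'
```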
Expected behavior

The tokenizer should load without error. Instead, the call fails with:
File "/zfs1/hdaqing/rum20/kp/fairseq-kpg/fairseq/data/encoders/hf_bpe.py", line 31, in __init__ self.tokenizer = RobertaTokenizerFast.from_pretrained(args.pretrained_model, cache_dir=args.cache_dir) File "/ihome/hdaqing/rum20/anaconda3/envs/kp/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1428, in from_pretrained return cls._from_pretrained(*inputs, **kwargs) File "/ihome/hdaqing/rum20/anaconda3/envs/kp/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1575, in _from_pretrained tokenizer = cls(*init_inputs, **init_kwargs) File "/ihome/hdaqing/rum20/anaconda3/envs/kp/lib/python3.7/site-packages/transformers/tokenization_roberta.py", line 380, in __init__ **kwargs, File "/ihome/hdaqing/rum20/anaconda3/envs/kp/lib/python3.7/site-packages/transformers/tokenization_gpt2.py", line 380, in __init__ trim_offsets=trim_offsets, TypeError: __init__() got an unexpected keyword argument 'vocab_file'