utterworks / fast-bert

Super easy library for BERT based NLP models
Apache License 2.0
1.85k stars 342 forks

New issue with bert tokenizer #212

Open aubluce opened 4 years ago

aubluce commented 4 years ago

TypeError                                 Traceback (most recent call last)
in ()
     10     multi_gpu=False,
     11     multi_label=True,
---> 12     model_type='bert')

4 frames
/usr/local/lib/python3.6/dist-packages/fast_bert/data_cls.py in __init__(self, data_dir, label_dir, tokenizer, train_file, val_file, test_data, label_file, text_col, label_col, batch_size_per_gpu, max_seq_length, multi_gpu, multi_label, backend, model_type, logger, clear_cache, no_cache)
    365         if isinstance(tokenizer, str):
    366             # instantiate the new tokeniser object using the tokeniser name
--> 367             tokenizer = AutoTokenizer.from_pretrained(tokenizer, use_fast=True)
    368
    369         self.tokenizer = tokenizer

/usr/local/lib/python3.6/dist-packages/transformers/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    193             if isinstance(config, config_class):
    194                 if tokenizer_class_fast and use_fast:
--> 195                     return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    196                 else:
    197                     return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)

/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py in from_pretrained(cls, *inputs, **kwargs)
    391
    392         """
--> 393         return cls._from_pretrained(*inputs, **kwargs)
    394
    395     @classmethod

/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py in _from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
    542         # Instantiate tokenizer.
    543         try:
--> 544             tokenizer = cls(*init_inputs, **init_kwargs)
    545         except OSError:
    546             raise OSError(

/usr/local/lib/python3.6/dist-packages/transformers/tokenization_bert.py in __init__(self, vocab_file, do_lower_case, do_basic_tokenize, never_split, unk_token, sep_token, pad_token, cls_token, mask_token, clean_text, tokenize_chinese_chars, add_special_tokens, strip_accents, wordpieces_prefix, **kwargs)
    618             strip_accents=strip_accents,
    619             lowercase=do_lower_case,
--> 620             wordpieces_prefix=wordpieces_prefix,
    621         ),
    622         unk_token=unk_token,

TypeError: __init__() got an unexpected keyword argument 'add_special_tokens'
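Reading the traceback, the failure happens inside fast-bert's data_cls.py at the `AutoTokenizer.from_pretrained(tokenizer, use_fast=True)` call, where the fast BERT tokenizer's `__init__` rejects the `add_special_tokens` keyword, which suggests a version mismatch between the installed fast-bert and transformers. One possible workaround (a sketch, not a confirmed fix): since line 365 only calls `AutoTokenizer` when the tokenizer argument is a string, constructing a slow `BertTokenizer` yourself and passing the object to `BertDataBunch` bypasses the failing fast-tokenizer path entirely. The tiny vocab below is invented so the snippet runs offline; in practice you would load `"bert-base-uncased"`:

```python
import tempfile

from transformers import BertTokenizer

# Minimal WordPiece vocab written to disk so the example runs offline
# (in real use: BertTokenizer.from_pretrained("bert-base-uncased")).
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(vocab))
    vocab_file = f.name

# Slow (pure-Python) tokenizer: its __init__ is not the BertTokenizerFast
# constructor that raises on 'add_special_tokens' in the traceback above.
tokenizer = BertTokenizer(vocab_file, do_lower_case=True)
print(tokenizer.tokenize("Hello world"))  # -> ['hello', 'world']

# Passing the object (not a string) to BertDataBunch means the
# isinstance(tokenizer, str) branch in data_cls.py never runs, e.g.:
# databunch = BertDataBunch(DATA_PATH, LABEL_PATH, tokenizer=tokenizer, ...)
```

Alternatively, pinning transformers to the release fast-bert was built against may resolve it, though the compatible version pair isn't stated in this thread.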
aaronbriel commented 4 years ago

This is not enough information. Kindly provide steps to reproduce the issue.