utterworks / fast-bert

Super easy library for BERT based NLP models
Apache License 2.0

[BUG] AttributeError: 'RobertaTokenizer' object has no attribute 'max_len' #309

Open FirstGalacticEmpire opened 2 years ago

FirstGalacticEmpire commented 2 years ago

```python
from box import Box
import torch

args = Box({
    "seed": 42,
    "task_name": "Medical_language_modelling",
    "model_name": "roberta-base",
    "model_type": "roberta",
    "train_batch_size": 16,
    "learning_rate": 4e-5,
    "num_train_epochs": 20,
    "fp16": True,
    "fp16_opt_level": "O2",
    "warmup_steps": 1000,
    "logging_steps": 0,
    "max_seq_length": 512,
    "multi_gpu": True if torch.cuda.device_count() > 1 else False,
})
```

```python
from pathlib import Path
from fast_bert.data_lm import BertLMDataBunch

databunch_lm = BertLMDataBunch.from_raw_corpus(
    data_dir=Path("./raw_text/"),
    text_list=list_of_files,
    tokenizer=args.model_name,
    batch_size_per_gpu=args.train_batch_size,
    max_seq_length=args.max_seq_length,
    multi_gpu=args.multi_gpu,
    model_type=args.model_type,
    logger=logger,
)
```

When running the line above I get the error `AttributeError: 'RobertaTokenizer' object has no attribute 'max_len'`, which I suspect is due to an update to the `transformers` library that removed the `max_len` attribute from `RobertaTokenizer`.
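For context, `transformers` v4 removed the deprecated `tokenizer.max_len` attribute in favour of `tokenizer.model_max_length`, while this version of fast-bert still reads the old name. Until the library is updated, two possible workarounds, offered as untested sketches (the property alias assumes fast-bert only ever reads `max_len` and never assigns to it):

```python
# Option 1: pin transformers to a pre-4.0 release that still exposes `max_len`:
#   pip install "transformers<4.0"

# Option 2 (sketch): alias the removed attribute back onto the tokenizer class
# before building the databunch. Only relevant on transformers >= 4.0, where
# `max_len` is gone and `model_max_length` is its replacement.
from transformers import RobertaTokenizer

if not hasattr(RobertaTokenizer, "max_len"):
    RobertaTokenizer.max_len = property(lambda self: self.model_max_length)
```

With the alias in place, constructing the databunch as above should no longer raise the `AttributeError`, assuming fast-bert instantiates the slow `RobertaTokenizer` (as the traceback suggests) rather than the fast variant.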