Tacacs-1101 opened this issue 2 years ago
I also hit the same issue. Has anyone resolved this?
The code breaks when using any model other than BERT. I debugged the code and found that it is written around the BERT tokenizer only, while the tokenizers of other transformer models work differently. Here is the relevant snippet from helpers.py:
```python
if BERT_TOKENIZER is None:  # gets initialized during the first call to this method
    if bert_pretrained_name_or_path:
        BERT_TOKENIZER = transformers.BertTokenizer.from_pretrained(bert_pretrained_name_or_path)
        BERT_TOKENIZER.do_basic_tokenize = True
        BERT_TOKENIZER.tokenize_chinese_chars = False
    else:
        BERT_TOKENIZER = transformers.BertTokenizer.from_pretrained('bert-base-cased')
        BERT_TOKENIZER.do_basic_tokenize = True
        BERT_TOKENIZER.tokenize_chinese_chars = False
```
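This is why non-BERT checkpoints fail here: `BertTokenizer` expects a WordPiece `vocab.txt`, while SentencePiece-based models such as `xlm-roberta-base` ship a different vocabulary file. A minimal reproduction sketch (the checkpoint name is just an example, not something the repo uses):

```python
import transformers

# xlm-roberta-base ships a SentencePiece model, not the WordPiece
# vocab.txt that BertTokenizer looks for, so this load fails
# (or can silently mis-tokenize, depending on the version).
try:
    tok = transformers.BertTokenizer.from_pretrained('xlm-roberta-base')
except Exception as err:
    print('BertTokenizer cannot load this checkpoint:', err)
```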
I faced the same issue. Here is a quick fix:
```python
BERT_TOKENIZER = transformers.BertTokenizer.from_pretrained(bert_pretrained_name_or_path)
```

Replace `BertTokenizer` with `XLMRobertaTokenizer`:

```python
BERT_TOKENIZER = transformers.XLMRobertaTokenizer.from_pretrained(bert_pretrained_name_or_path)
```
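That hard-codes XLM-R, though. A more general alternative (a sketch, not code from this repo; the `get_tokenizer` name and the `isinstance` guard are my own) is `transformers.AutoTokenizer`, which picks the matching tokenizer class from the checkpoint's own config, so the same code path works for BERT, XLM-R, and other models. Note that `do_basic_tokenize` and `tokenize_chinese_chars` are BERT-specific, so they should only be set when a BERT tokenizer is actually loaded:

```python
import transformers

BERT_TOKENIZER = None

def get_tokenizer(bert_pretrained_name_or_path=None):
    """Lazily initialize a tokenizer that matches the given checkpoint."""
    global BERT_TOKENIZER
    if BERT_TOKENIZER is None:
        name = bert_pretrained_name_or_path or 'bert-base-cased'
        # AutoTokenizer dispatches to BertTokenizer, XLMRobertaTokenizer,
        # etc. based on the checkpoint's config; use_fast=False keeps the
        # slow classes that the flags in helpers.py were written against.
        BERT_TOKENIZER = transformers.AutoTokenizer.from_pretrained(name, use_fast=False)
        # Mirror the flags from helpers.py, but only where they exist.
        if isinstance(BERT_TOKENIZER, transformers.BertTokenizer):
            BERT_TOKENIZER.do_basic_tokenize = True
            BERT_TOKENIZER.tokenize_chinese_chars = False
    return BERT_TOKENIZER
```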