tqfang opened this issue 2 years ago
In the current version, when running:
```python
from DeBERTa import deberta
vocab_path, vocab_type = deberta.load_vocab(pretrained_id='base')
tokenizer = deberta.tokenizers[vocab_type](vocab_path)
tokens = tokenizer.tokenize('[MASK]')
print(tokens)
print(tokenizer.convert_tokens_to_ids(tokenizer.mask()))
print(tokenizer.convert_tokens_to_ids("[MASK]"))
print(tokenizer.vocab["[MASK]"])
```
Output:
```
['▁[', 'MAS', 'K', ']']
[4746, 829, 291, 179, 1015, 552]
[4746, 829, 291, 179, 1015, 552]
128000
```
Neither method gives the correct id of the special token `"[MASK]"`, i.e., 128000; only the direct `tokenizer.vocab` lookup does.
Is this a bug or am I using the tokenizer in the wrong way? Thanks
The mask token is meant to be inserted after tokenization, not included in the raw text passed to `tokenize`.
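In other words, tokenize the surrounding text first, then splice the special token into the resulting token list before converting to ids. A minimal sketch of that pattern, using a toy vocabulary as a stand-in for the real SentencePiece one (only the 128000 id comes from this issue; the example tokens are made up):

```python
MASK = '[MASK]'
MASK_ID = 128000  # id of the special token in the DeBERTa vocabulary

# Hypothetical output of tokenizer.tokenize('Paris is the capital of France.')
tokens = ['▁Paris', '▁is', '▁the', '▁capital', '▁of', '▁France', '.']

# Splice the special token in AFTER tokenization; putting '[MASK]' in the
# raw text would be split into ordinary sub-word pieces, as shown above.
tokens[5] = MASK

# Toy stand-in for convert_tokens_to_ids: the special token maps directly
# to its reserved id instead of being broken into pieces.
toy_vocab = {'▁Paris': 1, '▁is': 2, '▁the': 3, '▁capital': 4,
             '▁of': 5, '▁France': 6, '.': 7, MASK: MASK_ID}
ids = [toy_vocab[t] for t in tokens]
print(ids)  # the masked position now carries 128000
```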