Closed sm745052 closed 7 months ago
Hey, you should see the following warning:
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
tokenizer = T5Tokenizer.from_pretrained("t5-base", legacy=False)
should be used. Also:
- decoded_text = tokenizer.decode(encoded[0])
+ decoded_text = tokenizer.decode(encoded[0], spaces_between_special_tokens=False)
Closing, as this is related to transformers, not tokenizers.
transformers version: 4.38.1

Hi, I found out that after adding a new token, say, both tokenizers behave differently.
gives
whereas
gives