Closed kczimm closed 6 months ago
This file is not meant to be used by the `tokenizers` library, only by the `transformers` library. On top of this, it was deprecated! You should add tokens to a tokenizer using the `add_tokens` method.
Well that's good to know! Do you happen to have a link to the deprecation? I'm interested in learning what is supposed to replace it. I'll close in the meantime since, as you say, this does not pertain to `tokenizers`. Thanks!
The replacement was introduced by https://github.com/huggingface/transformers/pull/23909: the `tokenizer_config.json` now includes the `added_tokens_decoder` argument!
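For anyone landing here later, a rough sketch of reading that field with plain Python. It assumes `added_tokens_decoder` is a JSON object keyed by the token id (as a string), where each entry has at least a `content` string and an optional `special` flag; that layout is my reading of files written by recent `transformers` versions and is not confirmed in this thread. The helper name `split_added_tokens` is made up for illustration.

```python
import json
from pathlib import Path


def split_added_tokens(config_path):
    """Return (regular, special) token strings from a tokenizer_config.json.

    Assumption: "added_tokens_decoder" maps a stringified id to an object
    with a "content" string and an optional boolean "special" flag.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    decoder = config.get("added_tokens_decoder", {})

    regular, special = [], []
    # Ids are serialized as strings, so sort them numerically to keep
    # the original ordering of the added tokens.
    for _, entry in sorted(decoder.items(), key=lambda kv: int(kv[0])):
        target = special if entry.get("special", False) else regular
        target.append(entry["content"])
    return regular, special
```

The two lists could then be handed to `Tokenizer.add_tokens` and `Tokenizer.add_special_tokens` respectively. Note that the tokenizer assigns ids itself, so the ids stored in the file are not guaranteed to be reproduced.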
Given a `Tokenizer`, what is the appropriate way to add tokens from an `added_tokens.json` file of the format:

I see the `Tokenizer.add_tokens` method. Should the user just create `AddedToken`s from this file? Could we make something like `Tokenizer.add_tokens_from_file`?
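To make the request concrete, here is a minimal sketch of what such a helper might look like as a free function. It assumes the file is a flat JSON object mapping each token string to its id (the legacy `transformers` layout, as far as I know). `load_added_tokens` and `add_tokens_from_file` are hypothetical names, not part of `tokenizers`; the only library call used is the existing `Tokenizer.add_tokens`.

```python
import json
from pathlib import Path


def load_added_tokens(path):
    """Return the token strings from an added_tokens.json-style file.

    Assumption: the file is a flat JSON object of the form {token: id}.
    Tokens are returned sorted by id so their relative order is kept.
    """
    mapping = json.loads(Path(path).read_text(encoding="utf-8"))
    return [token for token, _ in sorted(mapping.items(), key=lambda kv: kv[1])]


def add_tokens_from_file(tokenizer, path):
    """Hypothetical helper: pass the tokens in the file to add_tokens.

    `tokenizer` is expected to be a tokenizers.Tokenizer; add_tokens takes
    a list of str (or AddedToken) and returns how many tokens were added.
    """
    return tokenizer.add_tokens(load_added_tokens(path))
```

Usage would be something like `add_tokens_from_file(Tokenizer.from_file("tokenizer.json"), "added_tokens.json")`. The tokenizer picks the new ids itself, so the ids in the file only influence the order in which tokens are added.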