Previously, we only used the fallback configuration if there was no tokenizer_config.json in the model repo. These files are now being added to some repos in the context of removing dependencies on transformers' internals, as in this PR: https://github.com/huggingface/transformers/pull/29112. However, to minimize potential breaking changes, only the keys removed from the hardcoded rules are being added to those files.
We now also use the fallback config when tokenizer_config.json exists but specifies no tokenizer class, provided we have a fallback config for the architecture.
Fixes distilgpt2 tokenization.
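For illustration, here is a minimal sketch of the new decision logic in Python. The helper name, parameters, and the `tokenizer_class` key are hypothetical stand-ins, not the project's actual API:

```python
# Sketch of the fallback decision (hypothetical names, not the real loader).
def resolve_tokenizer_config(repo_config, architecture, fallback_configs):
    """Pick the tokenizer config to use for a model repo."""
    fallback = fallback_configs.get(architecture)

    # Old behavior: fall back only when the repo has no tokenizer_config.json.
    if repo_config is None:
        return fallback

    # New behavior: a tokenizer_config.json may exist (e.g. added upstream to
    # drop hardcoded rules) but omit the tokenizer class; use the fallback then.
    if repo_config.get("tokenizer_class") is None and fallback is not None:
        return fallback

    return repo_config
```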