Open alvations opened 1 week ago
Hey! This is not because you can't change it, but because the v3 does not have a normalizer at all. This is the "legacy=False" version of the tokenizer. This should be fixed soon btw, the mistralv01 should end up without a normalizer.
System Info
transformers==4.41.2
Who can help?
@ArthurZucker
Reproduction
From https://github.com/huggingface/tokenizers/issues/1552#issue-2348487489
[out]:
The same process above won't work for "mistralai/Mistral-7B-v0.3". But if we reinitialize with __class__ after the .from_pretrained, it loads the tokenizer config correctly with the extended normalizer. https://stackoverflow.com/questions/78612251/how-do-we-add-modify-the-normalizer-in-a-pretrained-huggingface-tokenizer/78624238#78624238
Expected behavior
The same .from_pretrained should work for other models' tokenizers after changes to the normalizer.
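To make the expected behavior concrete, here is a minimal offline sketch of the round trip being asked for: set a normalizer, serialize, reload, and check that the normalizer is still applied. It is not the reproduction from the linked issue. It assumes the `tokenizers` package is installed and uses a toy `WordLevel` vocabulary and a hypothetical `Replace("foo", "bar")` rule in place of a Mistral checkpoint. It also goes through `Tokenizer.to_str` / `Tokenizer.from_str` rather than `save_pretrained` / `from_pretrained`, so it only illustrates the backend tokenizer side.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.normalizers import Replace, Sequence
from tokenizers.pre_tokenizers import Whitespace

# Toy vocabulary (an assumption for illustration, not a real model's vocab).
vocab = {"[UNK]": 0, "bar": 1, "baz": 2}
tok = Tokenizer(WordLevel(vocab=vocab, unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()

# A freshly built tokenizer has no normalizer, similar to the v0.3 case
# described in the reply above.
had_normalizer = tok.normalizer is not None

# Attach a normalizer; the "foo" -> "bar" rule is hypothetical.
tok.normalizer = Sequence([Replace("foo", "bar")])

# Round-trip through the JSON serialization and reload.
reloaded = Tokenizer.from_str(tok.to_str())

# The reloaded tokenizer should still carry and apply the normalizer.
normalized = reloaded.normalizer.normalize_str("foo baz")
tokens = reloaded.encode("foo baz").tokens

print(had_normalizer, normalized, tokens)
```

With a real checkpoint, the analogous check would be to inspect `backend_tokenizer.normalizer` on the tokenizer returned by `from_pretrained` after saving the modified one. That part is left out here because it needs network access and model files.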