delgermurun closed this issue 1 year ago.
Perfect, closing this for now.
Once the awesome model you're building gets merged into transformers,
we'll merge #909 to get it included!
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace

tokenizer = Tokenizer(BPE())
tokenizer.pre_tokenizer = Whitespace()
It works for me.
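To check the pre-tokenizer on its own, you can run it directly on a string. With Whitespace set, the text is split on whitespace and punctuation before the BPE model ever sees it, so tokens cannot contain raw spaces (the sample sentence below is just an illustration):

from tokenizers.pre_tokenizers import Whitespace

# Whitespace splits on \w+|[^\w\s]+ and returns (piece, (start, end)) pairs.
print(Whitespace().pre_tokenize_str("especially against Caius Marcius?"))
# [('especially', (0, 10)), ('against', (11, 18)), ('Caius', (19, 24)), ...]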
Here is the reproducible script. It works fine if I use the trained tokenizer directly (not loading it from a file).
Output:
['es', 'p', 'ec', 'i', 'all', 'y ', ' ', ' ', ' ', ' ', ' ', ' a', 'gainst ', 'Caius Marc', 'i', 'us', '?\n\nAll:\n', 'A', 'gain', 'st']
But loading the tokenizer from the file fails.
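For reference, a minimal sketch of the round trip being described (the corpus file, save path, and sample text are assumptions, not the original script):

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Build and train a small BPE tokenizer with a whitespace pre-tokenizer.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(special_tokens=["[UNK]"])
tokenizer.train(["shakespeare.txt"], trainer)  # hypothetical training corpus

text = "especially against Caius Marcius?"
print(tokenizer.encode(text).tokens)  # trained tokenizer used directly

# Save to JSON, reload, and encode again to compare with the direct result.
tokenizer.save("tokenizer.json")
loaded = Tokenizer.from_file("tokenizer.json")
print(loaded.encode(text).tokens)

If the reloaded tokenizer produces tokens containing spaces while the direct one does not, the pre-tokenizer is being lost somewhere in the save/load step.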
Version:
tokenizers==0.13.3