Closed angeloskath closed 6 months ago
Previously, the following code would fail to tokenize the second string even though `ignore_unk` is set to `True`.
```python
import mlx.data.core
from mlx.data.core import CharTrie

if __name__ == "__main__":
    # Build a vocabulary with two tokens.
    vocab = CharTrie()
    vocab.insert("hello")
    vocab.insert(" world")

    tokenizer = mlx.data.core.Tokenizer(vocab, ignore_unk=True)

    # Fully covered by the vocabulary.
    print(tokenizer.tokenize_shortest("hello world"))
    # The trailing space matches no token; this call used to fail.
    print(tokenizer.tokenize_shortest("hello "))
```
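As a rough illustration of the behavior one would expect from `ignore_unk`, here is a toy pure-Python tokenizer. It is only a sketch and not the mlx-data implementation: the function name `tokenize_greedy`, the greedy longest-match strategy, and the use of a plain `set` in place of `CharTrie` are all assumptions made for this example. It shows that a string ending in characters with no matching token can still be tokenized when unknown characters are skipped.

```python
def tokenize_greedy(text, vocab, ignore_unk=False):
    """Toy greedy longest-match tokenizer (hypothetical, not mlx-data's algorithm)."""
    tokens = []
    i = 0
    max_len = max((len(t) for t in vocab), default=0)
    while i < len(text):
        # Try the longest vocabulary entry that starts at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary entry starts here.
            if not ignore_unk:
                raise ValueError(f"cannot tokenize {text[i:]!r}")
            i += 1  # skip the unknown character and keep going
    return tokens


vocab = {"hello", " world"}
print(tokenize_greedy("hello world", vocab, ignore_unk=True))
print(tokenize_greedy("hello ", vocab, ignore_unk=True))
```

With `ignore_unk=False`, the second call raises a `ValueError` in this sketch, since the trailing space does not start any vocabulary entry.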