IsNoobgrammer opened this issue 1 month ago
Hey, considering its superiority over SPE tokenizers, could you provide some sample code for training a tiktoken tokenizer from scratch on a custom dataset?
Also, as when training BPE/SPE tokenizers, does it support min_frequency and min_length for tokens during training?
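For reference, below is a minimal pure-Python sketch of byte-level BPE training that produces a `dict[bytes, int]` of merge ranks. That should be the same shape tiktoken uses for `mergeable_ranks`, but this is not tiktoken's own trainer. The tiktoken README points to an educational submodule (`tiktoken._educational`) for training; check the version you have. Everything here is an assumption for illustration: `train_bpe`, `encode`, `decode`, the simplified split pattern `PAT`, and the `min_frequency` parameter. Here `min_frequency` is modeled as a stopping rule: merging stops once the most frequent pair occurs fewer than `min_frequency` times, similar in spirit to `min_frequency` in Hugging Face's BPE trainer. A `min_length` option has no direct analogue in this sketch, because all 256 single bytes must stay in the vocabulary so that any input remains encodable.

```python
import re
from collections import Counter

# Hypothetical, simplified pre-tokenization pattern (an assumption). Real tiktoken
# patterns use \p{...} classes that Python's built-in `re` does not support.
# Every character is matched by some alternative, so the chunks cover the whole text.
PAT = re.compile(r" ?\w+| ?[^\w\s]+|\s+")


def train_bpe(text: str, vocab_size: int, min_frequency: int = 2) -> dict[bytes, int]:
    """Sketch of byte-level BPE training; min_frequency is a hypothetical knob."""
    # Start with every single byte, ranked 0..255, so any input stays encodable.
    ranks = {bytes([i]): i for i in range(256)}
    # Count each distinct chunk once, stored as a tuple of byte tokens.
    words = Counter(
        tuple(bytes([b]) for b in chunk.encode("utf-8")) for chunk in PAT.findall(text)
    )
    while len(ranks) < vocab_size:
        # Count adjacent token pairs, weighted by how often each chunk occurs.
        pairs = Counter()
        for word, freq in words.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every chunk is already a single token
        (a, b), count = pairs.most_common(1)[0]
        if count < min_frequency:
            break  # stop once the best pair is too rare
        merged = a + b
        if merged not in ranks:  # the same bytes may be reachable via another split
            ranks[merged] = len(ranks)
        # Rewrite every chunk, replacing occurrences of (a, b) with the merged token.
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return ranks


def encode_chunk(chunk: bytes, ranks: dict[bytes, int]) -> list[int]:
    # Repeatedly merge the adjacent pair whose concatenation has the lowest rank.
    parts = [bytes([b]) for b in chunk]
    while len(parts) > 1:
        best = min(
            range(len(parts) - 1),
            key=lambda i: ranks.get(parts[i] + parts[i + 1], float("inf")),
        )
        merged = parts[best] + parts[best + 1]
        if merged not in ranks:
            break  # no adjacent pair is a known token
        parts[best : best + 2] = [merged]
    return [ranks[p] for p in parts]


def encode(text: str, ranks: dict[bytes, int]) -> list[int]:
    ids: list[int] = []
    for chunk in PAT.findall(text):
        ids.extend(encode_chunk(chunk.encode("utf-8"), ranks))
    return ids


def decode(ids: list[int], ranks: dict[bytes, int]) -> str:
    inverse = {rank: token for token, rank in ranks.items()}
    return b"".join(inverse[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    corpus = "the cat sat on the mat. the cat ate the rat. " * 20
    ranks = train_bpe(corpus, vocab_size=300, min_frequency=2)
    ids = encode("the cat", ranks)
    print(len(ranks), ids, decode(ids, ranks))
```

With a real corpus you would likely pass a much larger `vocab_size` and a regex closer to the ones tiktoken ships. Raising `min_frequency` makes training stop earlier, so the resulting vocabulary is smaller than `vocab_size`.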