Open computabeast opened 4 months ago
It would be nice to tokenize straight from a .jsonl file.
tokenizer = MistralTokenizer.from_model("open-mixtral-8x22b")
tokenized = tokenizer.from_jsonl("my_file.jsonl")
tokens, text = tokenized.tokens, tokenized.text
...
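For reference, here is a rough sketch of the kind of workaround this would replace. `from_jsonl` in the snippet above is the proposed method, not something I know the library to provide. The helper names `iter_jsonl` and `tokenize_jsonl` are made up for illustration, and `encode` is a placeholder for whatever call turns one parsed record into token ids. The shape of each record in the file is also an assumption.

```python
import json
from typing import Callable, Iterator, List


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def tokenize_jsonl(path: str, encode: Callable[[dict], List[int]]) -> List[List[int]]:
    """Apply a caller-supplied encoder (hypothetical) to every record in the file."""
    return [encode(record) for record in iter_jsonl(path)]
```

Here `encode` would presumably wrap the tokenizer, building a request from each record and returning its tokens. A built-in `from_jsonl` would remove the need for this boilerplate.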