Thanks for your interest in tiktoken! The API for the Encoding class is documented over here: https://github.com/openai/tiktoken/blob/ec7c121e385bf1675312c6c33734de6b392890c4/tiktoken/core.py#L26
Yes, I already read it, but I still don't know how to construct parameters such as pat_str and mergeable_ranks. I was wondering if bigscience/bloom shares the same tokenizer as GPT-2, since they are both based on byte-level Byte-Pair-Encoding.
I'd look at the code they've shared to figure out what arguments to pass. I've only worked directly on OpenAI's models — so it's not like I have any special knowledge about bigscience/bloom ;-)
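For reference, a minimal sketch of constructing a custom Encoding by reusing the parameters of the registered GPT-2 encoding. The name "my_gpt2_variant" is made up, and _pat_str / _mergeable_ranks are the internal attributes used in tiktoken's own extension example, so treat them as implementation details. This only shows how the arguments fit together; whether BLOOM can actually reuse GPT-2's merges is a question about its vocabulary, not about this API.

```python
import tiktoken

# Start from the registered GPT-2 encoding and reuse its parameters.
gpt2 = tiktoken.get_encoding("gpt2")

enc = tiktoken.Encoding(
    name="my_gpt2_variant",                 # hypothetical name for the new encoding
    pat_str=gpt2._pat_str,                  # regex that pre-splits text before BPE merges
    mergeable_ranks=gpt2._mergeable_ranks,  # dict[bytes, int]: token bytes -> merge rank
    special_tokens={"<|endoftext|>": 50256},
)

print(enc.encode("hello world"))
```

A model with its own vocabulary (e.g. BLOOM) would need its own mergeable_ranks built from its merges/vocab files rather than GPT-2's.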