I released version 0.5.0 of tiktoken yesterday; you can see the changelog here: https://github.com/openai/tiktoken/blob/main/CHANGELOG.md#v050

Note, however, that tiktoken has never had a class named `Tokenizer`. Could you be more specific about what issues you're seeing?
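For reference, the library is driven through encoding objects rather than a `Tokenizer` class. A minimal sketch of typical usage (the model name here is just an example):

```python
import tiktoken

# Look up the encoding used by a particular model...
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
# ...or request an encoding by name directly:
# enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("tiktoken is great!")
print(tokens)       # a list of integer token ids
print(len(tokens))  # the token count
```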
Trying to get a token count similar to this Stack Overflow answer: https://stackoverflow.com/questions/75804599/openai-api-how-do-i-count-tokens-before-i-send-an-api-request
```python
import sys
import openai
from tiktoken import Tokenizer

openai.api_key = 'YOUR_API_KEY'

max_token_limit = 4096  # Adjust as per your model's limit

def count_tokens(text):
    tokenizer = Tokenizer()
    tokens = tokenizer.count_tokens(text)
    return tokens

def chunk_text(text, max_chunk_size):
    chunks = []
    while len(text) > max_chunk_size:
        chunk, text = text[:max_chunk_size], text[max_chunk_size:]
        chunks.append(chunk)
    if text:
        chunks.append(text)
    return chunks

def check_token_limit(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        file_content = file.read()

    tokens_count = count_tokens(file_content)

    if tokens_count > max_token_limit:
        print(f"Warning: This text exceeds the maximum token limit of {max_token_limit}.")
        print(f"Total tokens in the file: {tokens_count}")
        print("Splitting the file into smaller parts:")
        chunks = chunk_text(file_content, max_token_limit)
        for i, chunk in enumerate(chunks):
            chunk_file_name = f"chunk_{i + 1}.txt"
            with open(chunk_file_name, 'w', encoding='utf-8') as chunk_file:
                chunk_file.write(chunk)
            print(f"Chunk {i + 1}: {chunk_file_name} (Tokens: {count_tokens(chunk)})")
        print("Consider sending these chunks individually to stay within the token limit.")
    else:
        print(f"Total tokens in the file: {tokens_count}")

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("Usage: python token_checker.py <file_path>")
        sys.exit(1)
    file_path = sys.argv[1]
    check_token_limit(file_path)
```
This code has literally never worked; `from tiktoken import Tokenizer` was never a valid import. You can find correct example code in the Stack Overflow question you linked. I also recommend taking a look at the recipes in https://github.com/openai/openai-cookbook
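As a starting point, a corrected sketch of the counting and chunking above might look like this. The model name is just an example, and `chunk_text_by_tokens` is an illustrative helper, not something the library provides; unlike the character-based version, it splits on token boundaries so every chunk actually fits the budget:

```python
import tiktoken

max_token_limit = 4096  # adjust to your model's context window

def count_tokens(text, model="gpt-3.5-turbo"):
    # encoding_for_model selects the encoding the given model uses
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

def chunk_text_by_tokens(text, max_chunk_tokens, model="gpt-3.5-turbo"):
    # Encode once, slice the token list, and decode each slice back to
    # text, so every chunk is guaranteed to fit the token budget.
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i:i + max_chunk_tokens])
        for i in range(0, len(tokens), max_chunk_tokens)
    ]
```

One caveat: decoding an arbitrary token slice can split a multi-token character sequence at chunk edges, which is usually fine for budgeting but worth knowing about.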
Hello everyone,
I've been using the tiktoken module to tokenize text for OpenAI's models. Recently, I encountered some issues with importing specific classes from the module. I wanted to check if there have been any recent updates or changes to tiktoken that I might not be aware of.
Specifically, I've had difficulty with the `Tokenizer` class. Has anyone else experienced this, or does anyone know if there's been a recent change to this part of the module? Any guidance would be greatly appreciated!
Thank you in advance!