minimaxir / simpleaichat

Python package for easily interfacing with chat apps, with robust features and minimal code complexity.
MIT License

Usage stats for streaming response. #55

Status: Open · pors opened this issue 11 months ago

pors commented 11 months ago

I think a simple solution could be to wait until the last chunk has arrived and then calculate the usage from the prompt and response strings, as shown here: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

This needs to be extended to count both the prompt and the streamed response, so that the full usage object (prompt_tokens, completion_tokens, total_tokens) can be assembled.
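
A minimal sketch of that extension, assuming the streamed chunks have already been joined into response_text (the helper name usage_from_streamed_response is hypothetical and reuses the cookbook function above):

import tiktoken

def usage_from_streamed_response(messages, response_text, model="gpt-3.5-turbo-0613"):
    """Approximate OpenAI's usage object for a streamed chat completion."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    # Prompt side: count the request messages with num_tokens_from_messages() above.
    prompt_tokens = num_tokens_from_messages(messages, model=model)
    # Completion side: encode the concatenated streamed text directly.
    completion_tokens = len(encoding.encode(response_text))
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }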

minimaxir commented 11 months ago

This approach requires adding tiktoken as a dependency, which I would prefer to avoid, but I might not have a choice.

It also probably doesn't work at all with structured input data (as OpenAI uses fancy things for that), so it adds a new discrepancy problem.

pors commented 11 months ago

> This approach requires adding tiktoken as a dependency, which I would prefer to avoid, but I might not have a choice.

Of course, app developers can calculate it themselves like this, so just providing documentation on how to do it might be enough?
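
For example, with the pre-1.0 openai Python client, an app could accumulate the streamed deltas and count tokens afterwards (a sketch that reuses the num_tokens_from_messages() function above; the exact streaming loop depends on the client version):

import openai
import tiktoken

model = "gpt-3.5-turbo-0613"
messages = [{"role": "user", "content": "Tell me a joke."}]

# Accumulate the streamed deltas into the full response text.
response_text = ""
for chunk in openai.ChatCompletion.create(model=model, messages=messages, stream=True):
    delta = chunk["choices"][0]["delta"]
    response_text += delta.get("content", "")

# Once the stream is exhausted, count both sides with tiktoken.
encoding = tiktoken.encoding_for_model(model)
prompt_tokens = num_tokens_from_messages(messages, model=model)
completion_tokens = len(encoding.encode(response_text))
usage = {
    "prompt_tokens": prompt_tokens,
    "completion_tokens": completion_tokens,
    "total_tokens": prompt_tokens + completion_tokens,
}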

> It also probably doesn't work at all with structured input data (as OpenAI uses fancy things for that), so it adds a new discrepancy problem.

What do you mean here? The use of OpenAI functions?