knuddelsgmbh / jtokkit

JTokkit is a Java tokenizer library designed for use with OpenAI models.
https://jtokkit.knuddels.de/
MIT License

Discrepancy in promptTokens count while using jtokkit with OpenAI's GPT-3 API #5

Closed: czqclm closed this issue 1 year ago

czqclm commented 1 year ago

Hello, I have a question about the jtokkit project. When I use the countTokensOrdinary method to count tokens for GPT_3_5_TURBO, the completionTokens count is always accurate, but promptTokens differs from the count shown on OpenAI's dashboard by roughly 21 tokens, consistently across three requests. I suspect that OpenAI adds something to the prompt before calling the model. Could you please explain?

Java version: 8, JTokkit version: 0.2.0

public class CountTokenUtils {
    private final static EncodingRegistry registry = Encodings.newDefaultEncodingRegistry();

    public static OpenAIUsageDTO countTokensByRequestAndResponse(ChatCompletionRequest request, ChatCompletionResponse response) {
        Encoding secondEnc = registry.getEncodingForModel(ModelType.GPT_3_5_TURBO);
        int promptTokens = 0;
        int completionTokens = 0;

        // promptTokens: counts only the content of each request message
        if (Objects.nonNull(request)) {
            secondEnc = registry.getEncodingForModel(ModelType.fromName(request.getModel()).orElse(ModelType.GPT_3_5_TURBO));

            if (CollectionUtils.isNotEmpty(request.getMessages())) {
                for (MessageDTO message : request.getMessages()) {
                    promptTokens += secondEnc.countTokensOrdinary(message.getContent());
                }
            }
        }

        // completionTokens: counts the content of each returned choice
        if (Objects.nonNull(response)) {
            if (CollectionUtils.isNotEmpty(response.getChoices())) {
                for (ChatCompletionResponse.Choice choice : response.getChoices()) {
                    if (Objects.nonNull(choice.getMessage()) && StringUtils.isNotBlank(choice.getMessage().getContent())) {
                        completionTokens += secondEnc.countTokensOrdinary(choice.getMessage().getContent());
                    }
                }
            }
        }

        OpenAIUsageDTO usageDTO = OpenAIUsageDTO.builder()
                .promptTokens(promptTokens)
                .completionTokens(completionTokens)
                .totalTokens(promptTokens + completionTokens)
                .build();

        Optional.ofNullable(response).ifPresent(e -> e.setUsage(usageDTO));
        return usageDTO;
    }

}
tox-p commented 1 year ago

Yes, when using ChatML you need to account for the control tokens that are inserted by OpenAI between the ChatMessages. There is an explanation in this cookbook by OpenAI: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
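
For illustration, using the notation from the snippet below (the exact control-token names vary by model version), a two-message request is serialized roughly like this before tokenization; the wrapper and reply-priming tokens are what a content-only count misses:

<|start|>system
You are a helpful assistant.<|end|>
<|start|>user
Hello!<|end|>
<|start|>assistant<|message|>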

With JTokkit you can count ChatML prompt tokens like this:

private final String model;

[...]

private int countMessageTokens(
        final Encoding encoding,
        final List<ChatMessage> messages // consists of role, content and an optional name
) {
    int tokensPerMessage;
    int tokensPerName;
    if (model.startsWith("gpt-4")) {
        tokensPerMessage = 3;
        tokensPerName = 1;
    } else if (model.startsWith("gpt-3.5-turbo")) {
        tokensPerMessage = 4; // every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokensPerName = -1; // if there's a name, the role is omitted
    } else {
        tokensPerMessage = 4; // default to gpt-3.5-turbo-0301 if we can not determine the model
        tokensPerName = -1;
    }

    int sum = 0;
    for (final var message : messages) {
        sum += tokensPerMessage;
        sum += encoding.countTokens(message.getContent());
        sum += encoding.countTokens(message.getRole());
        if (message.hasName()) {
            sum += encoding.countTokens(message.getName());
            sum += tokensPerName;
        }
    }

    sum += 3; // every reply is primed with <|start|>assistant<|message|>

    return sum;
}
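
For completeness, a minimal usage sketch. The registry and encoding calls are JTokkit's actual API; ChatMessage is just the hypothetical holder type from the snippet above (role, content, optional name), not a JTokkit class:

import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.ModelType;

import java.util.Arrays;
import java.util.List;

// ...

EncodingRegistry registry = Encodings.newDefaultEncodingRegistry();
Encoding encoding = registry.getEncodingForModel(ModelType.GPT_3_5_TURBO);

// ChatMessage is a hypothetical holder with role, content and an optional name
List<ChatMessage> messages = Arrays.asList(
        new ChatMessage("system", "You are a helpful assistant."),
        new ChatMessage("user", "Hello!")
);

// content tokens plus the per-message and reply-priming overhead
int promptTokens = countMessageTokens(encoding, messages);

The result should line up with the prompt_tokens value reported in the usage field of the API response, which is the gap a content-only count misses.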

This method is out of scope for JTokkit itself, but I am currently working on a larger, soon-to-be-open-sourced project that brings model-agnostic building blocks for LLM applications to the JVM ecosystem. It will be included in that project.

czqclm commented 1 year ago

Thank you for your answer, wishing you a happy life.