langchain-ai / langsmith-sdk

LangSmith Client SDK Implementations
https://smith.langchain.com/
MIT License

Issue: Incorrect multimodal token logging #837

Open jtg21 opened 1 week ago

jtg21 commented 1 week ago

Issue you'd like to raise.

When using traceable or wrappers.wrap_openai with a multimodal gpt-4o call, the number of input tokens appears to be tracked incorrectly.

This is the code I used to test:

from openai import OpenAI
from langsmith import wrappers

test_image_path = "path_to_img/space.jpg"
test_image_url = "https://upload.wikimedia.org/wikipedia/commons/e/e2/The_Algebra_of_Mohammed_Ben_Musa_-_page_82b.png"

# encode_image, image_message_b64, and image_message_url are local helpers
# (see the sketch after this block for what they are assumed to do).
test_image_bytes = encode_image(test_image_path)

model = "gpt-4o"

# Message containing a base64-encoded image
b64_messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image"},
        image_message_b64(test_image_bytes)
    ]
}]

# Message referencing the image by URL
url_messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image"},
        image_message_url(test_image_url)
    ]
}]

# Wrap the OpenAI client so calls are traced to LangSmith
wrapped_base_client = wrappers.wrap_openai(OpenAI())

def wrapped_completion(messages):
    response = wrapped_base_client.chat.completions.create(
        model=model,
        messages=messages
    )
    print(f"Response: {response}")
    return "hello"

# With base64-encoded image
wrapped_completion(b64_messages)

# With URL image
wrapped_completion(url_messages)
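
The helper functions encode_image, image_message_b64, and image_message_url are not shown in the issue; the following is only a minimal sketch of what they are assumed to do, using the standard OpenAI chat-completions image content format (the JPEG media type is an assumption):

import base64

def encode_image(path):
    # Read an image file and return its contents as a base64 string.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def image_message_b64(image_b64):
    # Build an image content part from base64 data (assumed JPEG here).
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}
    }

def image_message_url(url):
    # Build an image content part that references the image by URL.
    return {"type": "image_url", "image_url": {"url": url}}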

When I check my LangSmith project, I see two traces. The one for the base64-encoded image reports roughly 0.5 million input tokens, while the URL-based call reports under 200. Looking at the wrapper code, it appears the message content is simply concatenated and tokenized to estimate input tokens. That approach breaks for multimodal calls: the base64 image data gets tokenized as if it were text, and the client has no way of knowing how the model actually tokenizes images. The only accurate way to get input token usage seems to be to use the usage reported back by OpenAI.
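
To illustrate the discrepancy: counting tokens over the serialized message text (base64 payload included) produces an enormous number, while the usage field on the OpenAI response reflects what was actually billed. A minimal sketch, reusing the objects defined above and assuming tiktoken's o200k_base encoding (the one used by gpt-4o, available in recent tiktoken releases):

import json
import tiktoken

# Naive count: tokenize the serialized messages, base64 image data and all.
enc = tiktoken.get_encoding("o200k_base")
naive_count = len(enc.encode(json.dumps(b64_messages)))
print(f"Naive token count over serialized messages: {naive_count}")

# Accurate count: read the usage OpenAI reports on the response itself.
response = wrapped_base_client.chat.completions.create(model=model, messages=b64_messages)
print(f"OpenAI-reported prompt tokens: {response.usage.prompt_tokens}")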

I have implemented my own workaround using the REST API, but native support for accurate multimodal token tracking would be more helpful.
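
The workaround itself is not included in the issue (and it used the REST API directly). A simpler alternative sketch, not the author's approach, is to call an unwrapped client inside a @traceable function and return OpenAI's reported usage as part of the run outputs; whether LangSmith then surfaces this as the trace's token counts is an assumption:

from openai import OpenAI
from langsmith import traceable

plain_client = OpenAI()   # unwrapped client, so only the @traceable run is logged

@traceable(run_type="llm")
def completion_with_reported_usage(messages):
    # Hypothetical helper, not part of the SDK.
    response = plain_client.chat.completions.create(model=model, messages=messages)
    # Return the usage OpenAI actually reported alongside the content so it is
    # recorded on the run's outputs instead of a re-tokenized estimate.
    return {
        "content": response.choices[0].message.content,
        "usage": response.usage.model_dump(),
    }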

Suggestion:

No response

ujjwalm29 commented 1 week ago

I second this. My image + text prompt shows as 800K tokens in LangSmith, while the OpenAI API "usage" field reports total_tokens as 2160. Please fix. I am using wrap_openai() with @traceable.