Issue you'd like to raise.
When using traceable or wrappers.wrap_openai with a multimodal gpt-4o call, the number of input tokens seems to be tracked incorrectly.
This is the code I used to test (sketched minimally below; the file path, prompt text, and image URL are placeholders):
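```python
import base64

from langsmith import wrappers
from openai import OpenAI

# wrap_openai instruments the client so calls are traced in LangSmith.
client = wrappers.wrap_openai(OpenAI())

# Variant 1: send the image as a base64-encoded data URL.
with open("image.png", "rb") as f:  # placeholder path
    b64 = base64.b64encode(f.read()).decode("utf-8")

client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)

# Variant 2: send the same image by URL.
client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/image.png"}},
        ],
    }],
)
```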
When I check my LangSmith project, I see two traces. The one associated with the b64-encoded image reports roughly 0.5 million input tokens, while the URL method reports under 200 input tokens. After looking at the wrapper code, it appears that the text of the messages is simply concatenated and tokenized to estimate the input tokens. But this doesn't work for multimodal calls, since the tokenization of image content isn't known client-side: the b64 data URL gets counted as ordinary text (hence the inflated ~0.5M figure), while a plain URL contributes almost nothing (hence the undercount). The only accurate way to get input token usage appears to be to wait for the usage report OpenAI returns with the response.
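For reference, that usage report is already on every response object the wrapped client returns, for both variants; continuing the sketch above:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/image.png"}},
        ],
    }],
)

# Server-side counts with the image priced in -- the numbers the trace
# should show instead of the client-side text-tokenization estimate.
print(response.usage.prompt_tokens)
print(response.usage.completion_tokens)
print(response.usage.total_tokens)
```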
I have implemented my own workaround using the REST API, but native support for accurate multimodal token tracking would be more helpful.
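For anyone who needs something similar in the meantime, here is a rough sketch of that kind of workaround (not my exact code): read the usage OpenAI returns and attach it to the current run through LangSmith's run-update endpoint. Treat the extra.token_usage field as an assumption about where to store it, and note that the tracer's own end-of-run update may overwrite it.

```python
import os

import requests
from langsmith.run_helpers import get_current_run_tree, traceable

@traceable
def describe_image(messages):
    # `client` is the wrap_openai-wrapped client from the sketch above.
    response = client.chat.completions.create(model="gpt-4o", messages=messages)

    # Patch the run for this traced call with OpenAI's authoritative usage.
    run = get_current_run_tree()
    requests.patch(
        f"https://api.smith.langchain.com/runs/{run.id}",
        headers={"x-api-key": os.environ["LANGSMITH_API_KEY"]},
        json={"extra": {"token_usage": response.usage.model_dump()}},
    ).raise_for_status()
    return response.choices[0].message.content
```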
Suggestion:
No response
I second this. My image + text prompt in LangSmith shows as 800K tokens, while the OpenAI API "usage" field reports total_tokens as 2160. Please fix. I am using wrap_openai() with @traceable.