Open yuxizhe opened 2 weeks ago
The token count in vision mode is incorrect, and significantly lower than expected, when using the OpenAI 4o-mini model in Dify version 0.7.0, because token calculation for the image type is not implemented. According to the code comments, calculating tokens for an image requires downloading the image and reading its resolution. This step is not currently performed, and it would increase request latency [1].
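For reference, here is a rough sketch of what a resolution-based estimate could look like once the image dimensions are known. It is not Dify's code. The function name `estimate_image_tokens`, the `ASSUMED_COSTS` table, and the resize rules are assumptions based on my reading of OpenAI's published tile-based vision pricing, so please check them against the current OpenAI documentation before relying on the numbers.

```python
import math

# Assumed (base_tokens, tokens_per_512px_tile) per model. These values are
# taken from OpenAI's vision pricing description as I understand it; they are
# not read from Dify or from any API.
ASSUMED_COSTS = {
    "gpt-4o": (85, 170),
    "gpt-4o-mini": (2833, 5667),
}


def estimate_image_tokens(width, height, model="gpt-4o-mini", detail="high"):
    """Estimate the prompt tokens consumed by one image (hypothetical helper)."""
    base, per_tile = ASSUMED_COSTS[model]

    # Low-detail images are assumed to cost a flat base amount.
    if detail == "low":
        return base

    # Step 1: shrink the image so it fits inside a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)

    # Step 2: shrink again so the shortest side is at most 768 px.
    # (Assumption: smaller images are not upscaled.)
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)

    # Step 3: count the 512 px tiles needed to cover the resized image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return base + per_tile * tiles


if __name__ == "__main__":
    print(estimate_image_tokens(1024, 1024, model="gpt-4o-mini", detail="low"))
    print(estimate_image_tokens(1024, 1024, model="gpt-4o-mini", detail="high"))
```

Under these assumptions the low-detail case matches the 2833-token minimum reported in this issue. Taking the width and height as plain arguments keeps the download and resolution lookup (the part that adds latency) separate from the arithmetic.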
Dify version
0.7.0
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
When handling vision tasks with the OpenAI 4o-mini model, Dify reports significantly lower token consumption than is actually used.
With OpenAI 4o-mini, image processing consumes at least 2833 tokens.
✔️ Expected Behavior
Display the correct token consumption.
❌ Actual Behavior
The token count reported in vision mode is incorrect: it is far too low.