token count is incorrect in vision mode

langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Self Checks

[X] This is only for bug report, if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.7.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When handling vision tasks, Dify reports significantly lower token consumption than is actually used. To reproduce, use the OpenAI gpt-4o-mini model with an image input.

With OpenAI gpt-4o-mini, processing a single image consumes at least 2833 tokens.
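For reference, here is a rough sketch of how the expected image cost could be estimated, to compare against what Dify displays. It assumes the tiling rule from OpenAI's vision documentation (fit within 2048x2048, shrink the shortest side to 768 px, count 512 px tiles) and assumes a per-tile cost of 5667 tokens for gpt-4o-mini on top of the 2833 base. The function name and constants are illustrative and are not taken from Dify's code.

```python
import math

# 2833 is the per-image floor mentioned in this report.
BASE_TOKENS = 2833
# Assumption: per-tile cost for gpt-4o-mini in high-detail mode.
TILE_TOKENS = 5667


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate the tokens billed for one image (hypothetical helper)."""
    if detail == "low":
        # Low detail is billed at the flat base cost regardless of size.
        return BASE_TOKENS

    # Step 1: scale down (never up) to fit inside a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale

    # Step 2: scale down (never up) so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale

    # Step 3: count the 512 px tiles needed to cover the resized image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return BASE_TOKENS + TILE_TOKENS * tiles


if __name__ == "__main__":
    print(estimate_image_tokens(1024, 1024, detail="low"))
    print(estimate_image_tokens(1024, 1024, detail="high"))
```

Under these assumptions, even a small low-detail image should add 2833 tokens to the prompt count, so any displayed total below that for a vision request indicates that the image is not being counted.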

✔️ Expected Behavior

Display the correct token consumption.

❌ Actual Behavior

The token count displayed in vision mode is incorrect: it is far lower than the actual consumption.
