langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

token count is incorrect in vision mode #7707

Open yuxizhe opened 2 weeks ago

yuxizhe commented 2 weeks ago

Dify version

0.7.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

In vision tasks, Dify reports significantly lower token consumption than was actually used. To reproduce, use the OpenAI gpt-4o-mini model.

[Screenshot: iShot2024-08-27 16 40 55]

With OpenAI gpt-4o-mini, processing an image consumes at least 2833 tokens. [image]

✔️ Expected Behavior

Display the correct token consumption.

❌ Actual Behavior

The token count reported in vision mode is incorrect: it is too low.

dosubot[bot] commented 2 weeks ago

In Dify 0.7.0, the token count reported in vision mode with the OpenAI gpt-4o-mini model is significantly lower than expected because token calculation for the image type is not implemented. According to the code comments, calculating tokens for an image requires downloading it and reading its resolution; this step is not currently performed, and adding it would increase request latency [1].
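For illustration, a resolution-based estimate could look roughly like the sketch below. It is not Dify's code: the function name `estimate_image_tokens` is hypothetical, the image width and height are assumed to be known already (that is, the download step has happened), and the per-model constants and resizing rules are taken from OpenAI's published vision pricing guidance as I understand it, so they may change. The rounding to whole pixels is my own assumption.

```python
import math

# Assumed per-model constants from OpenAI's vision pricing guidance;
# verify against the current documentation before relying on them.
BASE_TOKENS = {"gpt-4o": 85, "gpt-4o-mini": 2833}
TILE_TOKENS = {"gpt-4o": 170, "gpt-4o-mini": 5667}


def estimate_image_tokens(width: int, height: int,
                          model: str = "gpt-4o-mini",
                          detail: str = "high") -> int:
    """Estimate the tokens billed for one image (hypothetical helper)."""
    base = BASE_TOKENS[model]
    if detail == "low":
        # Low-detail images cost a flat base amount regardless of size.
        return base

    # Step 1: downscale (never upscale) to fit inside a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    width, height = round(width * scale), round(height * scale)

    # Step 2: downscale (never upscale) so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(width, height))
    width, height = round(width * scale), round(height * scale)

    # Step 3: count the 512 px tiles needed to cover the resized image.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return base + tiles * TILE_TOKENS[model]
```

Under these assumptions, a low-detail image on gpt-4o-mini yields the 2833-token minimum mentioned in the report, and larger high-detail images cost more per tile.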
