langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

[bug] Prompt is too long #5988

Open zengxiangqi1031 opened 1 month ago

zengxiangqi1031 commented 1 month ago

Dify version

0.6.12-fix1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When I use claude3.5-sonnet (which supports a 200k context) as an agent to tag a photo of about 1.5 MB, Dify reports 1,500,000+ prompt tokens, far more than 200k. While debugging the api module, I found that prompt tokens are calculated from the photo's size in bytes.

✔️ Expected Behavior

prompt tokens should be calculated correctly.

❌ Actual Behavior

No response

dosubot[bot] commented 1 month ago

Hey @zengxiangqi1031! I'm here to help you with any issues or questions you have. Whether it's bugs, questions, or becoming a contributor, I'm here to assist you while you wait for a human maintainer.

The issue you're encountering is due to the incorrect calculation of prompt tokens for image inputs. Currently, Dify does not have an implemented method for calculating tokens for image inputs, which leads to the problem you're seeing where the photo's bytes are being used instead.

This is a known limitation in Dify. The token calculation for image inputs is not yet implemented and would require downloading the image and then getting the resolution for calculation, which would increase the request delay [1].

To address this, you might need to implement a custom token calculation method for image inputs or adjust the existing logic to handle images appropriately. This would involve modifying the token calculation logic to account for the specific characteristics of image data rather than treating it as text.
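As a rough illustration of what a resolution-based estimate could look like, here is a minimal sketch. It is not Dify's code: the function names `png_dimensions` and `estimate_image_tokens` are hypothetical, it only handles PNG headers, and the divisor of 750 pixels per token is the approximation Anthropic's vision documentation gives for Claude, so treat it as an assumption that may not hold for other models. It also ignores any server-side resizing of large images.

```python
import math
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def estimate_image_tokens(width: int, height: int, pixels_per_token: int = 750) -> int:
    """Estimate tokens from pixel count instead of file size in bytes.

    pixels_per_token=750 is an assumed approximation for Claude models.
    """
    return math.ceil(width * height / pixels_per_token)


# Example: a 1000x750 image is estimated at 1000 tokens,
# regardless of how many bytes the file occupies on disk.
print(estimate_image_tokens(1000, 750))  # 1000
```

Only the first 24 bytes of the file are needed to read the dimensions, so an estimate like this does not require decoding the full image, although the image (or at least its header) still has to be fetched first.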

If you need further assistance or have more questions, feel free to ask!

To continue talking to Dosu, mention @dosu.

zengxiangqi1031 commented 1 month ago

fixed?

dosubot[bot] commented 1 month ago

Yes, the issue with prompt tokens for image inputs being incorrectly calculated based on the photo's bytes instead of the actual token count has been fixed in Dify version 0.6.12-fix1. The code now includes a TODO comment indicating that the current token calculation method for the image type is not implemented and mentions the need to download the image and get the resolution for calculation, which will increase the request delay [1].

zengxiangqi1031 commented 1 month ago

up