Open zengxiangqi1031 opened 1 month ago
Hey @zengxiangqi1031! I'm here to help you with any issues or questions you have. Whether it's bugs, questions, or becoming a contributor, I'm here to assist you while you wait for a human maintainer.
The issue you're encountering is caused by an incorrect prompt-token calculation for image inputs. Dify currently has no method for counting tokens for image inputs, so the photo's byte count is used as the token count instead.
This is a known limitation in Dify. The token calculation for image inputs is not yet implemented and would require downloading the image and then getting the resolution for calculation, which would increase the request delay [1].
To address this, you might need to implement a custom token calculation method for image inputs or adjust the existing logic to handle images appropriately. This would involve modifying the token calculation logic to account for the specific characteristics of image data rather than treating it as text.
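As a rough illustration of what a resolution-based calculation could look like, here is a minimal sketch. It is not Dify's code: `png_dimensions` and `estimate_image_tokens` are hypothetical helper names, only PNG input is handled, and the constants (long edge capped at 1568 px, roughly width × height / 750 tokens) are assumptions taken from Anthropic's published vision guidance for Claude, so the result is an estimate rather than an exact billing figure.

```python
import math
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk without decoding pixels."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # IHDR stores width and height as two big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def estimate_image_tokens(width: int, height: int, max_edge: int = 1568) -> int:
    """Estimate prompt tokens for an image from its resolution.

    Assumption: the image is scaled down so its long edge is at most
    `max_edge`, then costs about (width * height) / 750 tokens.
    """
    scale = min(1.0, max_edge / max(width, height))
    scaled_w = max(1, round(width * scale))
    scaled_h = max(1, round(height * scale))
    return math.ceil(scaled_w * scaled_h / 750)


if __name__ == "__main__":
    # A 1.5 MB photo counted by bytes gives ~1,500,000 "tokens";
    # counted by resolution, the estimate stays in the low thousands.
    print(estimate_image_tokens(3000, 2000))
```

Reading only the header keeps the extra work small once the bytes are available, although fetching a remote image would still add the request delay mentioned above.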
If you need further assistance or have more questions, feel free to ask!
To continue talking to Dosu, mention @dosu.
fixed?
Yes, the issue with prompt tokens for image inputs being incorrectly calculated based on the photo's bytes instead of the actual token count has been fixed in Dify version 0.6.12-fix1. The code now includes a TODO comment indicating that the current token calculation method for the image type is not implemented and mentions the need to download the image and get the resolution for calculation, which will increase the request delay [1].
up
Self Checks
Dify version
0.6.12-fix1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
When I use claude3.5-sonnet (which supports 200k tokens) as an agent to tag a photo of about 1.5 MB, Dify reports 1,500,000+ prompt tokens, far more than 200k. When I debugged the api module, I found that the prompt tokens are calculated from the photo's bytes.
✔️ Expected Behavior
prompt tokens should be calculated correctly.
❌ Actual Behavior
No response