【bug】prompt is too long

zengxiangqi1031 commented 1 month ago

Self Checks

[X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] Please make sure to submit issues in English; otherwise they will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.6.12-fix1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When I use Claude 3.5 Sonnet (which supports a 200k-token context window) as an agent to tag a photo of about 1.5 MB, Dify reports that the prompt is 1,500,000+ tokens, far more than the 200k limit. When I debugged the API module, I found that the prompt tokens are calculated from the photo's bytes.
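A minimal Python sketch of the arithmetic behind this symptom, assuming (hypothetically) that the image is counted at roughly one token per raw byte or per base64 character. This is an illustration only, not Dify's actual code path, and `byte_based_counts` is a made-up helper name.

```python
import base64

CONTEXT_WINDOW = 200_000  # Claude 3.5 Sonnet's context size, as stated in the report


def byte_based_counts(image_bytes: bytes) -> tuple[int, int]:
    """Hypothetical byte-proportional 'token' counts for an image.

    Returns (raw byte length, base64 character length). Neither is how the
    model bills images; the point is that any count scaling with file size
    overflows the context window.
    """
    raw = len(image_bytes)
    b64 = len(base64.b64encode(image_bytes))  # 4 chars per 3 input bytes
    return raw, b64


photo = bytes(1_500_000)  # stand-in for a ~1.5 MB photo
raw, b64 = byte_based_counts(photo)
print(raw, b64)  # 1500000 2000000
print(raw > CONTEXT_WINDOW, b64 > CONTEXT_WINDOW)  # True True
```

Either way the count lands in the millions, roughly an order of magnitude past the 200k window, before any text is added to the prompt.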

✔️ Expected Behavior

Prompt tokens for image inputs should be calculated correctly, not from the image's size in bytes.

❌ Actual Behavior

No response

dosubot[bot] commented 1 month ago

Hey @zengxiangqi1031! I'm here to help you with any issues or questions you have. Whether it's bugs, questions, or becoming a contributor, I'm here to assist you while you wait for a human maintainer.

The issue you're encountering is due to the incorrect calculation of prompt tokens for image inputs. Currently, Dify does not have an implemented method for calculating tokens for image inputs, which leads to the problem you're seeing where the photo's bytes are being used instead.

This is a known limitation in Dify. The token calculation for image inputs is not yet implemented and would require downloading the image and then getting the resolution for calculation, which would increase the request delay ^[1].
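Getting the resolution does not necessarily mean decoding the whole image. For PNG, width and height sit at fixed offsets in the IHDR chunk at the start of the file, and the first 32 base64 characters of a data-URL payload decode to exactly those first 24 bytes. The sketch below is a hypothetical helper (`png_dimensions_from_base64` is not a Dify function) that handles PNG only; a photo is more likely to be JPEG, which stores its dimensions in a SOF marker at a variable offset and needs its own parser. For a remote image URL, at least the first bytes still have to be fetched.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes) -> tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG header")
    # Bytes 8..12 hold the IHDR chunk length, bytes 12..16 its type tag.
    if header[12:16] != b"IHDR":
        raise ValueError("IHDR chunk not found where expected")
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def png_dimensions_from_base64(b64_payload: str) -> tuple[int, int]:
    """Decode only the first 32 base64 chars (= 24 bytes), not the whole image."""
    return png_dimensions(base64.b64decode(b64_payload[:32]))


# Build a fake 1920x1080 PNG header (CRC and pixel data are just zero padding).
fake = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1920, 1080)
payload = base64.b64encode(fake + bytes(1000)).decode("ascii")
print(png_dimensions_from_base64(payload))  # (1920, 1080)
```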

To address this, you might need to implement a custom token calculation method for image inputs or adjust the existing logic to handle images appropriately. This would involve modifying the token calculation logic to account for the specific characteristics of image data rather than treating it as text.
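Once the resolution is known, a per-model estimate can replace the byte-based count. The sketch below uses the approximation published in Anthropic's vision documentation, tokens ≈ width × height / 750, and assumes the downscaling limits described there (long edge of at most 1568 px, roughly 1,600 tokens per image). Both limits are assumptions to check against the current docs, and `estimate_claude_image_tokens` is a hypothetical name, not part of Dify.

```python
import math

# Assumed limits, based on Anthropic's vision documentation; verify before use.
MAX_LONG_EDGE_PX = 1568
MAX_IMAGE_TOKENS = 1600
PIXELS_PER_TOKEN = 750


def estimate_claude_image_tokens(width: int, height: int) -> int:
    """Rough token estimate for one image sent to a Claude 3.x model."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Downscale (preserving aspect ratio) until both assumed limits hold.
    scale = min(
        1.0,
        MAX_LONG_EDGE_PX / max(width, height),
        math.sqrt(MAX_IMAGE_TOKENS * PIXELS_PER_TOKEN / (width * height)),
    )
    w = max(1, math.floor(width * scale))
    h = max(1, math.floor(height * scale))
    return math.ceil(w * h / PIXELS_PER_TOKEN)


print(estimate_claude_image_tokens(1000, 1000))  # 1334
print(estimate_claude_image_tokens(4032, 3024))  # 1598
```

Under these assumptions a typical 12-megapixel phone photo costs on the order of 1,600 tokens whether the file is 1.5 MB or 5 MB, so a byte-based count overshoots by roughly three orders of magnitude.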

If you need further assistance or have more questions, feel free to ask!

To continue talking to Dosu, mention @dosu.

zengxiangqi1031 commented 1 month ago

fixed?

dosubot[bot] commented 1 month ago

Yes, the issue with prompt tokens for image inputs being incorrectly calculated based on the photo's bytes instead of the actual token count has been fixed in Dify version 0.6.12-fix1. The code now includes a TODO comment indicating that the current token calculation method for the image type is not implemented and mentions the need to download the image and get the resolution for calculation, which will increase the request delay ^[1].


zengxiangqi1031 commented 1 month ago

up

langgenius / dify