[Bug]The bot cannot recognize the image and other bugs

AlanBacker commented 3 months ago

Hi, it looks like the bot generate images.

AlanBacker commented 3 months ago

And I've tested this project on two Linux servers, I got the same results.

hihumanzone commented 3 months ago

Are there any console errors while trying to recognize images? Here is the list of supported MIME types:

image/png
image/jpeg
image/webp
image/heic
image/heif
audio/wav
audio/mp3
audio/aiff
audio/aac
audio/ogg
audio/flac
video/mp4
video/mpeg
video/mov
video/avi
video/x-flv
video/mpg
video/webm
video/wmv
video/3gpp
text/plain
text/html
text/css
text/javascript
application/x-javascript
text/x-typescript
application/x-typescript
text/csv
text/markdown
text/x-python
application/x-python-code
application/json
text/xml
application/rtf
text/rtf

Regarding the second problem, it is being caused by the provider, which is the SD-XL Hugging Face space being overloaded. I have a retry system with + three attempts, because it generally eventually works; it just takes time. But recently, it's not doing that. So, to fix this, you can try changing your model and default model in the config.json to something else.

I recommend DallE-XL, but you can choose any model that you prefer. you can experiment with other models to see what works best for you. I will try my best to find a better provider. Screenshot_2024-07-03-11-14-59-78_e4424258c8b8649f6e67d283a50a2cbc Screenshot_2024-07-03-11-14-30-03_e4424258c8b8649f6e67d283a50a2cbc Screenshot_2024-07-03-11-13-07-75_e4424258c8b8649f6e67d283a50a2cbc

hihumanzone commented 3 months ago

Is this the image recognition issue you were talking about? If so, then it's the Gemini 1.5 flash model acting up. It occasionally says that, but just responding with 'you can read images' in the next message generally works. Screenshot_2024-07-03-11-25-51-68_e4424258c8b8649f6e67d283a50a2cbc

I have an idea to fix this, but I'm not sure how well it will work.

hihumanzone commented 3 months ago

I've made some changes to the code. You can ignore everything above and just expect it to work after updating. Please confirm if it works.

AlanBacker commented 3 months ago

thanks for the changes. after I tried to send an image to the bot it still needed 2-5 mins to respond, kinda weird.

hihumanzone commented 3 months ago

The bot only downloads and uploads the image to Google File API. There might be two reasons for the bot taking time:

The host location has slow internet, or the uploaded image is way too big. If this is the case, the bot does not display the Let me think... message until the image is fully uploaded and ready for Google to process; it just shows the typing indicator.
Google simply doesn't want to process the image, and it's not an issue with the host. If this is the case, the bot will be stuck on the Let me think... message.

hihumanzone commented 3 months ago

And there is also a 3rd case:

the chat conversation is too large for the model to handle and respond to quickly. To fix this, please clear the conversation history.

AlanBacker commented 3 months ago

i see, thanks for the help.

hihumanzone / Gemini-Discord-Bot

[Bug]The bot cannot recognize the image and other bugs #6