hihumanzone / Gemini-Discord-Bot

A Discord bot leveraging Google Gemini. Has image recognition, conversation engagement, and content understanding.
https://gemini-discord-bot.vercel.app
MIT License
37 stars 14 forks

[Bug] The bot cannot recognize the image and other bugs #6

Closed AlanBacker closed 3 months ago

AlanBacker commented 3 months ago

Hi, it looks like the bot generate images. [image]

[image]

AlanBacker commented 3 months ago

I've also tested this project on two Linux servers and got the same results.

hihumanzone commented 3 months ago

Are there any console errors while trying to recognize images? Here is the list of supported MIME types:

Regarding the second problem, it is caused by the provider: the SD-XL Hugging Face space is overloaded. I have a retry system with three attempts, because it usually works eventually; it just takes time. Recently, though, it hasn't been recovering. To fix this, you can try changing your model and default model in the config.json to something else.
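The retry-then-switch-model idea described above can be sketched roughly as follows. This is a minimal Python illustration, not the bot's actual code: `retry`, `generate_with_fallback`, the `generate` callable, and the delay value are all hypothetical names and assumptions. Only the three-attempt count and the model names SD-XL and DallE-XL come from this thread.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def retry(task: Callable[[], T], attempts: int = 3, delay_seconds: float = 2.0) -> T:
    """Call task until it succeeds or the attempts run out, then re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as error:  # an overloaded provider may fail in many ways
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # give the provider time to recover
    assert last_error is not None
    raise last_error


def generate_with_fallback(
    generate: Callable[[str], T],  # hypothetical: takes a model name, returns an image
    models: Sequence[str],
    attempts: int = 3,
    delay_seconds: float = 2.0,
) -> T:
    """Try each model in order, retrying each one before moving on to the next."""
    if not models:
        raise ValueError("models must not be empty")
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return retry(lambda: generate(model), attempts, delay_seconds)
        except Exception as error:
            last_error = error  # this model kept failing, so try the next one
    assert last_error is not None
    raise last_error
```

For example, `generate_with_fallback(generate, ["SD-XL", "DallE-XL"])` would retry SD-XL three times and only then move on to DallE-XL. Editing the model in config.json, as suggested above, achieves the same switch by hand.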

I recommend DallE-XL, but you can choose any model you prefer and experiment to see what works best for you. I will try my best to find a better provider. [screenshots dated 2024-07-03]

hihumanzone commented 3 months ago

Is this the image recognition issue you were talking about? If so, then it's the Gemini 1.5 Flash model acting up. It occasionally says that, but just responding with 'you can read images' in the next message generally works. [screenshot]

I have an idea to fix this, but I'm not sure how well it will work.

hihumanzone commented 3 months ago

I've made some changes to the code. You can ignore everything above and just expect it to work after updating. Please confirm if it works.

AlanBacker commented 3 months ago

Thanks for the changes. After I sent an image to the bot, it still needed 2-5 minutes to respond, kinda weird.

hihumanzone commented 3 months ago

The bot only downloads the image and uploads it to the Google File API. There might be two reasons for the bot taking time:

  1. The host location has slow internet, or the uploaded image is way too big. If this is the case, the bot does not display the "Let me think..." message until the image is fully uploaded and ready for Google to process; it just shows the typing indicator.
  2. Google simply doesn't want to process the image, and it's not an issue with the host. If this is the case, the bot will be stuck on the "Let me think..." message.
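
One way to tell the two cases apart without watching the chat is to time each phase separately. The following is a rough Python sketch under stated assumptions, not the bot's code: `time_phases` and `slowest_phase` are hypothetical helpers, and the phase callables stand in for whatever functions actually download the attachment, upload it to the File API, and request the model's response.

```python
import time
from typing import Callable, Dict


def time_phases(phases: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each phase in order and return how many seconds each one took."""
    durations: Dict[str, float] = {}
    for name, run in phases.items():
        start = time.perf_counter()
        run()
        durations[name] = time.perf_counter() - start
    return durations


def slowest_phase(durations: Dict[str, float]) -> str:
    """Return the name of the phase that took the longest."""
    if not durations:
        raise ValueError("durations must not be empty")
    return max(durations, key=lambda name: durations[name])
```

If the download or upload phase dominates, the delay points at the host's connection or the image size (case 1). If the processing phase dominates, the delay is on Google's side (case 2).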

hihumanzone commented 3 months ago

And there is also a 3rd case:

AlanBacker commented 3 months ago

i see, thanks for the help.