langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Gemini 1.5 Pro Audio Input Support #3560

Closed ifsheldon closed 2 months ago

ifsheldon commented 4 months ago

1. Is this request related to a challenge you're experiencing?

Yes, I'd like to send audio input to Gemini 1.5 Pro.

2. Describe the feature you'd like to see

Support audio input for Gemini 1.5 Pro

3. How will this feature improve your workflow or experience?

We would no longer need a separate ASR model: in my tests, Gemini 1.5 Pro can understand English and Chinese speech directly.
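For illustration only, here is a minimal sketch of what passing audio straight to the model might look like at the request level. It is not Dify code. It assumes the public Gemini REST `generateContent` endpoint and its `inline_data` part format; the helper name `build_audio_request` and the example prompt are hypothetical.

```python
import base64
import json

# Assumption: endpoint of the public Gemini REST API for this model.
# Nothing in this sketch reflects how Dify would wire the feature in.
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-1.5-pro:generateContent"
)


def build_audio_request(audio_bytes: bytes, mime_type: str, prompt: str) -> dict:
    """Build a JSON-serializable request body carrying a prompt and an audio clip.

    Hypothetical helper: the audio is base64-encoded and sent inline next to
    the text prompt, so no separate ASR step runs before the model call.
    """
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real recording.
    body = build_audio_request(b"\x00\x01", "audio/wav", "Transcribe this clip.")
    print(json.dumps(body, indent=2))
```

The body would be sent as a POST to `GEMINI_URL` with an API key. Inline data only suits small clips; longer recordings would presumably need a file upload step first.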

4. Additional context or comments

No response

5. Can you help us with this feature?

dosubot[bot] commented 3 months ago

Hi, @ifsheldon,

I'm helping the Dify team manage their backlog and am marking this issue as stale. It looks like you opened an issue requesting support for audio input for Gemini 1.5 Pro, which currently understands English and Chinese speech. This enhancement would eliminate the need for ASR models and improve user workflow. However, there hasn't been any further activity or comments on the issue.

Could you please confirm if this issue is still relevant to the latest version of the Dify repository? If it is, please let the Dify team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

ottoradiologia commented 1 month ago

Hi, @dosubot,

Thank you for managing the backlog. We would like to confirm that the audio input feature for Gemini 1.5 Pro is still highly relevant and important for us. We believe this enhancement would significantly improve the user workflow by eliminating the need for ASR models.

We are also interested in routing audio input through the multilingual Whisper large-v3 model hosted by Groq. This would make the system considerably more versatile.

Could you please consider prioritizing this feature? We believe it would bring great value to the community and streamline the user experience.

Thank you for your attention and support.

Best regards, Otto