zhu327 / gemini-openai-proxy

A proxy for converting the OpenAI API protocol to the Google Gemini Pro protocol.
MIT License

Suggestion: Adding the Gemini Pro Vision model. #7

Closed by SwiftDev29 6 months ago

SwiftDev29 commented 6 months ago

Maybe you can also add support in the proxy for the Vision models, so that we can use Gemini with UIs such as https://github.com/mckaywrigley/chatbot-ui

See also: https://github.com/mckaywrigley/chatbot-ui/pull/1034

zhu327 commented 6 months ago

https://github.com/zhu327/gemini-openai-proxy/issues/5#issuecomment-1863833466

Take a look at the comment linked above. We may need to wait for contributors to add this feature.

zhu327 commented 6 months ago
curl http://localhost:8080/v1/chat/completions \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $YOUR_GOOGLE_AI_STUDIO_API_KEY" \
 -d '{
     "model": "gpt-4-vision-preview",
     "messages": [{"role": "user", "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
     ]}],
     "temperature": 0.7
 }'

Gemini Pro Vision is now supported. Have fun 😊

Manojbhat09 commented 6 months ago

How do I test Gemini Pro Vision? I only see an OpenAI key.

zhu327 commented 6 months ago

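@Manojbhat09 When you request the gpt-4-vision-preview model through gemini-openai-proxy, the model actually called is Gemini Pro Vision. No OpenAI key is needed: pass your Google AI Studio API key in the Authorization header, as in the curl command above.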
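
For clients that call the proxy from code instead of curl, the same request can be sketched in Python. This is a minimal sketch using only the standard library. It assumes the proxy listens on localhost:8080 as in the curl command, and `build_vision_request` is a hypothetical helper name, not part of the proxy.

```python
import json
import urllib.request

# Assumption: the proxy address matches the curl command in this thread.
PROXY_URL = "http://localhost:8080/v1/chat/completions"


def build_vision_request(api_key: str, image_url: str, question: str) -> urllib.request.Request:
    """Build an OpenAI-style vision request aimed at the proxy (hypothetical helper)."""
    payload = {
        # The OpenAI model name is sent; per the maintainer, the proxy calls Gemini Pro Vision.
        "model": "gpt-4-vision-preview",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The bearer token is the Google AI Studio API key, not an OpenAI key.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the response. Building the request separately from sending it makes it possible to inspect the payload without a running proxy.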

tamarott commented 3 months ago

Is the base64 image format supported? What is the syntax for that?

zhu327 commented 3 months ago

@tamarott Yes, base64 images are supported. For usage details, please refer to the OpenAI API documentation.
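
As a rough illustration of the syntax the OpenAI API documentation describes, a base64 image goes into the same `image_url.url` field as a data URL of the form `data:<mime type>;base64,<encoded bytes>`. The sketch below builds such a message in Python. The helper names are hypothetical, and that the proxy accepts this form exactly as OpenAI does rests on the maintainer's statement above; it has not been verified here.

```python
import base64


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_base64_message(image_bytes: bytes, question: str, mime_type: str = "image/jpeg") -> dict:
    """Build one OpenAI-style user message carrying an inline base64 image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            # The data URL takes the place of the https:// image link.
            {"type": "image_url", "image_url": {"url": image_to_data_url(image_bytes, mime_type)}},
        ],
    }
```

The resulting dictionary can be used as the single entry of the `messages` array in the request body, with the rest of the request unchanged from the curl command earlier in the thread.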