Suggestion : Adding the Gemini Pro Vision model.

zhu327 / gemini-openai-proxy

A proxy for converting the OpenAI API protocol to the Google Gemini Pro protocol.

MIT License

Suggestion : Adding the Gemini Pro Vision model. #7

Closed swiftdev29 closed 11 months ago

swiftdev29 commented 11 months ago

Maybe you can also add support in the proxy for the Vision models, so that we can use Gemini with UIs such as https://github.com/mckaywrigley/chatbot-ui

See also: https://github.com/mckaywrigley/chatbot-ui/pull/1034

zhu327 commented 11 months ago

https://github.com/zhu327/gemini-openai-proxy/issues/5#issuecomment-1863833466

Take a look here, perhaps we'll need to wait for contributors to support this feature.

zhu327 commented 11 months ago

curl http://localhost:8080/v1/chat/completions \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $YOUR_GOOGLE_AI_STUDIO_API_KEY" \
 -d '{
     "model": "gpt-4-vision-preview",
     "messages": [{"role": "user", "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        }
     ]}],
     "temperature": 0.7
 }'

Gemini Pro Vision is now supported, have fun 😊

Manojbhat09 commented 10 months ago

How do I test Gemini Pro Vision? I only see an OpenAI key.

zhu327 commented 10 months ago

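@Manojbhat09 When you send a request to the gemini-openai-proxy with the model set to gpt-4-vision-preview, the model that actually gets called is Gemini Pro Vision. The key in the Authorization header is your Google AI Studio API key, not an OpenAI key, as in the curl command above.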
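For anyone calling the proxy from code rather than curl, here is a minimal Python sketch of the same request. It only assumes what the curl command in this thread shows: the proxy listening on localhost:8080, the gpt-4-vision-preview model name, and a Google AI Studio key as the bearer token. The function name build_vision_request and the constant PROXY_URL are made up for illustration.

```python
import json
import urllib.request

# Address taken from the curl example in this thread; adjust to your setup.
PROXY_URL = "http://localhost:8080/v1/chat/completions"


def build_vision_request(google_api_key: str, image_url: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style vision request for the proxy."""
    body = {
        # OpenAI model name; per the maintainer, the proxy calls Gemini Pro Vision for it.
        "model": "gpt-4-vision-preview",
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The bearer token is the Google AI Studio key, not an OpenAI key.
            "Authorization": f"Bearer {google_api_key}",
        },
        method="POST",
    )
```

With the proxy running, passing the returned object to urllib.request.urlopen sends the request.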

tamarott commented 8 months ago

Is the base64 image format supported? What is the syntax for that?

zhu327 commented 8 months ago

@tamarott Yes, it supports the base64 image format. For specific usage, please refer to the OpenAI API documentation.
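As a rough illustration of the syntax the OpenAI documentation describes: a base64 image goes in the same image_url field, written as a data URL of the form data:<mime type>;base64,<encoded bytes>. The sketch below builds such a payload in Python. The helper name build_base64_payload is hypothetical, and whether the proxy accepts every MIME type is an assumption not confirmed in this thread.

```python
import base64
import json


def build_base64_payload(image_bytes: bytes, prompt: str, mime_type: str = "image/jpeg") -> dict:
    """Build an OpenAI-style chat payload with the image inlined as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Data URL form used by the OpenAI vision API for inline images.
    data_url = f"data:{mime_type};base64,{encoded}"
    return {
        "model": "gpt-4-vision-preview",
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ]}],
        "temperature": 0.7,
    }


if __name__ == "__main__":
    # Placeholder bytes only; read a real image file with open(path, "rb").read().
    payload = build_base64_payload(b"not a real image", "What's in this image?")
    print(json.dumps(payload, indent=2))
```

The resulting JSON can be sent as the -d body of the curl command earlier in this thread, with the same headers.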