small-cactus / M.I.L.E.S

M.I.L.E.S is a GPT-4-Turbo voice assistant that self-adapts its prompts and AI model, plays any Spotify song, adjusts system and Spotify volume, performs calculations, browses the web, searches global weather, delivers the date and time, and autonomously chooses and retains long-term memories. Available for macOS and Windows.
https://github.com/small-cactus/M.I.L.E.S
MIT License

Gemini Pro 1.5 support? #10

Closed: letmefocus closed this issue 5 months ago

letmefocus commented 5 months ago

Is your feature request related to a problem? Please describe.

No

Describe the solution you'd like

Since Gemini 1.5 (Vertex AI) now has access to function calling, I was wondering if it was possible for it to be implemented in this system.

Additional context

Gemini 1.5 has access to OpenAPI-spec function calling like OpenAI, and Gemini only requires one model for vision and language, whereas OpenAI needs two. You can have a far bigger chat history with context, upload files like PDFs, and have access to a better system prompt.

https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling
https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/function-calling
https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai

To use this you need the gemini-1.5-pro-preview-0409 model. If it is not possible with the 1.5 model, you can use the gemini-1.0-pro-001 or gemini-1.0-pro-002 models instead.

If you need help with this, e.g. accessing the API, I'd be more than willing to help.

@small-cactus

letmefocus commented 5 months ago

By the way, function calling supports both prompt-turn and chat-turn models.

small-cactus commented 5 months ago

From the documentation you provided, the API schema, and how it works overall compared to the OpenAI API, is very different. One example: the Vertex API wants you to list the tool-list JSON inside the function declaration itself, so I'd have to rewrite the tool logic to implement Gemini support.

If you could prove me wrong and show a more compatible method, or a way to bridge both APIs into one easily, that would be great.
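Concretely, the bridge I'm imagining would have to reshape the tool JSON, something like this rough sketch (the helper name openai_tools_to_vertex is hypothetical, not anything in the repo):

# Rough sketch only: convert an OpenAI-style "tools" array into the nested
# shape Vertex expects. The helper name is hypothetical.
def openai_tools_to_vertex(openai_tools):
    declarations = []
    for tool in openai_tools:
        fn = tool["function"]
        declarations.append({
            "name": fn["name"],
            "description": fn.get("description", ""),
            "parameters": fn.get("parameters", {}),
        })
    # Vertex wants the declarations wrapped in a "tools" list like this:
    return [{"function_declarations": declarations}]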


letmefocus commented 5 months ago

If it's not possible, I'd be willing to open a different fork and help out with the code for it. I'll try to provide a quick sample of the function definitions.

letmefocus commented 5 months ago

From what it looks like in both the curl and Python implementations, you can implement the functions as an array. If it's not possible with the existing Python package, you can always write a separate package for Gemini based on how it works in the curl implementation.

https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/function-calling#curl_1
https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/function-calling#python_1

Judging from the curl method, the structure is similar between the Python function definition for the OpenAI API and the Gemini API:

OpenAI:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What is the weather in Boston?"}]

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=messages,
    tools=tools,
    tool_choice="auto",  # auto is default, but we'll be explicit
)

Gemini (Vertex AI):
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [{
        "text": "What is the weather in Boston?"
      }]
    }],
    "tools": [{
      "function_declarations": [
        {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              }
            },
            "required": [
              "location"
            ]
          }
        }
      ]
    }]
  }'

Reference for the OpenAI Function Calling example: https://platform.openai.com/docs/guides/function-calling
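For completeness, the same declaration through the Vertex AI Python SDK would look roughly like this. This is an untested sketch based on the docs above (module path per the vertexai package from google-cloud-aiplatform; project ID and model name are placeholders):

# Untested sketch: get_current_weather declared via the Vertex AI Python SDK.
# The project ID and model name below are placeholders.
import vertexai
from vertexai.generative_models import FunctionDeclaration, GenerativeModel, Tool

vertexai.init(project="your-project-id", location="us-central1")

get_current_weather = FunctionDeclaration(
    name="get_current_weather",
    description="Get the current weather in a given location",
    parameters={
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            }
        },
        "required": ["location"],
    },
)

weather_tool = Tool(function_declarations=[get_current_weather])

model = GenerativeModel("gemini-1.0-pro-001")
response = model.generate_content(
    "What is the weather in Boston?",
    tools=[weather_tool],
)
# The model returns a structured function_call instead of free text.
print(response.candidates[0].content.parts[0].function_call)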

letmefocus commented 5 months ago

You could also very possibly use LiteLLM: https://docs.litellm.ai/docs/providers/gemini

small-cactus commented 5 months ago

Adding that support would be out of the scope of my skills. I've played around with Gemini as a voice assistant and it's not viable, so I don't see value in adding it. There are many reasons; here are some I can name off the top of my head:

- All Gemini models tend to over-explain and don't know when to "shut up". They add extra sentences when not necessary: they'll give you the answer in the first sentence, then keep talking and talking for four more paragraphs.

- Gemini models do not follow instructions very well; they act non-deterministically.

- Gemini models are trained on vastly different training datasets than ChatGPT models, so my entire system prompt would be useless for ideal performance.

- Response times are very slow. Sometimes they can be fast, but again, the models over-explain and run the time up, and a 1-second delay is the difference between a slow reply and thinking it stopped working.

- Believe it or not, OpenAI models are trained to be spoken to; the response format fits well with spoken language. Gemini models are fine in this regard, but really not the best. They also don't pay enough attention to the system instructions to understand that I put "You are talking to the user in a voice conversation" into the prompt.

But you are extremely welcome to fork and try to add Gemini support; any details you need, I can help with and will provide. Here's some stuff to get you started: all tools are located in tools.json, and the actual functions corresponding to these arrays are located in main.py, scattered randomly within the code (bad, I know). The only part you should have to modify is the change-AI-model function, and maybe the change-personality function, because it changes the system prompt and I don't know if the format is the same.

Other than that, all OpenAI code (besides webcam recognition) is within the ask function; the flow is roughly the sketch below. Let me know if you need anything else!
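Here's the shape of that flow, as a heavily simplified sketch (not the literal main.py code; the globals() dispatch is just an illustration of how tool names map to functions):

# Heavily simplified sketch of the idea, not the literal main.py code.
import json
from openai import OpenAI

client = OpenAI()

with open("tools.json") as f:
    tools = json.load(f)  # OpenAI-schema tool definitions

def ask(messages):
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=messages,
        tools=tools,
    )
    message = response.choices[0].message
    if message.tool_calls:
        messages.append(message)  # keep the assistant turn with its tool calls
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            # In this sketch each tool name maps to a same-named Python
            # function; the real functions live scattered through main.py.
            result = globals()[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(result),
            })
        return ask(messages)  # let the model see tool output and respond
    return message.content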

letmefocus commented 5 months ago

From what I've seen, if you use LiteLLM, it supports using the Gemini model through an OpenAI API structure. It also supports the OpenAI Python package (by changing the model and base URL) and function calling. I haven't tested the function calling through LiteLLM yet, but it looks promising.
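I haven't run it, but going by the LiteLLM docs, the function-calling path should look something like this sketch (model string and tool definition are just examples):

# Untested sketch: OpenAI-style function calling routed to Gemini through
# LiteLLM's completion() wrapper. The model string uses LiteLLM's
# documented Vertex AI prefix.
from litellm import completion

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state"}
            },
            "required": ["location"],
        },
    },
}]

response = completion(
    model="vertex_ai/gemini-1.0-pro",  # or "gemini/gemini-pro" via AI Studio
    messages=[{"role": "user", "content": "What is the weather in Boston?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)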

letmefocus commented 5 months ago

https://github.com/BerriAI/litellm/blob/7ffd3d40fa0338f2cb1e7bae9e5b608dde7862ee/model_prices_and_context_window.json#L979

https://docs.litellm.ai/docs/providers/vertex

References are above.

Code sample:

import openai

client = openai.OpenAI(
    api_key="sk-1234",              # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000"  # litellm proxy base url
)

response = client.chat.completions.create(
    model="team1-gemini-pro",
    messages=[
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)

print(response)

letmefocus commented 5 months ago

[image: sample image]

letmefocus commented 5 months ago

Also, if we do go through with the plan of integrating Gemini into a "standalone app" or "bundle", I'd be willing to lend out API keys for Gemini behind the LiteLLM proxy, as it supports tracking how many credits a user can use.
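For reference, that budgeting works through the proxy's key-generation endpoint; roughly like this sketch (endpoint per the LiteLLM proxy docs; the URL and keys are placeholders):

# Sketch: creating a budget-capped virtual key from a running LiteLLM proxy.
# The URL and master key here are placeholders.
import requests

resp = requests.post(
    "http://0.0.0.0:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},  # proxy master key
    json={
        "models": ["team1-gemini-pro"],  # models this key may call
        "max_budget": 5.0,               # spend cap (USD) for this key
    },
)
print(resp.json()["key"])  # virtual key to hand out to a user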

letmefocus commented 5 months ago

[image: second example of Gemini working on an OpenAI API proxy using LiteLLM]

small-cactus commented 5 months ago

I think I'm going to use Groq's API, as they have native function-calling support using OpenAI's schema for other models.
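Since Groq exposes an OpenAI-compatible endpoint, the switch should mostly be a base-URL change. A rough sketch, reusing the same OpenAI-schema tools shape as the earlier examples (API key and model name are placeholders):

# Sketch: pointing the existing OpenAI client at Groq's OpenAI-compatible
# endpoint. The API key and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="gsk_...",  # Groq API key
    base_url="https://api.groq.com/openai/v1",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{"role": "user", "content": "What is the weather in Boston?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)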