run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Gemini is not supported in agentic components #11439

Closed: bottlehounds-dev closed this issue 4 months ago

bottlehounds-dev commented 4 months ago

Bug Description

Hey all!

I'm sure that you're aware of this, but using a Gemini LLM in a query engine, chat engine, or agent isn't possible out of the box.

Issues

There are two main issues that I am aware of (see the repro sketch after this list):

  1. Gemini does not support messages with the SYSTEM role, but LlamaIndex always uses them by default in:
    1. The TEXT_QA_SYSTEM_PROMPT for response synthesizers
    2. When using a ReActAgent, the default chat formatter prepends the history with the agent instructions as a SYSTEM message in ReActChatFormatter.format
  2. An error is thrown in vertex_utils._parse_chat_history when the chat history does not contain an even number of messages, which is guaranteed to happen when:
    1. Using the default QA prompts for response synthesizers, as they are composed of two messages
    2. The aforementioned ReActChatFormatter.format method prepends the history with another message containing the agent instructions
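To make the failure concrete, here is a minimal sketch of a call that trips both problems at once (assuming the llama-index 0.10.x package layout and Vertex credentials already configured; the message contents are illustrative):

```python
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.vertex import Vertex

llm = Vertex(model="gemini-pro")

# Mirrors the shape of the default QA prompt: a SYSTEM message plus a USER
# question. The SYSTEM role is rejected by Gemini (issue 1), and the
# remaining one-message history is odd-length, which
# vertex_utils._parse_chat_history rejects (issue 2).
messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are a helpful assistant."),
    ChatMessage(role=MessageRole.USER, content="What is LlamaIndex?"),
]
llm.chat(messages)
```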

Workarounds

  1. For the response synthesizer issues, we can simply override text_qa_template to only contain a single USER message, taking care of both the SYSTEM message and the odd number of messages in the chat history.
  2. For the agent, we can only solve the SYSTEM message issue (I did this by subclassing ReActChatFormatter and updating the message role from SYSTEM to USER). Sketches of both follow.
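A sketch of workaround 1 (assuming llama-index 0.10.x; the single-message prompt text is illustrative, not the library default, and embedding configuration is omitted):

```python
from llama_index.core import PromptTemplate, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.vertex import Vertex

llm = Vertex(model="gemini-pro")
index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./data").load_data())

# One USER message instead of SYSTEM + USER: no SYSTEM role, even history.
single_user_qa = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\nQuery: {query_str}\nAnswer: "
)

query_engine = index.as_query_engine(llm=llm, text_qa_template=single_user_qa)
print(query_engine.query("What does the document say?"))
```

And a sketch of workaround 2 (the subclass name is mine; the exact format() signature may vary between versions):

```python
from llama_index.core.agent.react.formatter import ReActChatFormatter
from llama_index.core.llms import MessageRole


class UserRoleReActChatFormatter(ReActChatFormatter):
    """Downgrade the leading SYSTEM message to USER so Gemini accepts it."""

    def format(self, tools, chat_history, current_reasoning=None):
        messages = super().format(tools, chat_history, current_reasoning)
        if messages and messages[0].role == MessageRole.SYSTEM:
            messages[0].role = MessageRole.USER
        return messages
```

The formatter can then be handed to the agent, e.g. via ReActAgent.from_tools(..., react_chat_formatter=UserRoleReActChatFormatter()).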

Blocker

Finally, we come to the issue that I do not have a workaround for, short of reimplementing the Vertex LLM and/or how it is used with agents.

I do not know how to avoid the uneven number of chat-history messages when using Gemini with a ReAct agent. The agent will always need to supply its instructions/context. Should I simply attempt to combine this with the user message?
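If combining is the way to go, a helper along these lines (the function name is mine, purely illustrative) would keep the history even:

```python
from typing import List

from llama_index.core.llms import ChatMessage, MessageRole


def merge_system_into_user(messages: List[ChatMessage]) -> List[ChatMessage]:
    """Fold a leading SYSTEM message into the first USER message."""
    if len(messages) >= 2 and messages[0].role == MessageRole.SYSTEM:
        merged = ChatMessage(
            role=MessageRole.USER,
            content=f"{messages[0].content}\n\n{messages[1].content}",
        )
        return [merged, *messages[2:]]
    return messages
```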

Version

latest

Steps to Reproduce

Response Synthesizer

  1. Create a query engine with a response synthesizer that uses the Gemini LLM and run a query on it

Agent

  1. Create a ReAct agent with the Gemini LLM and run a chat on it (a combined repro sketch follows)
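A combined sketch of both repro paths (assuming llama-index 0.10.x, Vertex credentials configured, some files under ./data, and embedding configuration omitted):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.llms.vertex import Vertex

llm = Vertex(model="gemini-pro")

# 1. Response synthesizer path: the default text QA prompt opens with a
#    SYSTEM message, which Gemini rejects.
index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./data").load_data())
index.as_query_engine(llm=llm).query("What does the document say?")

# 2. Agent path: ReActChatFormatter prepends the instructions as SYSTEM.
agent = ReActAgent.from_tools([], llm=llm)
agent.chat("Hello!")
```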

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 4 months ago

🤖

It's great to see that you've identified the issues and even proposed some workarounds for the gemini LLM problem. Your insights and proposed solutions are valuable. If you're able to implement these workarounds and test them, it would be fantastic if you could create a pull request with your changes. Your contributions would be greatly appreciated!

logan-markewich commented 4 months ago

probably a decent solution is converting the leading SYSTEM message to a USER message (or merging it into the first user message) when the LLM doesn't support system prompts

bottlehounds-dev commented 4 months ago

@logan-markewich what you just said is essentially what I ended up doing in my subclass of ReActChatFormatter. I just feel like we can do better.

So I'm going to try a few things out and see if I can't come up with something that I can whip up a PR for! The goal would be to have more stable Gemini support in llama-index's agentic components.

gich2009 commented 4 months ago

Hi @bottlehounds-dev. I've been using query engines, chat engines, and ReAct agents with Gemini models, both Pro and Ultra, and they have been working fine. There was a time they weren't working because of problems with the SYSTEM-to-USER message mapping, but I raised a similar issue and it was fixed.

What modules are not working on your end?

dmarianobenchsci commented 4 months ago

@gich2009 thanks for chiming in! This is my work GitHub account, apologies for the confusion.

I am using the Vertex LLM abstraction to access gemini-pro, because I am using the Vertex API to authenticate. Are you using the Gemini abstraction? If so, then that is something we will try!

gich2009 commented 4 months ago

Hi @bottlehounds-dev, no worries.

That explains it. I usually use the Gemini abstraction directly; I've never tried running it behind the Vertex abstraction. It seems to be a Vertex-Gemini issue; I'll experiment later. It was probably an oversight when the Gemini abstraction was fixed.

The Gemini one works pretty well. Try it out and tell me if you encounter challenges, because I'd be curious to see what they are.
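For reference, a minimal sketch of the direct Gemini abstraction (assuming `pip install llama-index-llms-gemini` and a GOOGLE_API_KEY in the environment; parameter names may differ slightly between versions):

```python
# Direct Gemini abstraction, no Vertex involved.
from llama_index.llms.gemini import Gemini

llm = Gemini(model="models/gemini-pro")
print(llm.complete("Say hello."))
```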

shuozhu1 commented 4 months ago

Hello, I seem to have the same problem as you, but I don't have a suitable solution yet. Could I take a look at your code for this section? Much appreciated!

logan-markewich commented 4 months ago

@shuozhu1 I think this is fixed tbh

`pip install -U llama-index-llms-vertex` should have it working

shuozhu1 commented 4 months ago

> @shuozhu1 I think this is fixed tbh
>
> `pip install -U llama-index-llms-vertex` should have it working

I will try, thanks for your help! (But my LLM is llama-13b-chinese, actually.)

d-mariano commented 4 months ago

@shuozhu1 @logan-markewich sorry friends, I meant to close this one out! <3