KartikeyBartwal closed this 1 month ago
I just realized that Gemma isn't a multimodal LLM.
The issue persists even with:
mm_llm = genai.GenerativeModel("gemini-1.5-pro")
Hello @KartikeyBartwal, can you try passing a Context object initialized with llm, mm_llm and embeddings to the from_context() method?
from llama_index.llms.gemini import Gemini
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.embeddings.openai import OpenAIEmbedding
from lavague.core import ActionEngine
from lavague.core.context import Context
llm_name = "gemini-1.5-flash-001"
mm_llm_name = "gpt-4o-mini"
embedding_name = "text-embedding-3-small"
# init models
llm = Gemini(
    model="models/" + llm_name
)  # Gemini models are prefixed with "models/" in LlamaIndex
mm_llm = OpenAIMultiModal(model=mm_llm_name)
embedding = OpenAIEmbedding(model=embedding_name)
# init context
context = Context(llm, mm_llm, embedding)
# use the context
action_engine = ActionEngine.from_context(context=context, driver=selenium_driver)
You can see how to create a custom context at the end of this page: https://docs.lavague.ai/en/latest/docs/get-started/customization/
@lyie28 There might be a mistake in the docs: https://docs.lavague.ai/en/latest/docs/get-started/customization/
In the "Customization on-the-fly" section example, you pass an llm object to the ActionEngine via the from_context() method. But this method doesn't seem to accept an LLM: https://github.com/lavague-ai/LaVague/blob/2d7d0696411ddad56f68aba458e3ca155bf88e2a/lavague-core/lavague/core/action_engine.py#L106
def from_context(
    cls,
    context: Context,
    driver: BaseDriver,
    navigation_engine: BaseEngine = None,
    python_engine: BaseEngine = None,
    navigation_control: BaseEngine = None,
    retriever: BaseHtmlRetriever = None,
    prompt_template: PromptTemplate = NAVIGATION_ENGINE_PROMPT_TEMPLATE.prompt_template,
    extractor: BaseExtractor = DynamicExtractor(),
    time_between_actions: float = 1.5,
    n_attempts: int = 5,
    logger: AgentLogger = None,
) -> ActionEngine:
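A quick way to check what a classmethod like from_context() accepts is to inspect its signature. The sketch below uses a hypothetical stand-in class (StandInActionEngine is not LaVague's ActionEngine, and accepts_parameter is a helper made up for illustration) that mirrors only the general shape of the signature quoted above. With the real library you would pass the imported ActionEngine.from_context to the same helper.

```python
import inspect


class StandInActionEngine:
    """Hypothetical stand-in; it is NOT LaVague's ActionEngine."""

    def __init__(self, driver, llm=None):
        self.driver = driver
        self.llm = llm

    @classmethod
    def from_context(cls, context, driver, n_attempts=5):
        # Mirrors the shape of the quoted signature: the models come from
        # `context`, and there is no separate `llm` parameter.
        return cls(driver=driver, llm=context["llm"])


def accepts_parameter(func, name):
    """Return True if `func` declares a parameter called `name`."""
    return name in inspect.signature(func).parameters


print(accepts_parameter(StandInActionEngine.from_context, "context"))  # True
print(accepts_parameter(StandInActionEngine.from_context, "llm"))      # False

# Passing a keyword the method does not declare fails with a TypeError.
try:
    StandInActionEngine.from_context(llm="my-llm", driver="my-driver")
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

The same check works on any callable, so it can help spot a mismatch between documentation and code before running an agent.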
Will fix this now
Hi @KartikeyBartwal - thanks for flagging this!
When passing just the mm_llm or llm rather than passing them via a context, we should not use the from_context() method. I am updating the docs now as there was an error there - sorry about that 🤦
world_model = WorldModel(mm_llm=mm_llm)
action_engine = ActionEngine(driver=selenium_driver, llm=llm)
Let me know if this resolves the issue.
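The difference between the two construction paths can be sketched with stand-in classes. Everything here is hypothetical (StandInContext and StandInEngine are illustrative names, not LaVague classes); the sketch only assumes that an alternate constructor unpacks models from a context object and hands them to the regular constructor.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StandInContext:
    """Hypothetical bundle of models, loosely modeled on Context(llm, mm_llm, embedding)."""

    llm: Any
    mm_llm: Any
    embedding: Any


class StandInEngine:
    """Hypothetical engine; it is NOT LaVague's ActionEngine."""

    def __init__(self, driver: Any, llm: Any = None):
        self.driver = driver
        self.llm = llm

    @classmethod
    def from_context(cls, context: StandInContext, driver: Any) -> "StandInEngine":
        # Unpack the model from the context and defer to the constructor.
        return cls(driver=driver, llm=context.llm)


# Path 1: only an LLM is at hand, so call the constructor directly.
direct = StandInEngine(driver="driver", llm="my-llm")

# Path 2: a full context is at hand, so use the alternate constructor.
context = StandInContext(llm="my-llm", mm_llm="my-mm-llm", embedding="my-embedding")
via_context = StandInEngine.from_context(context, driver="driver")

print(direct.llm == via_context.llm)  # True
```

Both paths end up with the same llm on the engine; the alternate constructor is only a convenience for callers who already hold a context.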
I am creating two different instances of the Gemma 2B model, one for llm and the other for mm_llm. The world model is working fine, but the action engine gives me the message I mentioned in the title. Here is the implementation:
# Customize the LLM, multi-modal LLM and embedding models
# Initialize the Selenium driver
selenium_driver = SeleniumDriver()
# Initialize a WorldModel and ActionEngine passing them the custom context
# Create your agent
agent = WebAgent(world_model, action_engine)