KartikeyBartwal commented 2 months ago

I am creating two separate instances of the Gemma 2B model, one for llm and the other for mm_llm. The world model works fine, but the action engine raises the error mentioned in the title. Here is the implementation:

# Customize the LLM, multi-modal LLM and embedding models

llm = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},
    token=os.environ["HF_TOKEN"],
)

mm_llm = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},
    token=os.environ["HF_TOKEN"],
)

# Initialize the Selenium driver

selenium_driver = SeleniumDriver()

# Initialize a WorldModel and ActionEngine passing them the custom context

world_model = WorldModel(mm_llm=mm_llm)
action_engine = ActionEngine.from_context(driver=selenium_driver, llm=llm)

# Create your agent

agent = WebAgent(world_model, action_engine)

agent.get("https://huggingface.co/docs")
agent.run("Go on the quicktour of PEFT")

KartikeyBartwal commented 2 months ago

I just realized that Gemma isn't a multimodal LLM.

KartikeyBartwal commented 2 months ago

The issue persists even with:

mm_llm = genai.GenerativeModel("gemini-1.5-pro")

paulpalmieri commented 2 months ago

Hello @KartikeyBartwal, can you try passing a Context object initialized with llm, mm_llm and embeddings to from_context()?

from llama_index.llms.gemini import Gemini
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.embeddings.openai import OpenAIEmbedding
from lavague.core.context import Context

llm_name = "gemini-1.5-flash-001"
mm_llm_name = "gpt-4o-mini"
embedding_name = "text-embedding-3-small"

# init models
llm = Gemini(
    model="models/" + llm_name
)  # gemini models are prefixed with "models/" in LlamaIndex
mm_llm = OpenAIMultiModal(model=mm_llm_name)
embedding = OpenAIEmbedding(model=embedding_name)

# init context
context = Context(llm, mm_llm, embedding)

# use the context
action_engine = ActionEngine.from_context(context=context, driver=selenium_driver)

You can see how to create a custom context at the end of this page: https://docs.lavague.ai/en/latest/docs/get-started/customization/

@lyie28 There might be a mistake in the docs: https://docs.lavague.ai/en/latest/docs/get-started/customization/

In the "Customization on-the-fly" section example, you pass an llm object to the ActionEngine via the from_context() method. But this method doesn't seem to accept an LLM: https://github.com/lavague-ai/LaVague/blob/2d7d0696411ddad56f68aba458e3ca155bf88e2a/lavague-core/lavague/core/action_engine.py#L106

def from_context(
        cls,
        context: Context,
        driver: BaseDriver,
        navigation_engine: BaseEngine = None,
        python_engine: BaseEngine = None,
        navigation_control: BaseEngine = None,
        retriever: BaseHtmlRetriever = None,
        prompt_template: PromptTemplate = NAVIGATION_ENGINE_PROMPT_TEMPLATE.prompt_template,
        extractor: BaseExtractor = DynamicExtractor(),
        time_between_actions: float = 1.5,
        n_attempts: int = 5,
        logger: AgentLogger = None,
    ) -> ActionEngine:
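
For anyone following along, here is a minimal sketch of why the call fails. The two classes are hypothetical stand-ins that mirror only the relevant parameters (they are not LaVague's actual code): since from_context() declares no llm parameter, Python rejects the keyword with a TypeError before any engine logic runs.

```python
class Context:
    """Hypothetical stand-in for lavague.core.context.Context."""

    def __init__(self, llm, mm_llm, embedding):
        self.llm = llm
        self.mm_llm = mm_llm
        self.embedding = embedding


class ActionEngine:
    """Hypothetical stand-in; the real class takes many more parameters."""

    def __init__(self, driver, llm=None):
        self.driver = driver
        self.llm = llm

    @classmethod
    def from_context(cls, context, driver):
        # The LLM is read from the context, so there is no `llm` parameter here.
        return cls(driver=driver, llm=context.llm)


def try_from_context_with_llm():
    """Return the error message raised when `llm` is passed to from_context()."""
    try:
        ActionEngine.from_context(driver="driver", llm="llm")
    except TypeError as exc:
        return str(exc)
    return None


# The message contains: unexpected keyword argument 'llm'
print(try_from_context_with_llm())

# Passing a Context works, because `context` is a declared parameter.
engine = ActionEngine.from_context(context=Context("llm", "mm_llm", "emb"), driver="driver")
print(engine.llm)
```

The strings used for the driver and models are placeholders so that the sketch runs without any API keys.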

lyie28 commented 1 month ago

Will fix this now

lyie28 commented 1 month ago

Hi @KartikeyBartwal - thanks for flagging this!

When passing just the mm_llm or llm directly, rather than via a context, we should not use the from_context() method. I am updating the docs now, as there was an error there. Sorry about that 🤦

world_model = WorldModel(mm_llm=mm_llm)
action_engine = ActionEngine(driver=selenium_driver, llm=llm)

Let me know if this resolves the issue?
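
If a similar TypeError comes up elsewhere, one way to check which keyword arguments a callable accepts is the standard library's inspect.signature. The sketch below uses a hypothetical stand-in class rather than the real ActionEngine, and accepts_keyword is an illustrative helper name, not part of LaVague.

```python
import inspect


class ActionEngine:
    """Hypothetical stand-in; the real class takes many more parameters."""

    def __init__(self, driver, llm=None):
        self.driver = driver
        self.llm = llm

    @classmethod
    def from_context(cls, context, driver):
        return cls(driver=driver, llm=getattr(context, "llm", None))


def accepts_keyword(func, name):
    """Return True if `func` can be called with `name` as a keyword argument."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None:
        # Positional-only and *args parameters cannot be passed by keyword.
        return param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


print(accepts_keyword(ActionEngine.from_context, "llm"))
print(accepts_keyword(ActionEngine, "llm"))
```

Running the same helper against the installed library's classes shows which of the two entry points takes llm directly.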

lavague-ai / LaVague

TypeError: ActionEngine.from_context() got an unexpected keyword argument 'llm' #525
