handrew / browserpilot

Natural language browser automation
MIT License

how to use GPT-4 models? #6

Closed · agn-7 closed this issue 8 months ago

agn-7 commented 8 months ago

I tried using GPT-4 models via the arguments; however, it doesn't work and nothing happens in the browser anymore!

poetry run python examples.py selenium prompts/examples/buffalo_wikipedia.yaml --chromedriver_path /home/benyamin/Downloads/chromedriver-linux64/chromedriver --model gpt-4-1106-preview --debug

INFO:browserpilot.agents.gpt_selenium_agent:Using model for instructions: gpt-4-1106-preview
INFO:browserpilot.agents.gpt_selenium_agent:Using model for responses: gpt-4-1106-preview
INFO:browserpilot.agents.compilers.instruction_compiler:Using model gpt-4-1106-preview.
INFO:browserpilot.agents.gpt_selenium_agent:No cached instructions found. Running...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:browserpilot.agents.gpt_selenium_agent:
Instruction: Go to Google.com
Find all textareas.
Find the first visible textarea.
Click on the first visible textarea.
Type in "buffalo buffalo buffalo buffalo buffalo" and press enter.
Wait 2 seconds.
Find all anchor elements that link to Wikipedia.
Click on the first one.

Action: 

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:browserpilot.agents.gpt_selenium_agent:
Instruction: Wait for 10 seconds.

Action: 

I also tried gpt-4-turbo-preview, but got the same behaviour.

Moreover, I added model_for_responses=model as a parameter as well, but I still have the same issue:

@cli.command()
@click.argument("instructions")
@click.option("--chromedriver_path", default="./chromedriver", help="chromedriver path")
@click.option("--model", default="gpt-3.5-turbo", help="which model?")
@click.option("--memory_folder", default=None, help="Memory folder.")
@click.option("--debug", is_flag=True, help="Enable debugging.")
@click.option("--output", default=None, help="Instruction output file.")
def selenium(instructions, chromedriver_path, model, memory_folder, debug, output):
    with open(instructions, "r") as instructions:
        agent = GPTSeleniumAgent(
            instructions,
            chromedriver_path,
            instruction_output_file=output,
            model_for_instructions=model,
            model_for_responses=model,  # here
            memory_folder=memory_folder,
            debug=debug,
            retry=True,
        )
        agent.run()
handrew commented 8 months ago

Hey, I just tried it and got the same issue. BrowserPilot was built with GPT-3 and GPT-3.5, and I haven't done extensive testing on it with GPT-4. Model outputs tend to be really unstable when they are "upgraded", so prompts that work with older models don't necessarily work with fine-tuned, newer models.

That being said, it worked with gpt-4 for me. Maybe try that?

agn-7 commented 8 months ago

@handrew yes, you're right. I just tested it. It works properly with gpt-4, but not with newer models such as gpt-4-turbo.

handrew commented 8 months ago

Great! Thanks for using BrowserPilot.

agn-7 commented 8 months ago

I found the reason the app doesn't work with gpt-4-turbo-preview or gpt-4-1106-preview.

Look at the difference in results when the following snippet is run with gpt-4-turbo-preview versus gpt-4:

In [2]: from openai import OpenAI

In [3]: client = OpenAI(
   ...:     # This is the default and can be omitted
   ...:     api_key=api,
   ...: )

In [4]: response = client.chat.completions.create(model="gpt-4-turbo-preview", messages=[{"role": "user", "content": "write a simpl
   ...: e code python to say Hello world!. Note that, just provide code not any extra text or explanation."}])

In [5]: response.choices[0].message.content
Out[5]: '```python\nprint("Hello world!")\n```'

In [6]: print(response.choices[0].message.content)
```python
print("Hello world!")
```

In [7]: response = client.chat.completions.create(model="gpt-4", messages=[{"role": "user", "content": "write a simple code python 
   ...: to say Hello world!. Note that, just provide code not any extra text or explanation."}])

In [8]: print(response.choices[0].message.content)
print("Hello world!")

As you can see, gpt-4-turbo-preview wraps its output in a "```python ... ```" code fence. That's why the following code ends up fetching empty text: with stop="```", generation is cut off as soon as the opening fence is produced:

            if "gpt-3.5-turbo" in model or "gpt-4" in model:
                response = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                    max_tokens=max_tokens,
                    top_p=1,
                    frequency_penalty=0,
                    presence_penalty=0,
                    temperature=temperature,
                    stop=stop,
                )
                text = response.choices[0].message.content 
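
To double-check, here is a minimal reproduction of that diagnosis (just a sketch, reusing the client and prompt from the IPython session above; the exact output depends on the model, but a fenced reply gets truncated at the opening fence):

# Minimal sketch reproducing the issue described above; assumes the same
# `client` as in the IPython session. With a fenced reply, the stop
# sequence "```" cuts the completion off at the opening fence.
prompt = (
    "write a simple code python to say Hello world!. "
    "Note that, just provide code not any extra text or explanation."
)
response = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[{"role": "user", "content": prompt}],
    stop="```",
)
print(repr(response.choices[0].message.content))  # expected: '' (empty) when the reply is fenced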

I'll fix the issue in a new PR ASAP.

handrew commented 8 months ago

Interesting! Thanks for looking into it. I wonder if it makes sense to adjust the prompt to read more like a chat message than a prompt completion, or if we should just check whether the model response .startswith("```python"). What do you think?
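
For example, the startswith check could be a small post-processing step, something like the sketch below (strip_fence is a hypothetical helper for illustration, not existing BrowserPilot code):

def strip_fence(text: str) -> str:
    """Hypothetical helper: remove a leading ```python / ``` fence and a trailing ``` if present."""
    text = text.strip()
    if text.startswith("```"):
        # Drop the opening fence line (``` or ```python).
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence, if any.
        if text.rstrip().endswith("```"):
            text = text.rstrip()[:-3]
    return text.strip()

print(strip_fence('```python\nprint("Hello world!")\n```'))  # -> print("Hello world!")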