lavague-ai / LaVague

Large Action Model framework to develop AI Web Agents
https://docs.lavague.ai/en/latest/
Apache License 2.0

Create Browserbase integration #538

Closed (adeprez closed this issue 3 months ago)

adeprez commented 3 months ago

We'd like to have an easy integration with Browserbase. A code example is provided here: https://docs.lavague.ai/en/latest/docs/examples/medical_appointment_booking/

paulpalmieri commented 3 months ago

A few questions:

adeprez commented 3 months ago

It can be done with Playwright too: https://docs.browserbase.com/quickstart/playwright

We can accept a new remote_connection argument of type selenium.webdriver.remote.remote_connection.RemoteConnection, then provide and document an implementation for BrowserbaseRemoteConnection.

BrowserbaseRemoteConnection would accept api_key as a parameter and default to the BROWSERBASE_API_KEY environment variable when it is omitted.
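
The api_key fallback could look roughly like the following. This is only a sketch: the helper name resolve_browserbase_api_key and its env parameter are hypothetical, introduced here so the behaviour can be exercised without touching the real environment. Only the BROWSERBASE_API_KEY variable name comes from this thread.

```python
import os
from typing import Mapping, Optional


def resolve_browserbase_api_key(
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit api_key, else fall back to BROWSERBASE_API_KEY.

    `env` is a hypothetical injection point for tests; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = api_key or env.get("BROWSERBASE_API_KEY")
    if not key:
        raise ValueError(
            "No Browserbase API key: pass api_key or set BROWSERBASE_API_KEY"
        )
    return key
```

A BrowserbaseRemoteConnection constructor could call this once and store the result, so a missing key fails at construction time rather than on the first request.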

paulpalmieri commented 3 months ago

@adeprez What's the purpose of the following code in the driver init:

self.driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {"source": JS_SETUP_GET_EVENTS},
)

It seems that webdriver.Remote doesn't support this, and the Browserbase session fails to start if the call is run on the driver.

paulpalmieri commented 3 months ago

@adeprez I added a basic implementation for SeleniumDriver on the 538-browserbase-integration branch.

As I understand it, we should reuse the following Browserbase code for Playwright:

import os

import requests
from selenium.webdriver.remote.remote_connection import RemoteConnection


def create_session():
    # Create a Browserbase session and return its id.
    url = 'https://www.browserbase.com/v1/sessions'
    headers = {'Content-Type': 'application/json', 'x-bb-api-key': os.environ["BROWSERBASE_API_KEY"]}
    response = requests.post(url, json={"projectId": os.environ["BROWSERBASE_PROJECT_ID"]}, headers=headers)
    response.raise_for_status()  # fail early if the request was rejected
    return response.json()['id']


class CustomRemoteConnection(RemoteConnection):
    # Adds the Browserbase API key and session id to every WebDriver request.
    _session_id = None

    def __init__(self, remote_server_addr: str, session_id: str):
        super().__init__(remote_server_addr)
        self._session_id = session_id

    def get_remote_connection_headers(self, parsed_url, keep_alive=False):
        headers = super().get_remote_connection_headers(parsed_url, keep_alive)
        headers.update({'x-bb-api-key': os.environ["BROWSERBASE_API_KEY"]})
        headers.update({'session-id': self._session_id})
        return headers

Where would you put it so that it's usable by both drivers?

Also, I'm not sure what we want to accept from the user if not a flag. Do we want to create a BrowserbaseRemoteConnection wrapper that accepts the keys? And if we pass it to a driver, does it use the common logic above to instantiate the driver?

adeprez commented 3 months ago

That call injects JavaScript before each page loads. We use it to track the event listeners added to elements, which is needed to determine which elements are interactive.
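
One possible way to keep the driver init from failing on a remote driver is to register the script only when the driver exposes execute_cdp_cmd. This is a sketch under assumptions, not the approach taken on the branch: the helper name inject_on_new_document is hypothetical, and it assumes a plain webdriver.Remote instance simply lacks the execute_cdp_cmd method.

```python
from typing import Any

CDP_ADD_SCRIPT = "Page.addScriptToEvaluateOnNewDocument"


def inject_on_new_document(driver: Any, source: str) -> bool:
    """Register `source` to run before each page load, if the driver supports CDP.

    Returns True when the script was registered, and False when the driver has
    no callable execute_cdp_cmd (assumed here for a plain webdriver.Remote).
    """
    execute_cdp_cmd = getattr(driver, "execute_cdp_cmd", None)
    if not callable(execute_cdp_cmd):
        return False
    execute_cdp_cmd(CDP_ADD_SCRIPT, {"source": source})
    return True
```

When this returns False the event-listener tracking is not installed, so detecting interactive elements would need some other mechanism on that driver.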

I think we could accept any RemoteConnection that meets Selenium's requirements, and our BrowserbaseRemoteConnection implementation could serve both the Playwright and Selenium use cases.

The class itself can live in core, either in the existing base_driver.py or in a new file.
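
To make the "usable by both drivers" idea concrete, here is a minimal sketch of a driver-agnostic piece that could sit in core. The class name BrowserbaseCredentials and its methods are hypothetical. Only the two header names are taken from the CustomRemoteConnection snippet earlier in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class BrowserbaseCredentials:
    """Hypothetical driver-agnostic holder for Browserbase connection details."""

    api_key: str
    session_id: str

    def headers(self) -> Dict[str, str]:
        # Header names match those used in the CustomRemoteConnection snippet.
        return {"x-bb-api-key": self.api_key, "session-id": self.session_id}

    def merge_into(self, base_headers: Mapping[str, str]) -> Dict[str, str]:
        # Return a new dict so the caller's headers are not mutated.
        merged = dict(base_headers)
        merged.update(self.headers())
        return merged
```

A Selenium-specific subclass of RemoteConnection could then call merge_into from get_remote_connection_headers, while the Playwright driver could read the same api_key and session_id to build its own connection.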

paulpalmieri commented 3 months ago

connection = BrowserBaseCustomConnection(addr, session_id, api_key=api_key)  # api_key is optional
driver = SeleniumDriver(remote_connection=connection)

For Playwright: Browserbase provides a browser with an already set-up context, so make sure to rearrange the logic in base.py accordingly.

driver = PlaywrightDriver(remote_url)
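
The point about the pre-existing context could be handled along these lines. This is a sketch, assuming the browser object comes from Playwright's connect_over_cdp and exposes the usual contexts, new_context, pages and new_page members. The helper name get_or_create_page is hypothetical and is not the actual code in base.py.

```python
from typing import Any


def get_or_create_page(browser: Any) -> Any:
    """Return a page from the browser's existing context, creating only what is missing."""
    # A remote browser may already have a context; reuse it instead of making a new one.
    contexts = browser.contexts
    context = contexts[0] if contexts else browser.new_context()
    # Likewise reuse the first open page if there is one.
    pages = context.pages
    return pages[0] if pages else context.new_page()
```

With a locally launched browser both lists start empty, so the same helper falls back to creating a fresh context and page.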