wkeeling / selenium-wire

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
MIT License

selenium-wire with proxy timing out inside AWS Lambda #716

Status: Open. FelipeLagare opened this issue 9 months ago.

FelipeLagare commented 9 months ago

Newer versions of Selenium-Wire apparently don't work inside Lambda, so I'm deploying my Lambda function with the following configuration:

Versions:
- headless-chromium: 1.0.0-57
- chromedriver: 86.0.4240.22
- urllib3==1.26.6
- selenium==3.141.0
- pyopenssl==22.0.0
- cryptography==38.0.4
- selenium-wire==4.0.4
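Because the setup depends on these exact pins, a cold-start check can catch a deployment package that silently pulled in different versions. The following is a sketch, not part of selenium-wire. It assumes the Python-package pins listed above; `PINS`, `installed_version` and `version_mismatches` are hypothetical names. headless-chromium and chromedriver are binaries rather than pip packages, so the check does not cover them.

```python
from typing import Callable, Dict, List, Optional

# Pip-installable pins from the list above (hypothetical constant name).
# headless-chromium and chromedriver are binaries, so they are not checked here.
PINS: Dict[str, str] = {
    "urllib3": "1.26.6",
    "selenium": "3.141.0",
    "pyopenssl": "22.0.0",
    "cryptography": "38.0.4",
    "selenium-wire": "4.0.4",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        # Available from Python 3.8 onwards
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python 3.7 (the runtime in this issue) lacks importlib.metadata;
        # fall back to pkg_resources, which ships with setuptools
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_mismatches(pins: Dict[str, str],
                       lookup: Callable[[str], Optional[str]] = installed_version) -> List[str]:
    """Describe every package whose installed version differs from its pin."""
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            problems.append(f"{name}: wanted {wanted}, found {found}")
    return problems
```

At the top of the handler module, `problems = version_mismatches(PINS)` followed by `if problems: raise RuntimeError("; ".join(problems))` would stop the function before any browser is started. The `lookup` parameter exists only so the check can be exercised without the real packages installed.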

Lambda settings:
- Runtime: Python 3.7
- Memory: 10240 MB
- Timeout: 300 seconds
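With a 300-second timeout, a slow proxy can cause Lambda to kill the invocation partway through the tooltip loop in the handler, and everything collected so far is lost. One option is to check the remaining time before each sample and return partial results instead. The only Lambda API assumed here is `context.get_remaining_time_in_millis()` on the handler's context object. `sample_until_deadline` and `margin_ms` are hypothetical names, and the 30-second default margin is an arbitrary choice.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def sample_until_deadline(items: Iterable[int],
                          sample: Callable[[int], T],
                          remaining_ms: Callable[[], int],
                          margin_ms: int = 30_000) -> Tuple[List[T], bool]:
    """Call sample(item) for each item while more than margin_ms milliseconds remain.

    Returns (results, completed); completed is False when the loop stopped early.
    """
    results: List[T] = []
    for item in items:
        # Check the remaining budget before each potentially slow browser round trip
        if remaining_ms() <= margin_ms:
            return results, False
        results.append(sample(item))
    return results, True
```

Inside the handler, the call could look like `text, completed = sample_until_deadline(range(40, 730), read_tooltip_at, context.get_remaining_time_in_millis)`. Here `read_tooltip_at(offset)` is a hypothetical wrapper around the three ActionChains lines plus `tooltip.text`. The helper takes the remaining-time function rather than the whole context, so it can be tested without Lambda. The margin leaves time for `driver.quit()`, which is also reported as slow.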

My Lambda handler is the code below. After running some tests in Lambda, it's clear that webdriver.Chrome(), driver.get() and driver.quit() are the calls that take the longest. Without the proxy, none of them is slow, and on my local machine the proxy works fine.
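To see exactly where the time goes, each phase can be wrapped in a small timer. The numbers can then be compared between a local run and Lambda, with and without the proxy. This is a sketch that uses only the standard library; `timed` and the phase labels are hypothetical names.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator

logger = logging.getLogger(__name__)


@contextmanager
def timed(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block, in seconds, under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so a failing driver.get() is still timed
        elapsed = time.perf_counter() - start
        timings[label] = elapsed
        # Emitted at INFO level; lower the logger's level if nothing shows up
        logger.info("%s took %.2f s", label, elapsed)
```

For example, `with timed("webdriver.Chrome()", timings): driver = webdriver.Chrome(...)` and `with timed("driver.get()", timings): driver.get(page_url)`, with `timings = {}` created first. The dict can also be returned in the response body to compare runs side by side.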

Is there any way to work around this? Are there known cases of newer versions of Selenium-Wire or Python working inside Lambda?

from seleniumwire import webdriver
from selenium.webdriver.common.by import By
import json
import time

def lambda_handler(event, context):
    page_url = event['url']

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-extensions')
    options.add_argument("--window-size=1024,768")  # Chrome expects width,height
    options.add_argument("--disable-application-cache")
    options.add_argument("--user-data-dir=/tmp/user-data")
    options.add_argument('--disable-software-rasterizer')
    options.add_argument("--no-cache")
    options.add_argument("--disable-infobars")
    options.add_argument("--no-sandbox")
    options.add_argument("--hide-scrollbars")
    options.add_argument("--enable-logging")
    options.add_argument("--log-level=0")
    options.add_argument("--v=99")
    options.add_argument("--single-process")
    options.add_argument("--data-path=/tmp/data-path")
    options.add_argument("--ignore-certificate-errors")
    options.add_argument("--homedir=/tmp")
    options.add_argument("--remote-debugging-port=9222")
    options.add_argument("--disk-cache-dir=/tmp/cache-dir")
    # Switches need the leading "--"; the original had a stray quote in its place
    options.add_argument("--user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36")
    options.binary_location = "./bin/headless-chromium"

    proxy_options = {
        'request_storage_base_dir': '/tmp',  # /tmp is the only writable path in Lambda
        'exclude_hosts': [],  # list of hosts that should bypass selenium-wire
        'proxy': {
            'http': '**proxy_url**',
            'https': '**proxy_url**',
        },
    }
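    driver = webdriver.Chrome(executable_path='./bin/chromedriver', options=options, seleniumwire_options=proxy_options)
    try:
        driver.get(page_url)
        time.sleep(2)  # fixed wait for the page to render

        graph = driver.find_element(By.CSS_SELECTOR, 'graph-element')
        elements = graph.find_elements(By.XPATH, ".//*")
        tooltip = elements[2]

        # Center the graph so the mouse offsets below land inside it
        driver.execute_script(
            "arguments[0].scrollIntoView({'block':'center','inline':'center'})",
            graph
        )

        # Sweep the mouse across the graph and record the tooltip text at each x offset
        text = []
        for offset in range(40, 730):
            action = webdriver.ActionChains(driver)
            action.move_to_element_with_offset(graph, offset, 200)
            action.perform()
            text.append(tooltip.text)
    finally:
        # Always shut down Chrome and chromedriver, even if a step above fails,
        # so no browser process lingers if Lambda reuses the execution environment
        driver.quit()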

    response = {
        "statusCode": 200,
        "body": json.dumps(text)
    }
    return response
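The proxy URL is hard-coded in the handler above, and the Lambda run and the local run may not be using the same proxy. Reading the URL from an environment variable, and rejecting malformed values before Chrome starts, makes that easier to rule out. This is a sketch: `PROXY_URL` and `build_seleniumwire_options` are hypothetical names. The dict keys (`request_storage_base_dir`, `exclude_hosts`, `proxy` with `http`/`https`) are the ones the handler already passes as `seleniumwire_options`.

```python
import os
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse


def build_seleniumwire_options(proxy_url: Optional[str] = None,
                               exclude_hosts: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build seleniumwire_options that route browser traffic through one upstream proxy.

    Falls back to the (hypothetical) PROXY_URL environment variable when proxy_url is None.
    """
    url = proxy_url if proxy_url is not None else os.environ.get("PROXY_URL", "")
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.hostname:
        # Deliberately not echoing the value, since it may contain credentials
        raise ValueError("proxy URL must look like scheme://[user:pass@]host:port")
    return {
        "request_storage_base_dir": "/tmp",  # /tmp is the only writable path in Lambda
        "exclude_hosts": list(exclude_hosts or []),
        "proxy": {"http": url, "https": url},
    }
```

The result would be passed as `seleniumwire_options=build_seleniumwire_options()`. A malformed or missing URL then fails immediately with a clear error instead of surfacing later as a browser timeout.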