wkeeling / selenium-wire

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
MIT License

Requests empty #563

robertoronderosjr closed 2 years ago

robertoronderosjr commented 2 years ago

Hi, I'm working for a big firm and we're trying to use seleniumwire to modify the headers to be able to make requests to one of our web portals.

I've configured selenium wire like so:

        from seleniumwire import webdriver
        from selenium.webdriver.chrome.options import Options

        def interceptor(request):
            # Add a custom header to every outgoing request
            request.headers['TestHeader'] = "Testing"

        chrome_options = Options()
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--headless")
        # Note: Chrome spells these two flags --window-size and --disable-dev-shm-usage
        chrome_options.add_argument("--windows-size=1920,1080")
        chrome_options.add_argument("--disable-dev--shm-usage")
        driver = webdriver.Chrome(executable_path=service_config["chromeDriverPath"], options=chrome_options)
        driver.request_interceptor = interceptor
        driver.get("https://httpbin.org/headers")
        print(f'driver.requests = {driver.requests}')
        driver.quit()

The above produces the following output:

` [2022-06-15 10:52:18] INFO:seleniumwire.storage:Using default request storage

[2022-06-15 10:52:18] INFO:seleniumwire.backend:Created proxy listening on 127.0.0.1:42045

[2022-06-15 10:52:19] DEBUG:selenium.webdriver.remote.remote_connection:POST http://localhost:46084/session {"capabilities": {"firstMatch": [{}], "alwaysMatch": {"browserName": "chrome", "pageLoadStrategy": "normal", "proxy": {"proxyType": "manual", "httpProxy": "127.0.0.1:42045", "sslProxy": "127.0.0.1:42045"}, "acceptInsecureCerts": true, "goog:chromeOptions": {"extensions": [], "binary": "/external/com/googlesource/chromium/headless-chromium/chromium_91.0.4472.106-91.0.4472.106/chromium_91.0.4472.106/chrome/headless_shell", "args": ["--no-sandbox", "--headless", "--windows-size=1920,1080", "--disable-dev--shm-usage", "--proxy-bypass-list=<-loopback>"]}}}, "desiredCapabilities": {"browserName": "chrome", "pageLoadStrategy": "normal", "proxy": {"proxyType": "manual", "httpProxy": "127.0.0.1:42045", "sslProxy": "127.0.0.1:42045"}, "acceptInsecureCerts": true, "goog:chromeOptions": {"extensions": [], "binary": "/external/com/googlesource/chromium/headless-chromium/chromium_91.0.4472.106-91.0.4472.106/chromium_91.0.4472.106/chrome/headless_shell", "args": ["--no-sandbox", "--headless", "--windows-size=1920,1080", "--disable-dev--shm-usage", "--proxy-bypass-list=<-loopback>"]}}}

[2022-06-15 10:52:19] DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:46084

[2022-06-15 10:52:21] DEBUG:urllib3.connectionpool:http://localhost:46084 "POST /session HTTP/1.1" 200 796

[2022-06-15 10:52:21] DEBUG:selenium.webdriver.remote.remote_connection:Finished Request

[2022-06-15 10:52:26] DEBUG:selenium.webdriver.remote.remote_connection:POST http://localhost:46084/session/62143eb49f660be006e7008e591ab89a/url {"url": "https://httpbin.org/headers"}

[2022-06-15 10:52:30] DEBUG:urllib3.connectionpool:http://localhost:46084 "POST /session/62143eb49f660be006e7008e591ab89a/url HTTP/1.1" 200 14

[2022-06-15 10:52:30] DEBUG:selenium.webdriver.remote.remote_connection:Finished Request

driver.requests = []

[2022-06-15 10:52:33] DEBUG:seleniumwire.storage:Cleaning up /tmp/.seleniumwire/storage-724354e5-d7bd-4ac0-a3d1-c2e55ad320d7

[2022-06-15 10:52:33] DEBUG:selenium.webdriver.remote.remote_connection:DELETE http://localhost:46084/session/62143eb49f660be006e7008e591ab89a {}

[2022-06-15 10:52:33] DEBUG:urllib3.connectionpool:http://localhost:46084 "DELETE /session/62143eb49f660be006e7008e591ab89a HTTP/1.1" 200 14

[2022-06-15 10:52:33] DEBUG:selenium.webdriver.remote.remote_connection:Finished Request `

Library versions:

selenium-wire = 4.6.4
selenium = 4.1.0
chromedriver = linux64-91.0.4472.101
chromium = 91.0.4472.106

Not sure exactly why it's not intercepting any requests. Any help here is much appreciated!

wkeeling commented 2 years ago

The code you've posted looks OK. It's possible that Selenium Wire hasn't been able to automatically configure the browser to redirect traffic through its internal proxy. Selenium Wire uses this internal proxy to capture requests. This can sometimes happen if a web browser is locked down to use a company's own proxy server, e.g. if a company needs to control outbound internet access. Are you able to check whether your browser is configured in this way? Have a look at Chromium's existing proxy server settings and see whether these are modifiable.

robertoronderosjr commented 2 years ago

Do you know how to check these settings? This is most likely the issue. The firm must be configuring a proxy in the headless_chrome executable that is available to us.

wkeeling commented 2 years ago

Open Chrome's settings and search for "proxy". On my machine (Linux) the proxy option is listed as "Open your computer's proxy settings" - see below:

image

Click on that link to open the network/proxy settings:

image

robertoronderosjr commented 2 years ago

If I do that from my Windows machine, it shows that it's using a setup script and has a setup script address configured. However, I SSHed into a Red Hat 7 machine and I'm running Python from there. I'm not able to open the browser, since it's an SSH session and the executable is for a headless browser only.
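Without a browser window, one way to look for an enforced proxy from an SSH session is to inspect the proxy environment variables and any managed Chrome policy files on disk. The sketch below is only a starting point under stated assumptions: the two directories are the usual Linux locations for Google Chrome and Chromium managed policies, the key names are Chrome's proxy-related policy names, and a custom `headless_shell` build may read policies from somewhere else or not at all. The helper names `proxy_env` and `proxy_policies` are made up for this example.

```python
import json
import os
from pathlib import Path

# Assumed default locations of managed policy files on Linux; adjust as needed.
POLICY_DIRS = [
    Path("/etc/opt/chrome/policies/managed"),
    Path("/etc/chromium/policies/managed"),
]
# Chrome policy names that control proxy behaviour.
PROXY_POLICY_KEYS = {
    "ProxyMode", "ProxyServer", "ProxyPacUrl", "ProxyBypassList", "ProxySettings",
}
PROXY_ENV_VARS = ("http_proxy", "https_proxy", "all_proxy", "no_proxy")


def proxy_env(environ=None):
    """Return the proxy-related environment variables that are set (either case)."""
    environ = os.environ if environ is None else environ
    found = {}
    for name in PROXY_ENV_VARS:
        for variant in (name, name.upper()):
            if environ.get(variant):
                found[variant] = environ[variant]
    return found


def proxy_policies(policy_dirs=POLICY_DIRS):
    """Return {policy file path: {policy name: value}} for proxy-related policies."""
    found = {}
    for directory in policy_dirs:
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.json")):
            try:
                data = json.loads(path.read_text())
            except (OSError, ValueError):
                continue  # unreadable or malformed file; skip it
            if not isinstance(data, dict):
                continue
            hits = {key: value for key, value in data.items() if key in PROXY_POLICY_KEYS}
            if hits:
                found[str(path)] = hits
    return found


if __name__ == "__main__":
    print("proxy environment variables:", proxy_env())
    print("proxy policies:", proxy_policies())
```

If either result is non-empty, that would be consistent with the browser being pointed at a corporate proxy or PAC script rather than at Selenium Wire's proxy, though an empty result does not rule it out.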

wkeeling commented 2 years ago

OK, I think it's probably a security restriction imposed by the company, since your local Windows machine has been configured that way and the Red Hat Linux box is likely configured the same way.

One thing you could try is starting Chromium on RedHat using a different profile. The proxy settings may have been set up in the browser's default profile, so starting with a clean profile might mean you don't inherit them. This post goes into some more details on specifying a Chrome profile.
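A minimal sketch of the clean-profile idea: create an empty directory and point Chrome at it with `--user-data-dir`, so nothing from an existing profile is inherited. The helper name `fresh_profile_args` and the directory prefix are made up for this example, and a proxy enforced through managed policy would most likely still apply.

```python
import tempfile


def fresh_profile_args(prefix="chrome-profile-"):
    """Create an empty directory and return Chrome arguments that point at it."""
    profile_dir = tempfile.mkdtemp(prefix=prefix)
    return [f"--user-data-dir={profile_dir}"]


if __name__ == "__main__":
    for arg in fresh_profile_args():
        print(arg)
```

Each returned string would be passed to `chrome_options.add_argument(...)` before creating the driver.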

robertoronderosjr commented 2 years ago

Tried that. Added

chrome_options.add_argument(f"--user-data-dir=/home/ronder/scratch/services/new-profile")
chrome_options.add_argument(f"--profile-directory=Profile 2")

But no luck. Can you think of a way to configure chrome (using chrome_options) in conjunction with seleniumwire options that could help them talk?

wkeeling commented 2 years ago

You could try specifying the proxy server manually in the ChromeOptions and tell Selenium Wire to not auto configure - e.g.

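from seleniumwire import webdriver

options = webdriver.ChromeOptions()
# Point Chrome at Selenium Wire's proxy explicitly
options.add_argument('--proxy-server=http://127.0.0.1:12345')

sw_options = {
    'port': 12345,         # must match the port given in --proxy-server
    'auto_config': False,  # Selenium Wire won't set the browser's proxy itself
}

driver = webdriver.Chrome(
    options=options,
    seleniumwire_options=sw_options,
)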

That may give the same result as before if Chrome is locked down to using the corporate proxy, but it's worth a try.
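The hardcoded port 12345 could already be in use on a shared machine. As a rough sketch, the port could be picked at runtime and the matching Chrome arguments and Selenium Wire options built from it. The helper names here are made up for this example. Adding `--ignore-certificate-errors` is an assumption on my part: with `auto_config` off, the browser may not be told to accept the certificates that Selenium Wire's proxy presents for HTTPS sites.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port on `host` that is unused right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose
        return sock.getsockname()[1]


def manual_proxy_config(port, host="127.0.0.1"):
    """Build Chrome arguments and Selenium Wire options that share one port."""
    chrome_args = [
        f"--proxy-server=http://{host}:{port}",
        "--ignore-certificate-errors",  # assumption: needed for HTTPS pages
    ]
    sw_options = {"port": port, "auto_config": False}
    return chrome_args, sw_options


if __name__ == "__main__":
    args, sw_options = manual_proxy_config(find_free_port())
    print(args)
    print(sw_options)
```

Each entry in `chrome_args` would go through `options.add_argument(...)`, and `sw_options` would be passed as `seleniumwire_options`. Another process could still claim the port between the check and the driver starting, so this narrows the chance of a collision rather than removing it.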

robertoronderosjr commented 2 years ago

Ended up finding a binary that worked for me. Thanks for the help!

hodgesag commented 2 years ago

@robertoronderosjr Hi there! I'm experiencing the same issue you were: getting empty requests back from Selenium Wire behind a company firewall. Which binary ended up working for you? I'm pulling my hair out trying to figure it out.

AntonioVentilii commented 2 years ago

@wkeeling @robertoronderosjr Hi there! Same issue here too, like @hodgesag

Any way to find a solution?