wkeeling / selenium-wire

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
MIT License

Request/Response cycle with an HTTP proxy is ~3x slower than vanilla chromedriver #74

Closed itay747 closed 3 years ago

itay747 commented 5 years ago

Using the test site https://icanhazip.com with 100 requests behind a high-speed HTTP proxy server, seleniumwire peaks at around 2.5 iterations per second, as measured with tqdm:

image

The same configuration with vanilla chromedriver:

image

Any way of speeding this up? I tried adding the following flags to seleniumwire options to no avail:

'disable_encoding': True, 'verify_ssl': False,
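For anyone wanting to reproduce the comparison, here is a minimal sketch of the measurement described above. It is not the original benchmark script: `measure_rate` and `make_fetch` are hypothetical helper names, and a plain timer stands in for tqdm. It assumes only that the driver object exposes Selenium's `get(url)` method.

```python
import time
from typing import Callable


def measure_rate(fetch: Callable[[], None], iterations: int = 100) -> float:
    """Call `fetch` `iterations` times and return iterations per second."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = time.perf_counter()
    for _ in range(iterations):
        fetch()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very coarse timer.
    return iterations / elapsed if elapsed > 0 else float("inf")


def make_fetch(driver, url: str = "https://icanhazip.com") -> Callable[[], None]:
    """Wrap driver.get(url) so it can be timed (hypothetical helper)."""
    return lambda: driver.get(url)


# Usage, assuming selenium-wire is installed (not executed here):
#   from seleniumwire import webdriver
#   opts = {'disable_encoding': True, 'verify_ssl': False}
#   driver = webdriver.Chrome(seleniumwire_options=opts)
#   print(measure_rate(make_fetch(driver)))
```

Running the same helper against a plain `selenium.webdriver.Chrome` instance configured with the same upstream proxy should give a like-for-like baseline.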

The culprit seems to be the HTTP cycle turnaround time. Vanilla chromedriver + proxy:

image

Seleniumwire + proxy:

image

wkeeling commented 4 years ago

Thanks for running these tests and highlighting the performance difference.

Yes, HTTP I/O is likely to be the cause, mainly because Selenium Wire works by sending requests through its own embedded proxy server in order to capture data. The overhead of the proxy opening and closing connections slows things down.

We may be able to optimize the connection handling better, although we'd need to do some analysis first.

Out of curiosity, have you tried running the same test against a non-ssl (http only) site?

itay747 commented 4 years ago

> Out of curiosity, have you tried running the same test against a non-ssl (http only) site?

I haven't run it on an HTTP site.

wkeeling commented 4 years ago

There have been a couple of relatively recent changes to Selenium Wire that may improve performance.

The first is that Selenium Wire now has a configurable connection keep-alive. This is off by default, but it can be turned on with:

from seleniumwire import webdriver

options = {
    'connection_keep_alive': True  # Allow persistent connections
}
driver = webdriver.Firefox(seleniumwire_options=options)  # Also works with Chrome

With keep-alive on, Selenium Wire will attempt to reuse socket connections, which will reduce I/O overhead.

The second change is the introduction of mitmproxy as a backend. mitmproxy has been shown to improve page load times, particularly when using an upstream proxy. If you're still experiencing slow performance after switching on keep-alive, switching the backend to mitmproxy may speed things up.
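As a rough sketch of how the two changes might be combined, the helper below builds a `seleniumwire_options` dict. `build_seleniumwire_options` is a hypothetical name. The `'backend': 'mitmproxy'` and `'proxy'` keys are not shown in this thread; they are assumptions about the Selenium Wire release current at the time of this comment, so check the README for your installed version before relying on them.

```python
from typing import Optional


def build_seleniumwire_options(
    keep_alive: bool = True,
    use_mitmproxy: bool = False,
    upstream_proxy: Optional[str] = None,
) -> dict:
    """Build a seleniumwire_options dict (hypothetical helper)."""
    # 'connection_keep_alive' is the option shown earlier in this thread.
    options = {'connection_keep_alive': keep_alive}
    if use_mitmproxy:
        # Assumption: this key selects the mitmproxy backend; it may not
        # exist in every Selenium Wire version.
        options['backend'] = 'mitmproxy'
    if upstream_proxy:
        # Assumption: upstream proxies are configured under a 'proxy' key.
        options['proxy'] = {'http': upstream_proxy, 'https': upstream_proxy}
    return options


# Usage, assuming selenium-wire is installed (not executed here):
#   from seleniumwire import webdriver
#   opts = build_seleniumwire_options(use_mitmproxy=True,
#                                     upstream_proxy='http://proxy.example:8080')
#   driver = webdriver.Chrome(seleniumwire_options=opts)
```

Keeping the options in one function makes it easier to time each combination separately and see which change accounts for any speed-up.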

I will try and get some benchmarks to highlight the improvements.

wkeeling commented 3 years ago

The core of Selenium Wire has been reworked since this issue was reported. It uses a better-performing proxy and relies on asyncio rather than threads. There are also a number of ways to speed up Selenium Wire if it's running slowly with certain sites. Closing this issue for now.