wkeeling / selenium-wire

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
MIT License

Hey I am trying to speed up selenium wire as much as possible. I have tried several different options but needed some help. #196

Closed EnzoArata closed 3 years ago

EnzoArata commented 3 years ago

The main reason I am using selenium wire is so that I can easily log in to proxies that require a username/password. However, certain pages load a lot slower than others. I was wondering if there was a way I could disable everything except the proxy setup so that it can run faster.

I have tried the following:

1. Ignoring HTTP methods in the options ('GET', 'POST', 'HEAD', 'OPTIONS')
2. Setting the disable encoding option to True
3. Setting the verify ssl option to False
4. Setting the page load strategy to none and stopping the window load when the correct elements are loaded
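For reference, items 1 to 3 might be expressed as a Selenium Wire options dictionary along the following lines. This is a minimal sketch: `build_tried_options` and `launch_chrome` are hypothetical helper names, and the keys `ignore_http_methods`, `disable_encoding` and `verify_ssl` are assumed to match the Selenium Wire version in use, so check them against the README for your installed release.

```python
def build_tried_options():
    """Return a seleniumwire_options dict mirroring items 1-3 (hypothetical helper)."""
    return {
        # 1. Do not capture requests made with these HTTP methods.
        'ignore_http_methods': ['GET', 'POST', 'HEAD', 'OPTIONS'],
        # 2. Ask servers for uncompressed responses.
        'disable_encoding': True,
        # 3. Skip verification of upstream SSL certificates.
        'verify_ssl': False,
    }


def launch_chrome(options):
    """Start Chrome through Selenium Wire (needs selenium-wire and a browser)."""
    # Imported lazily so the options can be built without selenium-wire installed.
    from seleniumwire import webdriver
    return webdriver.Chrome(seleniumwire_options=options)
```

Calling `launch_chrome(build_tried_options())` would start the browser with these settings. The page load strategy (item 4) is a Selenium setting rather than a Selenium Wire option, so it is not part of this dictionary.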

I have tried headless and headed and get the same results. Using selenium on its own it runs much faster, I know this is because it isn't handling all the requests like selenium wire is. Unfortunately I haven't found any other way to use proxy authentication headless.

I am working on my Cybersecurity final project and any help would be much appreciated! Selenium Wire is awesome and I respect all the hard work you have put into it.

wkeeling commented 3 years ago

There's a few more things you can try to improve performance:

Also, you may be best to leave the disable_encoding option as False, as encoded (compressed) pages are likely to download more quickly.

Try playing around with some of the above and see how you get on. I'll be interested to hear which options make/do not make a difference. I'm currently in the process of trying to improve the performance of Selenium Wire further, so any info you come back with will be useful. Many thanks!

pawanpaudel93 commented 3 years ago

@EnzoArata If you are using selenium-wire only for proxy authentication and not to intercept requests/responses, then I have another solution for you. You can configure the proxy through a Chrome extension and use Xvfb as a virtual display, so that Chrome runs on a server without problems. With selenium-wire it will be a little slower because requests/responses have to be intercepted.

import zipfile


# Builds a small Chrome extension that sets the proxy and answers its auth prompt.
def set_proxy(PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS, options):
    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version":"22.0.0"
    }
    """

    background_js = """
    var config = {
            mode: "fixed_servers",
            rules: {
            singleProxy: {
                scheme: "http",
                host: "%s",
                port: parseInt(%s)
            },
            bypassList: ["localhost"]
            }
        };
    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
    function callbackFn(details) {
        return {
            authCredentials: {
                username: "%s",
                password: "%s"
            }
        };
    }
    chrome.webRequest.onAuthRequired.addListener(
                callbackFn,
                {urls: ["<all_urls>"]},
                ['blocking']
    );
    """ % (PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)
    pluginfile = 'proxy_auth_plugin.zip'

    with zipfile.ZipFile(pluginfile, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)
    options.add_extension(pluginfile)
    return options

Modify this as per your need. Install Xvfb (apt-get install xvfb) and https://github.com/cgoldberg/xvfbwrapper. If you have any problems, I am here to help.
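One fragility worth noting in the function above: plain `%s` substitution produces invalid JavaScript if a credential contains a double quote or a backslash. A possible variation, offered as a sketch rather than a drop-in replacement, serialises the values with `json.dumps` so they are escaped. The name `build_background_js` is hypothetical and not part of any library.

```python
import json


def build_background_js(host, port, user, password):
    """Render background.js with JSON-escaped values (hypothetical helper)."""
    config = {
        "mode": "fixed_servers",
        "rules": {
            "singleProxy": {"scheme": "http", "host": host, "port": int(port)},
            "bypassList": ["localhost"],
        },
    }
    credentials = {"username": user, "password": password}
    template = (
        "var config = %s;\n"
        "chrome.proxy.settings.set({value: config, scope: 'regular'}, function() {});\n"
        "chrome.webRequest.onAuthRequired.addListener(\n"
        "    function(details) { return {authCredentials: %s}; },\n"
        "    {urls: ['<all_urls>']},\n"
        "    ['blocking']\n"
        ");\n"
    )
    # json.dumps output is a valid JavaScript object literal for these values.
    return template % (json.dumps(config), json.dumps(credentials))
```

The returned string could be passed to `zp.writestr("background.js", ...)` in place of the hand-formatted `background_js`.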

EnzoArata commented 3 years ago

Thanks for the fast response @wkeeling! I tried to use an interceptor to abort some requests to make pages load faster, but it doesn't seem like it's working. If I understand correctly, images shouldn't load. Does this look right to you?

I defined my interceptor similar to your example, but when I go to a page with an image it is still being loaded.

def interceptor(request):
    # Block several request types by file extension
    if request.path.endswith(('.png', '.jpg', '.gif', '.woff', '.php')):
        request.abort()

driver.request_interceptor = interceptor

I tried it on a Wikipedia page and the image is still there. It loads like the attached image. Would setting the page load strategy to none be affecting this?

Also, a question about excluding hosts: because it bypasses the upstream proxy, does that mean I can't send button.clicks()/SendKeys() to the host? Thanks again for the info, I really appreciate you taking the time. I will try the mitmproxy backend and let you know.

-Edit: I tested on some other websites and it seems to work for stopping the loading of images!
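The matching logic of the interceptor can be checked without a browser. The sketch below uses a stand-in `FakeRequest` class (hypothetical, modelling only the `path` attribute and `abort()` method that the interceptor touches) to show which paths would be aborted.

```python
BLOCKED_SUFFIXES = ('.png', '.jpg', '.gif', '.woff', '.php')


def interceptor(request):
    # Abort any request whose path ends in one of the blocked suffixes.
    if request.path.endswith(BLOCKED_SUFFIXES):
        request.abort()


class FakeRequest:
    """Stand-in for a Selenium Wire request; only path and abort() are modelled."""

    def __init__(self, path):
        self.path = path
        self.aborted = False

    def abort(self):
        self.aborted = True


def would_abort(path):
    """Return True if the interceptor aborts a request for this path."""
    request = FakeRequest(path)
    interceptor(request)
    return request.aborted
```

Note that `str.endswith` is case-sensitive, so a path such as `/logo.PNG` would not be blocked by this check, and `.jpeg` or `.svg` files are not in the list at all.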

EnzoArata commented 3 years ago

@pawanpaudel93 Thank you for the advice! I actually already used that method for proxy authentication; the only issue is that I want to do headed and headless testing. I have tried this method headless but it doesn't work, and I believe that is because Chrome headless does not allow extensions.

Do you know of any other easy way to authenticate with a proxy in headless mode?

Thanks for the assistance, I appreciate the help.

pawanpaudel93 commented 3 years ago

@EnzoArata The method I recommended above works on a server as if it were headless, but don't enable headless mode, because Chrome extensions do not work in headless mode. Instead, Xvfb (and its Python package xvfbwrapper) provides a virtual display, so Chrome runs as if it were in headless mode.

EnzoArata commented 3 years ago

@pawanpaudel93 If I run it this way, how will it affect performance? Will it be similar to running it headed or headless?

pawanpaudel93 commented 3 years ago

@EnzoArata You mean performance as in loading the pages? Pages will load faster than with selenium-wire if your sole purpose is proxy authentication. It's just running Chrome over a virtual display, so you can run without the virtual display (headed mode) or as if headless (the server does not have a display, so Xvfb provides a virtual one).

EnzoArata commented 3 years ago

@pawanpaudel93 Yes, loading performance. The only issue I am having now is that xvfbwrapper seems to run only on Linux and wasn't meant to run on Windows. At the moment I am working on Windows.

pawanpaudel93 commented 3 years ago

@EnzoArata Yes, loading performance is better with that. Yes, Xvfb is for Linux systems. I don't have a solution for Chrome on Windows. Selenium-wire is currently slow with the default backend, so try the mitmproxy backend with the mitm_ignore_hosts option set to ignore all hosts, as you don't want to intercept any requests. You can get info about the option here: https://docs.mitmproxy.org/stable/concepts-options/
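A sketch of what that suggestion might look like as Selenium Wire options follows. It rests on several assumptions: that the installed Selenium Wire version still accepts `'backend': 'mitmproxy'`, that options prefixed with `mitm_` are forwarded to mitmproxy, and that mitmproxy's `ignore_hosts` takes a list of regular expressions. `build_passthrough_options` is a hypothetical helper name.

```python
def build_passthrough_options(ignore_pattern='.*'):
    """Return options selecting the mitmproxy backend with hosts ignored.

    Hypothetical helper: the option names are assumptions to verify against
    the Selenium Wire and mitmproxy versions actually installed.
    """
    return {
        # Assumed: switches Selenium Wire from its default backend to mitmproxy.
        'backend': 'mitmproxy',
        # Assumed: forwarded to mitmproxy as ignore_hosts (a list of regexes).
        # The default pattern '.*' is intended to match every host.
        'mitm_ignore_hosts': [ignore_pattern],
    }
```

The dictionary would be passed as `seleniumwire_options` when creating the driver. Whether ignored hosts still go through the authenticated upstream proxy is worth confirming before relying on this.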

EnzoArata commented 3 years ago

@pawanpaudel93 I have switched to the mitmproxy backend and it seems to have improved performance. I am now running into a new issue: I have changed my code so that I now launch a selenium webdriver from a thread, and when I do this I get the error "RuntimeError: There is no current event loop in thread 'Thread-2'." If I switch off the mitmproxy backend I no longer get this issue. Any thoughts?

pawanpaudel93 commented 3 years ago

@EnzoArata I don't have that problem when I use concurrent.futures.ProcessPoolExecutor, but when I use pebble.ProcessPool I get the same error as you have been getting. So I changed a few lines of code to fix it temporarily, as in the image. What might the permanent solution be, @wkeeling? You can install the modified selenium-wire with those lines of code: pip install git+https://github.com/pawanpaudel93/selenium-wire.git or you can modify the lines yourself.

image

wkeeling commented 3 years ago

@pawanpaudel93 looks like an exception is getting thrown with pebble.ProcessPool. Are you able to log the exception in the except block so we can see the details? e.g.

except:
    import traceback
    traceback.print_exc()
    self._event_loop = asyncio.new_event_loop()
    asyncio.set_event_loop(self._event_loop)

wkeeling commented 3 years ago

Ignore me, I missed the previous post that said it was a RuntimeError. I'll look at improving the code around getting the event loop.
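The underlying behaviour can be reproduced outside Selenium Wire: `asyncio.get_event_loop()` raises `RuntimeError` when called in a non-main thread that has no loop set. The sketch below shows the general fallback pattern of creating and registering a new loop for that thread. It is a standalone illustration, not the code in Selenium Wire, and the function names are hypothetical.

```python
import asyncio
import threading


def get_or_create_event_loop():
    """Return the current thread's event loop, creating one if none is set."""
    try:
        return asyncio.get_event_loop()
    except RuntimeError:
        # Raised in threads other than the main thread when no loop is set.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop


def run_in_worker_thread():
    """Use the helper from a non-main thread and return what the loop produced."""
    outcome = {}

    def worker():
        loop = get_or_create_event_loop()
        outcome['value'] = loop.run_until_complete(asyncio.sleep(0, result='ok'))
        loop.close()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome
```

Calling `run_in_worker_thread()` exercises the fallback branch, since the worker thread starts without an event loop of its own.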

EnzoArata commented 3 years ago

@wkeeling Hey, I had another question. I have created an executable that creates threads, each of which starts a selenium wire webdriver. Every time I start one of these threads, a new console window for the selenium webdriver opens. I want to stop these consoles from opening so that they stay hidden. I have tried the method below but it throws an error. Do you know how I could replicate this in selenium wire?

image

wkeeling commented 3 years ago

I think this is a version issue. Selenium Wire will simply pass the service argument up to the __init__() method in Selenium's webdriver class, but the version of Selenium you're running doesn't yet support the service argument. Looking at the version history, the service argument was introduced in version selenium-4.0.0-alpha-1, but the current version of Selenium on PyPI is 3.141.0. Maybe try using selenium-4.0.0-alpha-1 and see whether that resolves the issue?

EnzoArata commented 3 years ago

@wkeeling After updating to selenium-4.0.0-alpha-1 I am no longer seeing that console window. However for 5-10 seconds I get a random assortment of selenium wire windows rapidly opening and closing. These windows will appear a bit overlapped and they flash rapidly before closing. They close really fast but I recorded my screen and was able to see they are all titled "seleniumwire/proxy/win/openssl.exe". Any idea how I can get those to stop showing up?

Note: When I run my file as a .py I don't get this issue, but when I build it to a .exe and run that, I observe this issue.

wkeeling commented 3 years ago

The latest version of selenium wire (4.0.2) may resolve this because it no longer uses openssl.exe. Are you able to upgrade?

pip install selenium-wire --upgrade

EnzoArata commented 3 years ago

@wkeeling I was able to update selenium wire, I am making a new build and will let you know how it goes.

Thanks again for all the help, I really appreciate it. It has been awesome being able to collaborate and get some much needed assistance.

EDIT- Just tested. Seems to be even worse with the new version, I will do more testing tomorrow

wkeeling commented 3 years ago

@EnzoArata just wondering how you are getting on, and also to let you know there's a new version available which fixes some newly reported issues.

wkeeling commented 3 years ago

Version 4.2.0 allows you to disable capture and interception using the disable_capture option. When this option is set, Selenium Wire will pass all traffic through to the upstream proxy without decrypting/intercepting/capturing - which should boost performance.
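For the proxy-authentication-only use case that opened this thread, the option might be combined with an upstream proxy configuration as in the sketch below. `build_proxy_only_options` is a hypothetical helper, the host and credentials are placeholders, and the `proxy` dictionary layout is assumed to follow the format in the Selenium Wire README.

```python
def build_proxy_only_options(host, port, user, password):
    """Return options for authenticated proxying with capture disabled.

    Hypothetical helper; requires Selenium Wire 4.2.0 or later for
    disable_capture.
    """
    upstream = '%s:%s@%s:%s' % (user, password, host, port)
    return {
        # Pass traffic straight through without decrypting or storing it.
        'disable_capture': True,
        'proxy': {
            'http': 'http://' + upstream,
            'https': 'https://' + upstream,
            # Hosts that should not go through the upstream proxy.
            'no_proxy': 'localhost,127.0.0.1',
        },
    }
```

The result would be passed as `seleniumwire_options` to `webdriver.Chrome(...)` imported from `seleniumwire`.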