Closed EnzoArata closed 3 years ago
There are a few more things you can try to improve performance:
The exclude_hosts option: requests to these hosts will also bypass your upstream proxy. This can be useful for sending advertising/analytics requests etc. around Selenium Wire.
Also, it may be best to leave the disable_encoding option as False, as encoded (compressed) pages are likely to download more quickly.
Try playing around with some of the above and see how you get on. I'll be interested to hear which options make/do not make a difference. I'm currently in the process of trying to improve the performance of Selenium Wire further, so any info you come back with will be useful. Many thanks!
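As a rough illustration of the two options mentioned above, the options dictionary might be assembled as follows. The helper name build_wire_options and the example hosts are hypothetical; only the exclude_hosts and disable_encoding keys come from the discussion.

```python
def build_wire_options(excluded_hosts):
    """Return a Selenium Wire options dict (hypothetical helper)."""
    return {
        # Requests to these hosts go around Selenium Wire
        # (and therefore around the upstream proxy as well).
        'exclude_hosts': list(excluded_hosts),
        # Leave response encoding (compression) enabled.
        'disable_encoding': False,
    }


# The hosts below are placeholders, not recommendations.
options = build_wire_options(['ads.example.com', 'analytics.example.com'])

# Assumed usage; requires selenium-wire and a Chrome driver:
# from seleniumwire import webdriver
# driver = webdriver.Chrome(seleniumwire_options=options)
```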
@EnzoArata If you are using selenium-wire only for proxy authentication and not to intercept requests/responses, then I have another solution for you. You can configure the proxy through a Chrome extension and use Xvfb as a virtual display, so Chrome runs on the server without problems. selenium-wire will be a little slower because requests/responses have to be intercepted.
import zipfile


def set_proxy(PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS, options):
    # Extension manifest: requests the proxy and webRequest permissions.
    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version":"22.0.0"
    }
    """
    # Background script: sets a fixed proxy and answers its auth challenge.
    background_js = """
    var config = {
        mode: "fixed_servers",
        rules: {
            singleProxy: {
                scheme: "http",
                host: "%s",
                port: parseInt(%s)
            },
            bypassList: ["localhost"]
        }
    };
    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
    function callbackFn(details) {
        return {
            authCredentials: {
                username: "%s",
                password: "%s"
            }
        };
    }
    chrome.webRequest.onAuthRequired.addListener(
        callbackFn,
        {urls: ["<all_urls>"]},
        ['blocking']
    );
    """ % (PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)
    # Package both files as a zipped extension and attach it to the options.
    pluginfile = 'proxy_auth_plugin.zip'
    with zipfile.ZipFile(pluginfile, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)
    options.add_extension(pluginfile)
    return options
Modify this as needed. Install Xvfb (apt-get install xvfb) and https://github.com/cgoldberg/xvfbwrapper. If you have any problems, I am here to help.
Thanks for the fast response @wkeeling! I tried to use an interceptor to abort some requests to make pages load faster, but it doesn't seem like it's working. If I understand correctly, images shouldn't load. Does this look right to you?
I defined my interceptor similar to your example, but when I go to a page with an image it is still being loaded.
def interceptor(request):
    # Block several request types
    if request.path.endswith(('.png', '.jpg', '.gif', '.woff', '.php')):
        request.abort()

driver.request_interceptor = interceptor
I tried it on a Wikipedia page and the image is still there. It loads like the attached image. Would setting the page load strategy to none be affecting this?
Also, a question about excluding hosts: because it bypasses the upstream proxy, does that mean I can't send button.clicks()/SendKeys() to the host? Thanks again for the info, I really appreciate you taking the time. I will try the mitmproxy backend and let you know.
-Edit: I tested on some other websites and it seems to work for stopping the loading of images!
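The interceptor described above can be sketched so that the matching logic is checkable without a browser. The names BLOCKED_SUFFIXES, should_block and FakeRequest are hypothetical, and the lower-casing of the path is an addition that is not in the original post; whether it has anything to do with the Wikipedia case is not established in this thread.

```python
BLOCKED_SUFFIXES = ('.png', '.jpg', '.gif', '.woff', '.php')


def should_block(path):
    """Return True if the request path ends with a blocked suffix."""
    # Lower-casing is an assumption so that '.PNG' is treated like '.png'.
    return path.lower().endswith(BLOCKED_SUFFIXES)


def interceptor(request):
    # Abort matching requests; everything else passes through untouched.
    if should_block(request.path):
        request.abort()


class FakeRequest:
    """Stand-in for a Selenium Wire request, used only to try the logic."""

    def __init__(self, path):
        self.path = path
        self.aborted = False

    def abort(self):
        self.aborted = True


image_request = FakeRequest('/static/images/Logo.PNG')
page_request = FakeRequest('/wiki/Main_Page')
interceptor(image_request)
interceptor(page_request)

# With a real driver the function is assigned as in the post:
# driver.request_interceptor = interceptor
```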
@pawanpaudel93 Thank you for the advice! I actually already used that method for proxy authentication. The only issue is that I want to do both headed and headless testing. I have tried this method headless but it doesn't work, and I believe that is because headless Chrome does not allow extensions.
Do you know of any other easy way to authenticate with a proxy in headless mode?
Thanks for the assistance, I appreciate the help.
@EnzoArata The method I recommended above works on a server as if it were headless, but don't enable headless mode, because Chrome extensions do not work in headless mode. Xvfb (and its Python package xvfbwrapper) provides a virtual display, so Chrome can run as if it were in headless mode.
@pawanpaudel93 If I run it this way, how will it affect performance? Will it be similar to running it headed or headless?
@EnzoArata You mean performance as in loading the pages? Pages will load faster than with selenium-wire if your sole purpose is proxy authentication. It's just running Chrome over a virtual display, so you can run it without the virtual display (headed mode) and also as if headless (the server does not have a display, so Xvfb provides the virtual display to run it).
@pawanpaudel93 Yes, loading performance. The only issue I am having now is that xvfbwrapper seems to run only on Linux and wasn't meant to run on Windows. At the moment I am working on Windows.
@EnzoArata Yes, loading performance is better with that. Yes, Xvfb is for Linux systems. I don't have a solution for Chrome on Windows. Selenium-wire is currently slow with the default backend, so try the mitmproxy backend with the mitm_ignore_hosts option to ignore all hosts, since you don't want to intercept any requests. You can get info about the option here: https://docs.mitmproxy.org/stable/concepts-options/
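A minimal sketch of what that suggestion might look like as an options dictionary. It assumes that the installed Selenium Wire version still offers the mitmproxy backend, that options prefixed with mitm_ are forwarded to mitmproxy, and that ignore_hosts accepts regular expressions; the helper name build_mitm_options is hypothetical. Check all of this against the versions you have installed.

```python
def build_mitm_options(ignore_patterns):
    """Return options selecting the mitmproxy backend (hypothetical helper)."""
    return {
        # Switch from the default backend to mitmproxy.
        'backend': 'mitmproxy',
        # Assumed to be passed through to mitmproxy's ignore_hosts option.
        'mitm_ignore_hosts': list(ignore_patterns),
    }


# '.*' is intended to match every host, so nothing would be intercepted.
options = build_mitm_options(['.*'])

# Assumed usage; requires selenium-wire and a Chrome driver:
# from seleniumwire import webdriver
# driver = webdriver.Chrome(seleniumwire_options=options)
```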
@pawanpaudel93 I have switched to the mitmproxy backend and it seems to have improved performance. I am now running into a new issue: I changed my code so that it launches a Selenium webdriver from a thread, and when I do this I get the error "RuntimeError: There is no current event loop in thread 'Thread-2'." If I switch off the mitmproxy backend I no longer get this issue. Any thoughts?
@EnzoArata I don't have that problem when I use concurrent.futures.ProcessPoolExecutor, but when I use pebble.ProcessPool I get the same error you have been getting. I changed a few lines of code to fix it temporarily, as in the image. @wkeeling, what might the permanent solution be? You can install the modified selenium-wire with those lines of code: pip install git+https://github.com/pawanpaudel93/selenium-wire.git
or you can modify the lines yourself.
@pawanpaudel93 It looks like an exception is getting thrown with pebble.ProcessPool. Are you able to log the exception in the except so we can see the details? e.g.
except:
    import traceback
    traceback.print_exc()
    self._event_loop = asyncio.new_event_loop()
    asyncio.set_event_loop(self._event_loop)
Ignore me, I missed the previous post that said it was a RuntimeError. I'll look at improving the code around getting the event loop.
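One common way to handle a missing event loop in a worker thread is sketched below. This is an assumption about the general pattern, not the change that was made in Selenium Wire; the function names get_or_create_event_loop and run_in_thread are hypothetical.

```python
import asyncio
import threading


def get_or_create_event_loop():
    """Return the thread's event loop, creating one if none exists."""
    try:
        return asyncio.get_event_loop()
    except RuntimeError:
        # Worker threads have no event loop by default, so create one
        # and register it for the current thread.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop


def run_in_thread():
    """Call the helper from a worker thread and report what it returned."""
    result = {}

    def worker():
        loop = get_or_create_event_loop()
        result['is_loop'] = isinstance(loop, asyncio.AbstractEventLoop)
        loop.close()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result
```

Catching only RuntimeError, rather than using a bare except, keeps unrelated failures visible.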
@wkeeling Hey I had another question. I have created an executable file that is able to create threads that start a selenium wire webdriver. Every time I start one of these threads a new console for selenium webdriver opens up. I want to stop these consoles from opening so that it is hidden. I have tried the method below but it is throwing an error. Do you know how I could replicate this in selenium wire?
I think this is a version issue. Selenium Wire will simply pass the service argument up to the __init__() method in Selenium's webdriver class, but the version of Selenium you're running doesn't yet support the service argument. Looking at the version history, the service argument was introduced in version selenium-4.0.0-alpha-1, but the current version of Selenium on PyPI is 3.141.0. Maybe try using selenium-4.0.0-alpha-1 and see whether that resolves the issue?
@wkeeling After updating to selenium-4.0.0-alpha-1 I am no longer seeing that console window. However for 5-10 seconds I get a random assortment of selenium wire windows rapidly opening and closing. These windows will appear a bit overlapped and they flash rapidly before closing. They close really fast but I recorded my screen and was able to see they are all titled "seleniumwire/proxy/win/openssl.exe". Any idea how I can get those to stop showing up?
Note- When I run my file as a .py I don't get this issue, but when I build it to a .exe and run that, I observe this issue.
The latest version of selenium wire (4.0.2) may resolve this because it no longer uses openssl.exe. Are you able to upgrade?
pip install selenium-wire --upgrade
@wkeeling I was able to update selenium wire, I am making a new build and will let you know how it goes.
Thanks again for all the help, I really appreciate it. It has been awesome being able to collaborate and get some much-needed assistance.
EDIT- Just tested. Seems to be even worse with the new version, I will do more testing tomorrow
@EnzoArata just wondering how you are getting on, and also to let you know there's a new version available which fixes some newly reported issues.
Version 4.2.0 allows you to disable capture and interception using the disable_capture option. When this option is set, Selenium Wire will pass all traffic through to the upstream proxy without decrypting/intercepting/capturing, which should boost performance.
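For the proxy-authentication-only use case discussed in this thread, the option might be combined with an authenticated upstream proxy roughly as follows. The helper name build_proxy_options and the host and credentials are placeholders. The sketch also assumes the credentials contain no characters that need escaping in a URL.

```python
def build_proxy_options(host, port, user, password):
    """Return options for an authenticated upstream proxy (hypothetical helper)."""
    credentials = f'{user}:{password}'
    return {
        'proxy': {
            'http': f'http://{credentials}@{host}:{port}',
            'https': f'https://{credentials}@{host}:{port}',
            'no_proxy': 'localhost,127.0.0.1',
        },
        # Pass traffic straight through without capturing or intercepting it.
        'disable_capture': True,
    }


# Placeholder values; substitute your own proxy details.
options = build_proxy_options('proxy.example.com', 8080, 'user', 'secret')

# Assumed usage; requires selenium-wire 4.2.0 or later and a Chrome driver:
# from seleniumwire import webdriver
# driver = webdriver.Chrome(seleniumwire_options=options)
```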
The main reason I am using selenium wire is so that I can easily login in to proxies that require username/password. However certain pages load a lot slower than others. I was wondering if there was a way I could disable everything except the proxy set up so that it can run faster.
I have tried the following:
1. Ignoring HTTP methods in the options ('GET', 'POST', 'HEAD', 'OPTIONS')
2. Setting the disable encoding option to True
3. Setting the verify ssl option to False
4. Setting the page load strategy to none and stopping the window load when the correct elements are loaded
I have tried headless and headed and get the same results. Using selenium on its own it runs much faster, I know this is because it isn't handling all the requests like selenium wire is. Unfortunately I haven't found any other way to use proxy authentication headless.
I am working on my Cybersecurity final project and any help would be much appreciated! Selenium Wire is awesome and I respect all the hard work you have put into it.