nayanamana closed this issue 3 years ago.
Thanks for raising this. I'll see if I can reproduce the issue with your configuration and I'll let you know what I find.
I've attempted to reproduce this on Windows 10 using Firefox 69, version 0.24.0 of geckodriver and the latest version of Selenium Wire (1.0.9). On my machine, Selenium Wire was averaging 11 seconds to fully load https://www.cnn.com, whereas Selenium itself was averaging 3 seconds. The slower load time is to be expected due to the request/response capture that Selenium Wire performs (and the cnn.com homepage seems to trigger a particularly large number of requests for embedded resources, advertising etc.). However, I'm not seeing the 2+ minute response times that you are observing.
What kind of response time do you see if you load https://www.cnn.com directly through Selenium? Also, is there anything in profile_ff_work that could be affecting performance? Could you run the test with a barebones profile?
Thanks for looking into it. I use the same versions of geckodriver and Selenium Wire (Firefox on Windows 10), and even with a barebones profile it still takes 1.5-2 minutes for cnn.com to load. I initially thought the problem was with my environment, but with plain Selenium on the same machine cnn.com loads in just 3 seconds.
For your information, below is the script I use:
import os, sys
import json, datetime, time

from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

from seleniumwire import webdriver

ff_install_path = r'C:\Program Files\Mozilla Firefox\firefox.exe'
firefox_driver = r'C:\install\drivers\geckodriver-v0.24.0-win64\geckodriver.exe'

ff_binary = FirefoxBinary(ff_install_path)
executable_path = firefox_driver
driver = webdriver.Firefox(firefox_binary=ff_binary, executable_path=executable_path,
                           seleniumwire_options={'verify_ssl': False})

print("START TIME: " + str(datetime.datetime.now()))
driver.get('https://www.cnn.com')
print("END TIME: " + str(datetime.datetime.now()))
driver.quit()
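The script above prints wall-clock timestamps around a single driver.get() call. To compare plain Selenium and Selenium Wire a bit more reliably (as in the averaged timings mentioned earlier), a minimal sketch could time several runs with time.perf_counter(). The helpers time_runs and summarize are hypothetical, and the sleep is only a stand-in for the real page load.

```python
import statistics
import time
from typing import Callable, List


def time_runs(load: Callable[[], None], runs: int = 3) -> List[float]:
    """Call `load` `runs` times and return each elapsed time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        load()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> str:
    """Format the mean and worst-case durations for a quick comparison."""
    return "mean {:.2f}s, max {:.2f}s over {} runs".format(
        statistics.mean(durations), max(durations), len(durations)
    )


if __name__ == "__main__":
    # Stand-in for driver.get('https://www.cnn.com'); swap in the real call.
    durations = time_runs(lambda: time.sleep(0.01), runs=3)
    print(summarize(durations))
```

With a real driver this would be something like time_runs(lambda: driver.get('https://www.cnn.com')), run once with selenium.webdriver and once with seleniumwire.webdriver. Later runs may be served partly from the browser cache, so the first run is worth looking at on its own.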
Thanks for the update.
When Selenium Wire is loading the site, are you able to bring up Windows task manager and watch what's going on with the processes and CPU? It would be interesting to understand if there is a particular process that is consuming 100% CPU and causing the slow down you are seeing.
With Selenium Wire, CPU usage peaks at about 65%. The process with the highest CPU usage is Firefox (13% on average), which is the same as what I see with plain Selenium.
Thanks. Could you also try disabling capture of GET, POST and OPTIONS requests using the ignore_http_methods option, for example:
driver = webdriver.Firefox(
    firefox_binary=ff_binary,
    executable_path=executable_path,
    seleniumwire_options={'verify_ssl': False,
                          'ignore_http_methods': ['GET', 'POST', 'OPTIONS']}
)
That may give a clue as to whether the capture process is causing the problem.
Also, if you get a chance, could you try a site that does not use HTTPS, for example http://web.mit.edu/ (or any other), and see how it behaves?
With the ignore_http_methods option, www.cnn.com still takes just as long. However, the HTTP site you mentioned takes only 2 seconds with Selenium Wire.
Ok thanks. It sounds like the issue may be related to the underlying SSL interception, possibly something to do with openssl (openssl is bundled with the Windows version of Selenium Wire).
I think at this point we'd need to step through using a debugger to see which line of code is causing the problem. Would you be in a position to do that?
If you let me know the steps, I can try.
Are you comfortable with using a Python editor such as PyCharm?
I use Visual Studio, but I can install PyCharm.
Ok well for PyCharm the steps would basically be something like this:
- Clone the repo with git clone https://github.com/wkeeling/selenium-wire.git (you may first need to install Git if it's not already installed)
- Start PyCharm and open the project you just cloned with File > Open... > select the selenium-wire folder
- Navigate to tests > acceptance.py in the left hand tree and double-click to edit it
- In the first test method, test_firefox_can_access_requests, change the url to be https://www.cnn.com
- Right-click the test method and select Run...
- The test should run and should reproduce the performance problem
Once the test completes, try setting a break point and then running again:
- In PyCharm, navigate to seleniumwire/proxy/proxy2.py in the left hand tree and double-click to open it
- Try setting a break point just inside the do_GET() method on line 74
- Do this by clicking on the left hand margin next to the line number and a red dot should appear
- Go back to acceptance.py and right-click the test method again, but this time select 'Debug...'
- The test should run and should drop you onto the break point. From there you can use the Step Over button in the Debug panel at the bottom (or press F8 in the default keymap) to step over each line of code. As you step over each line, you may find that one particular line causes the debugger to pause for a very long time. This may give some clues as to what's causing the problem.
Really appreciate your offer of help on this one. Getting to the bottom of the issue would really help, especially if it results in a fix. Let us know how you get on!
@wkeeling I can confirm selenium-wire is very slow on Windows 10. I had to increase the request timeout to get the test for #69 to pass on my Windows 10 machine. I tried stepping over each line in the debugger but could not find anything suspicious.
I will try these steps, but I need some time as I am busy with another project.
@idxn thanks - seems like it may be a general problem. Are you using any upstream proxy or is there anything special about your setup?
Thanks @nayanamana. From what @idxn has tried, it seems you may not find anything obvious, but if you do notice anything, let us know.
@wkeeling No, I do not have an upstream proxy, but I noticed that the target URL returns a 301. The failing test I got is for python.org, which returns a 301.
cnn.com also returns a 302.
I think that may be the cause. I randomly checked the URLs in test_client, and the others seem to return only 200 without a redirect.
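To check which captured requests were actually redirects, one could filter the captured requests by status code. A minimal sketch, assuming objects shaped like the Selenium Wire 1.x requests used in the original issue script (request.path, request.response.status_code, request.response.headers as a dict-like mapping); find_redirects and REDIRECT_CODES are hypothetical names, and the SimpleNamespace objects are made-up stand-ins for driver.requests.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional, Tuple

# Redirect status codes; 304 Not Modified is 3xx but is not a redirect.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def find_redirects(requests: Iterable) -> List[Tuple[str, int, Optional[str]]]:
    """Return (path, status, Location header) for every redirect response."""
    redirects = []
    for request in requests:
        response = request.response
        if response is None:  # captured request that never got a response
            continue
        if response.status_code in REDIRECT_CODES:
            # Assumes a dict-like headers object; a plain dict is case-sensitive.
            redirects.append(
                (request.path, response.status_code, response.headers.get("Location"))
            )
    return redirects


if __name__ == "__main__":
    # Hypothetical stand-ins; with a real driver, pass driver.requests instead.
    fake_requests = [
        SimpleNamespace(path="http://example.com/old",
                        response=SimpleNamespace(status_code=301,
                                                 headers={"Location": "http://example.com/new"})),
        SimpleNamespace(path="http://example.com/new",
                        response=SimpleNamespace(status_code=200, headers={})),
        SimpleNamespace(path="http://example.com/pending", response=None),
    ]
    print(find_redirects(fake_requests))
```

Comparing the number of redirects against the total load time on an affected and an unaffected machine could help confirm or rule out the redirect theory.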
@wkeeling Do you have any other suspicions? What should we do to fix the issue?
@idxn thanks for looking into it. The redirect however doesn't seem to make any difference to performance on my Windows 10 machine, so I'm not sure that the redirect is the underlying cause. The tests run fine regardless.
I think the issue is probably something to do with the SSL connection wrapping, because non-HTTPS sites (e.g. http://web.mit.edu/) seem to run fine.
Without a local reproducible test case I can only guess what the issue might be. I have another older Windows 10 machine so I will see if I can reproduce the issue there. If not, I'm going to be relying on somebody else who does have the issue to do a bit of debugging on this one.
@wkeeling Could you point me to the line or method you think might be causing the issue? I'll try to look into it.
@idxn I think I would start by stepping through the do_GET() function in proxy2.py with the debugger, looking closely at the lines that deal with outbound requests, and seeing whether these lines are particularly slow to respond. It may end up being an unrelated line of code that's causing the issue. At this point it's going to need a bit of exploratory debugging, unfortunately.
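As a complement to stepping through do_GET() line by line, Python's built-in cProfile can report which functions accumulate the most time. This is a swapped-in technique, not one suggested in the thread. cProfile only records the thread it is enabled in, so it would need to wrap code that runs in the proxy's handler thread rather than the test that drives the browser. The profiled decorator, handle_request and slow_step below are hypothetical placeholders, not Selenium Wire code.

```python
import cProfile
import functools
import io
import pstats
import time


def profiled(top: int = 10):
    """Decorator: profile each call and print the `top` entries by cumulative time.

    The latest report is also kept on the wrapper as `last_report`.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()  # records only the current thread
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable()
                out = io.StringIO()
                stats = pstats.Stats(profiler, stream=out)
                stats.sort_stats("cumulative").print_stats(top)
                wrapper.last_report = out.getvalue()
                print(wrapper.last_report)
        wrapper.last_report = ""
        return wrapper
    return decorator


def slow_step():
    time.sleep(0.05)  # placeholder for something slow, e.g. an outbound connection


@profiled(top=5)
def handle_request():
    # Hypothetical stand-in for a request handler such as do_GET().
    slow_step()
    return "done"


if __name__ == "__main__":
    handle_request()
```

Because the report aggregates over the whole call, a line that only stalls intermittently still shows up as a large cumulative time, which may help with issues that happen sometimes but not always.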
It is quite hard to debug: it happens sometimes but not always :( I'll post my Python environment here so you can try to replicate the issue. Python 3.7.0; the other package versions are in pip_pkg.txt. Will try again and keep you posted.
Thanks. I'll also have another go at reproducing on a different machine.
The latest release of Selenium Wire (1.2.1) uses connection keep-alive by default. Previously, Selenium Wire created a new connection for each HTTP request, which was inefficient and degraded performance.
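To illustrate what the keep-alive change means at the connection level, the sketch below starts a throwaway local HTTP/1.1 server and records the client port of each request. Requests sent over one reused http.client.HTTPConnection share a single TCP connection, while closing the connection before each request makes every request open a new one. This only demonstrates connection reuse with the standard library; it is not Selenium Wire's implementation, and serve_in_background, fetch and seen_ports are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import List, Tuple

seen_ports: List[int] = []  # client port of every request the server handles


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open

    def do_GET(self):
        seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # needed for reuse
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def serve_in_background() -> Tuple[ThreadingHTTPServer, int]:
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def fetch(port: int, n: int, keep_alive: bool) -> List[int]:
    """Send n GET requests and return the client ports the server observed."""
    seen_ports.clear()
    conn = http.client.HTTPConnection("127.0.0.1", port)
    for _ in range(n):
        if not keep_alive:
            conn.close()  # next request() reconnects on a fresh TCP connection
        conn.request("GET", "/")
        conn.getresponse().read()  # body must be read before the connection is reused
    conn.close()
    return list(seen_ports)


if __name__ == "__main__":
    server, port = serve_in_background()
    try:
        reused = fetch(port, 5, keep_alive=True)
        fresh = fetch(port, 5, keep_alive=False)
        print("keep-alive:", len(set(reused)), "connection(s) for", len(reused), "requests")
        print("no keep-alive:", len(set(fresh)), "connection(s) for", len(fresh), "requests")
    finally:
        server.shutdown()
        server.server_close()
```

For HTTPS each new connection also pays for a TLS handshake, so avoiding per-request connections would be expected to matter more for sites like cnn.com than for plain HTTP sites.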
@idxn @nayanamana You may have found a workaround/alternative solution by now, but if you're in a position to test version 1.2.1 on Windows 10 let me know how it goes.
One other thought I have: it's possible that Windows antivirus (e.g. Windows Defender) is intercepting the execution of openssl.exe, which gets run by Selenium Wire for SSL-based sites. For sites that contain a lot of external assets, openssl.exe will be run multiple times, and all the antivirus checks could hugely increase the load time.
Assuming you're using an antivirus program, try adding an exception for openssl.exe (you can do this in Windows Defender, I think). I'd be interested to see whether that improves things. I'll also see if I can reproduce based on this theory.
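One way to test the antivirus theory on a given machine is to time repeated launches of a short-lived process, once normally and once with an exclusion in place. A minimal sketch, assuming process start-up cost is what an on-access scanner would inflate; it launches the current Python interpreter rather than openssl.exe so it runs anywhere, and average_launch_time is a hypothetical helper. To time openssl.exe itself, the command could be swapped for something like [r'path\to\openssl.exe', 'version'], where the path is a placeholder.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def average_launch_time(cmd: Sequence[str], runs: int = 10) -> float:
    """Run `cmd` `runs` times and return the mean wall-clock seconds per launch."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        total += time.perf_counter() - start
    return total / runs


if __name__ == "__main__":
    # A trivial process: almost all of its run time is start-up cost, which is
    # where antivirus scanning of the executable could add delay.
    cmd: List[str] = [sys.executable, "-c", "pass"]
    print("mean launch time: {:.3f}s".format(average_launch_time(cmd, runs=5)))
```

If the mean launch time drops noticeably after adding the exclusion, multiplying the difference by the number of launches per page gives a rough idea of how much of the slowdown it could explain.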
Selenium Wire no longer relies on openssl.exe, and the core of the library has been reworked to improve overall performance. Closing this issue.
I am using selenium-wire (version 1.0.8) on Windows 10, and it appears to be very slow. For example, the get request for https://www.cnn.com takes more than 2 minutes to complete.
Do you know what could be the issue and how I can resolve it?
...
from seleniumwire import webdriver  # Import from seleniumwire
...
profile = FirefoxProfile(profile_ff_work)
profile.accept_untrusted_certs = True
profile.assume_untrusted_cert_issuer = True
profile.set_preference("app.normandy.startupRolloutPrefs.network.cookie.cookieBehavior", 0)
firefox_driver = r'C:\install\drivers\geckodriver-v0.24.0-win64\geckodriver.exe'
executable_path = firefox_driver
driver = webdriver.Firefox(firefox_binary=ff_binary, executable_path=executable_path,
                           firefox_profile=profile)
driver.get('https://www.cnn.com')

for request in driver.requests:
    if request.response:
        print(
            request.path,
            request.response.status_code,
            request.response.headers
        )
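Since the cnn.com homepage triggers a large number of requests for embedded resources, it can help to see how the captured requests are spread across hosts rather than printing each one. A minimal sketch, assuming request.path holds the full URL as it did in Selenium Wire 1.x; requests_per_host is a hypothetical helper and the sample URLs are made up.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit


def requests_per_host(requests: Iterable, top: int = 10) -> List[Tuple[Optional[str], int]]:
    """Count captured requests by host name (lower-cased), most frequent first."""
    counts = Counter(urlsplit(request.path).hostname for request in requests)
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical stand-ins; with a real driver, pass driver.requests instead.
    sample = [SimpleNamespace(path=url) for url in (
        "https://www.example.com/",
        "https://cdn.example.net/app.js",
        "https://cdn.example.net/style.css",
    )]
    for host, count in requests_per_host(sample):
        print(host, count)
```

A long tail of distinct HTTPS hosts would be consistent with the SSL-interception theory discussed in the thread, since each new host needs its own intercepted TLS setup.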