Closed basnetsoyuj closed 2 years ago
Thanks for raising this. I'll look into the issue with .quit(). With the reassign of the proxy, does it work if you use the old-style way of reassigning:
self.driver.proxy._master.options.update(
    mode=f"upstream:http://{server}:{port}",
    upstream_auth=f"{username}:{password}"
)
I just tried the method you specified and it still doesn't work.
The issue with quit() was that it called the shutdown() method of the seleniumwire.server.MitmProxy object, but the proxy attribute was reassigned to a dict, which does not have a shutdown() method.
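The failure mode described here can be reproduced without a browser. The sketch below uses two stand-in classes, FakeBackend and FakeDriver, which are hypothetical and are not selenium-wire's real classes; they only mimic the pattern of one attribute holding both the server object and the upstream settings.

```python
class FakeBackend:
    """Stand-in for the proxy server object (not selenium-wire's class)."""

    def __init__(self):
        self.running = True

    def shutdown(self):
        self.running = False


class FakeDriver:
    """Stand-in driver whose quit() shuts down whatever `proxy` holds."""

    def __init__(self):
        self.proxy = FakeBackend()

    def quit(self):
        self.proxy.shutdown()


driver = FakeDriver()
# Reassigning the attribute replaces the server object with a plain dict.
driver.proxy = {"http": "http://example.invalid:8080"}

try:
    driver.quit()
    error = None
except AttributeError as exc:
    # quit() looked up shutdown() on the dict and failed.
    error = str(exc)

print(error)
```

Keeping the server object and the upstream settings in separate attributes avoids this collision.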
OK thanks. Are you able to share the traceback you're seeing when quit() is called?
This is the traceback when I first reset the proxy using driver.proxy = {...} and then call driver.quit():
Traceback (most recent call last):
  File "venv\lib\site-packages\seleniumwire\webdriver.py", line 45, in quit
    self.proxy.shutdown()
AttributeError: 'dict' object has no attribute 'shutdown'
Thanks. Can you confirm what version you're running? The issue with quit() should be fixed in the latest version (4.5.1).
Hey, I was running selenium-wire==4.3.1. Just tested it out: driver.quit() works on the newest version, however changing the proxy dynamically using driver.proxy = {...} is still not possible.
I see that you changed driver.proxy to a dict so I could not test this:
driver.proxy._master.options.update(
    mode=f"upstream:http://{new_ip.ip}:{new_ip.port}",
    upstream_auth=f"{new_ip.username}:{new_ip.password}"
)
It gave the following error:
Traceback (most recent call last):
  File "test.py", line 36, in <module>
    driver.proxy._master.options.update(
AttributeError: 'dict' object has no attribute '_master'
Ah yes, the proxy attribute has now changed to backend, so the following should work:
driver.backend._master.options.update(
    mode=f"upstream:http://{new_ip.ip}:{new_ip.port}",
    upstream_auth=f"{new_ip.username}:{new_ip.password}"
)
Using backend._master.options gave an AttributeError: 'MitmProxy' object has no attribute '_master' error. I used backend.master.options instead, which worked but the proxy did not change. I'm testing my IP using this website: http://www.myipaddress.com/show-my-ip-address/
Same problem here. I can only change the proxy the first time. Afterwards the proxy modifications are not taken into account.
@tjhgit thanks for the additional info.
One question: if you wait for all requests to complete before switching the proxy - e.g. add a time.sleep() before making the proxy change - do you still see the problem?
@wkeeling unfortunately no. A time.sleep() does not change anything in this behaviour.
Same issue: the first proxy gets a response, but the next request does not.
@wkeeling I made a small script to reproduce the issue against seleniumwire 4.5.2, and the reported bug happened.
from seleniumwire.webdriver import Chrome as SeleniumwireChrome, ChromeOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as Wait

PROXIES = (
    'YOUR_PROXIES_HERE',
)

def clear_session(driver):
    driver.execute_script('window.localStorage.clear(); window.sessionStorage.clear();')
    driver.delete_all_cookies()
    driver.get('about:blank')

options = ChromeOptions()
options.headless = True
driver = SeleniumwireChrome(executable_path='./chromedriver', options=options, seleniumwire_options={'proxy': {}})
try:
    print('Getting your real IP...')
    driver.get("https://www.myip.com/")
    ip = Wait(driver, 15, 2).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ip")))
    print(f" Your real IP: {ip.text.strip()}")
    clear_session(driver)
    for proxy in PROXIES:
        driver.proxy = {"http": proxy}
        print(f"Trying proxy {proxy}...")
        driver.get("https://www.myip.com/")
        ip = Wait(driver, 15, 2).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ip")))
        print(f" Your new IP: {ip.text.strip()}")
        clear_session(driver)
        driver.get("https://www.myip.com/")
        ip = Wait(driver, 15, 2).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ip")))
        print(f" Your new IP after reload: {ip.text.strip()}")
        clear_session(driver)
        input("Press any key to continue...")
finally:
    driver.quit()
I got the following (with IPs replaced by placeholders)
Getting your real IP...
Your real IP: REAL_IP
Trying proxy PROXY1...
Your new IP: REAL_IP
Your new IP after reload: REAL_IP
Press any key to continue...
Trying proxy PROXY2...
Your new IP: REAL_IP
Your new IP after reload: NEW_IP
Press any key to continue...
So my IP changed only on the last attempt. Sometimes it does not change at all. Maybe some concurrency bug?
That's excellent, thanks @HMaker
It does smell like a concurrency issue as you mention. I'll look at reproducing using your example and see if I can figure out what's going on.
Hi @wkeeling, were you able to solve the issue? Because of this bug, it is necessary to open and quit the driver for every single request if one wants the benefits of a rotating proxy. Or do you see alternatives here using Selenium directly? Is it easily possible to define a proxy with Selenium alone?
Recently I deployed a scraping Docker container (called vscode-remote-svc) together with Selenium hub (called chrome), and this bug leads to additional headaches, since instantiating and quitting the process sometimes leads to an error that the proxy server port is already allocated.
For instance I am using this code snippet:
self.wire_options.update({
    'auto_config': False,  # Ensure this is set to False
    'addr': '0.0.0.0',  # The address the proxy will listen on
    'port': 8087,
})
# default values
# comment the following proxy lines of code to test without rotating proxy
if proxy is not None:
    self.wire_options.update({
        'proxy': {
            'https': self.pxy.urls['https'],
            'http': self.pxy.urls['http']
        }
    })
self.options = webdriver.ChromeOptions()
self.options.add_argument('--proxy-server=vscode-remote-svc:8087')
self.driver = webdriver.Remote('http://chrome:4444/wd/hub',
                               options=self.options,
                               # desired_capabilities=self.options.to_capabilities(),
                               seleniumwire_options=self.wire_options)
self.driver.implicitly_wait(30)
self.driver.request_interceptor = self.interceptor
This works marvellously were it not for the current bug.
@tjhgit I've not yet been able to resolve this one. You can specify a proxy using Selenium directly but that mechanism doesn't support authenticated proxies, so no good if your proxy requires a username and password.
Perhaps you could try running your code above but rather than use a fixed port number, try and select a free one dynamically. That may resolve the occasional issue you see with the port being in use. For example:
import socket
from contextlib import closing

def get_free_port():
    # Scan a range and return the first port nothing is listening on
    for port in range(8087, 8187):
        with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
            # connect_ex() returns non-zero when the connection fails
            if sock.connect_ex(('0.0.0.0', port)) != 0:
                return port

port = get_free_port()  # Get an available port
self.wire_options.update({
    'auto_config': False,  # Ensure this is set to False
    'addr': '0.0.0.0',  # The address the proxy will listen on
    'port': port,
})
# default values
# comment the following proxy lines of code to test without rotating proxy
if proxy is not None:
    self.wire_options.update({
        'proxy': {
            'https': self.pxy.urls['https'],
            'http': self.pxy.urls['http']
        }
    })
self.options = webdriver.ChromeOptions()
self.options.add_argument(f'--proxy-server=vscode-remote-svc:{port}')
self.driver = webdriver.Remote('http://chrome:4444/wd/hub',
                               options=self.options,
                               # desired_capabilities=self.options.to_capabilities(),
                               seleniumwire_options=self.wire_options)
self.driver.implicitly_wait(30)
self.driver.request_interceptor = self.interceptor
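As an alternative to scanning a fixed range, a common approach is to bind a socket to port 0 and let the operating system pick an unused port. This is only a sketch: the helper name get_os_assigned_port is made up for illustration, and the port is released before the proxy binds it, so there is a small window in which another process could take it.

```python
import socket
from contextlib import closing


def get_os_assigned_port(host="0.0.0.0"):
    """Return a TCP port the OS reports as unused (hypothetical helper)."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        # Port 0 asks the OS to choose any free port on this host.
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]


port = get_os_assigned_port()
print(port)
```

The returned value could then be passed as 'port' in the options dict and in the --proxy-server argument, in the same way as in the snippet above.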
@wkeeling marvellous idea - thanks so much. Now I do not get the proxy server port in use error anymore.
@tjhgit thanks for the additional info.
One question: if you wait for all requests to complete before switching the proxy - e.g. add a time.sleep() before making the proxy change - do you still see the problem?
Waiting for 40 to 50s would solve the problem. The new proxy needs around 50s to become effective. So it looks like mitmproxy is not picking up the new configuration directly after the proxy attribute is set.
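Given a delay like that, polling until the observed IP actually changes may be more reliable than a fixed sleep. The following is a minimal sketch and not part of selenium-wire: wait_for_ip_change is a hypothetical helper, and get_current_ip stands for any callable you supply that returns the IP the browser currently appears to use (for example, by loading an IP-check page as in the script earlier in this thread).

```python
import time


def wait_for_ip_change(get_current_ip, old_ip, timeout=60.0, interval=2.0):
    """Poll get_current_ip() until it differs from old_ip.

    Returns the new IP, or None if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        ip = get_current_ip()
        if ip != old_ip:
            return ip
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Demonstration with a fake IP source that changes on the third reading.
readings = iter(["REAL_IP", "REAL_IP", "NEW_IP"])
new_ip = wait_for_ip_change(lambda: next(readings), "REAL_IP",
                            timeout=5, interval=0.01)
print(new_ip)
```

In real use the callable would drive the browser, so each poll costs a page load; a longer interval keeps that overhead down.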
A fix for this has been made in v4.6.3
I still can't change the IP on the latest version of seleniumwire.
v5.1.0 has the same problem. I used v5.1.0 and had to wait a very long time for the proxy IP to change. Does anyone know how to fix this?
I have been waiting for the answer. I want to change the proxy and user agent without closing the browser, because restarting the browser takes time. Is there any other option to change the proxy on a different tab, or on the same tab but before driver.get()?
I wanted a way to change proxies dynamically (rather than closing and opening a new browser instance) and I followed the steps as specified in the README.md file:
However, this does not seem to work.
I reassigned the proxy in the following format:
One more weird thing I noticed is that after I assign driver.proxy = {...} (to a dict), I can't close the browser instance using the quit() method (which works before I reassign the value of proxy). It throws the following error: