wkeeling / selenium-wire

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
MIT License

Cannot change proxy dynamically #377

Closed basnetsoyuj closed 2 years ago

basnetsoyuj commented 3 years ago

I wanted a way to change proxies dynamically (rather than closing and opening a new browser instance), so I followed the steps specified in the README.md file:

driver.get(...)  # Using some initial proxy

# Change the proxy
driver.proxy = {
    'https': 'https://user:pass@192.168.10.100:8888',
}

driver.get(...)  # These requests will use the new proxy

However, this does not seem to work.

I reassigned the proxy in the following format:

driver.proxy = {'http': 'http://username:password@ip:port', 'https': 'https://username:password@ip:port', 'no_proxy': 'localhost,127.0.0.1'}
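A minimal sketch of how a dict in the format above could be built from its parts. The helper name `build_proxy_config` is hypothetical and not part of selenium-wire. It percent-encodes the credentials so that characters such as `@` or `:` in a password do not break the URL; whether the proxy backend decodes them as expected is an assumption worth verifying against your own proxy.

```python
from urllib.parse import quote


def build_proxy_config(host, port, username=None, password=None,
                       no_proxy="localhost,127.0.0.1"):
    """Return a dict shaped like the one assigned to driver.proxy above.

    Hypothetical helper: the keys mirror the example in this thread.
    """
    auth = ""
    if username is not None:
        # Percent-encode so '@', ':' and '/' in credentials stay unambiguous.
        auth = quote(username, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return {
        "http": f"http://{auth}{host}:{port}",
        "https": f"https://{auth}{host}:{port}",
        "no_proxy": no_proxy,
    }


config = build_proxy_config("192.168.10.100", 8888, "user", "p@ss:word")
print(config["https"])
```

The resulting dict could then be assigned with `driver.proxy = config`, as in the snippet above.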

One more weird thing I noticed is that after I assign driver.proxy = {...} (a dict), I can't close the browser instance using the quit() method (which works before I reassign the value of proxy). It throws the following error:

  File "venv\lib\site-packages\seleniumwire\webdriver.py", line 45, in quit
    self.proxy.shutdown()
AttributeError: 'dict' object has no attribute 'shutdown'
wkeeling commented 3 years ago

Thanks for raising this. I'll look into the issue with .quit(). With the reassign of the proxy, does it work if you use the old-style way of reassigning:

self.driver.proxy._master.options.update(
    mode=f"upstream:http://{server}:{port}",
    upstream_auth=f"{username}:{password}"
)
basnetsoyuj commented 3 years ago

I just tried the method you specified and it still doesn't work.

The issue with quit() was that it called the shutdown() method of the seleniumwire.server.MitmProxy object, but the proxy attribute had been reassigned to a dict, which does not have a shutdown() method.
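The failure mode can be illustrated without a browser. The sketch below uses two hypothetical stand-in classes (`FakeBackend` and `FakeDriver`, not selenium-wire's real ones) in which one attribute serves both as the server handle and as the user-facing setting, which is what the traceback suggests was happening in 4.3.1.

```python
class FakeBackend:
    """Hypothetical stand-in for the proxy server object."""

    def __init__(self):
        self.running = True

    def shutdown(self):
        self.running = False


class FakeDriver:
    """Hypothetical stand-in for a driver that stores the server in .proxy."""

    def __init__(self):
        self.proxy = FakeBackend()

    def quit(self):
        # Assumes self.proxy is still the server object.
        self.proxy.shutdown()


driver = FakeDriver()
# Reassigning the attribute replaces the server handle with a plain dict.
driver.proxy = {"https": "https://user:pass@192.168.10.100:8888"}

try:
    driver.quit()
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```

Keeping the server handle under a separate attribute from the user-facing setting avoids this kind of clash.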

wkeeling commented 3 years ago

OK thanks. Are you able to share the traceback you're seeing when quit() is called?

basnetsoyuj commented 3 years ago

This is the traceback I get when I first reset the proxy using driver.proxy = {...} and then call driver.quit():

Traceback (most recent call last):
  File "venv\lib\site-packages\seleniumwire\webdriver.py", line 45, in quit
    self.proxy.shutdown()
AttributeError: 'dict' object has no attribute 'shutdown'
wkeeling commented 3 years ago

Thanks. Can you confirm what version you're running? The issue with quit() should be fixed in the latest version (4.5.1)

basnetsoyuj commented 3 years ago

Hey, I was running selenium-wire==4.3.1. Just tested it out, driver.quit() works on the newest version, however changing proxy dynamically using driver.proxy = {...} is still not possible.

I see that you changed driver.proxy to a dict so I could not test this:

driver.proxy._master.options.update(
    mode=f"upstream:http://{new_ip.ip}:{new_ip.port}",
    upstream_auth=f"{new_ip.username}:{new_ip.password}"
)

It gave the following error:

Traceback (most recent call last):
  File "test.py", line 36, in <module>
    driver.proxy._master.options.update(
AttributeError: 'dict' object has no attribute '_master'
wkeeling commented 3 years ago

Ah yes, the proxy attribute has now changed to backend - so the following should work:

driver.backend._master.options.update(
    mode=f"upstream:http://{new_ip.ip}:{new_ip.port}",
    upstream_auth=f"{new_ip.username}:{new_ip.password}"
)
basnetsoyuj commented 3 years ago

Using backend._master.options gave AttributeError: 'MitmProxy' object has no attribute '_master' error. I used backend.master.options instead, which worked but the proxy did not change. I'm testing my IP using this website http://www.myipaddress.com/show-my-ip-address/

tjhgit commented 3 years ago

Same problem here. The proxy can only be changed the first time. Afterwards, proxy modifications are not taken into account.

wkeeling commented 3 years ago

@tjhgit thanks for the additional info.

One question: if you wait for all requests to complete before switching the proxy - e.g. add a time.sleep() before making the proxy change - do you still see the problem?

tjhgit commented 3 years ago

@wkeeling unfortunately no. A time.sleep() does not change anything in this behaviour.

skndrvoip commented 2 years ago

Same issue: the first proxy works, but the next request does not use the new proxy.

HMaker commented 2 years ago

@wkeeling I made a small script to reproduce the issue against seleniumwire 4.5.2, and the reported bug occurred.

from seleniumwire.webdriver import Chrome as SeleniumwireChrome, ChromeOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as Wait

PROXIES = (
    'YOUR_PROXIES_HERE',
)

def clear_session(driver):
    driver.execute_script('window.localStorage.clear(); window.sessionStorage.clear();')
    driver.delete_all_cookies()
    driver.get('about:blank')

options = ChromeOptions()
options.headless = True
driver = SeleniumwireChrome(executable_path='./chromedriver', options=options, seleniumwire_options={'proxy': {}})
try:
    print('Getting your real IP...')
    driver.get("https://www.myip.com/")
    ip = Wait(driver, 15, 2).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ip")))
    print(f"    Your real IP: {ip.text.strip()}")
    clear_session(driver)
    for proxy in PROXIES:
        driver.proxy = {"http": proxy}
        print(f"Trying proxy {proxy}...")
        driver.get("https://www.myip.com/")
        ip = Wait(driver, 15, 2).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ip")))
        print(f"    Your new IP: {ip.text.strip()}")
        clear_session(driver)
        driver.get("https://www.myip.com/")
        ip = Wait(driver, 15, 2).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ip")))
        print(f"    Your new IP after reload: {ip.text.strip()}")
        clear_session(driver)
        input("Press any key to continue...")
finally:
    driver.quit()

I got the following (with IPs replaced by placeholders)

Getting your real IP...
    Your real IP: REAL_IP
Trying proxy PROXY1...
    Your new IP: REAL_IP
    Your new IP after reload: REAL_IP
Press any key to continue...
Trying proxy PROXY2...
    Your new IP: REAL_IP
    Your new IP after reload: NEW_IP
Press any key to continue...

So my IP changed only on the last attempt. Sometimes it does not change at all. Maybe some concurrency bug?

wkeeling commented 2 years ago

That's excellent, thanks @HMaker

It does smell like a concurrency issue as you mention. I'll look at reproducing using your example and see if I can figure out what's going on.

tjhgit commented 2 years ago

Hi @wkeeling , could you solve the issue? Due to this bug it is necessary to open and quit the driver for each single request if one wants to use the benefits of a rotating proxy. Or do you see alternatives here using selenium directly? Is it also easily possible to define a proxy with selenium alone?

Recently I deployed a scraping Docker container (called vscode-remote-svc) together with Selenium Hub (called chrome), and this bug leads to additional headaches, since instantiating and quitting the process sometimes leads to an error that the proxy server port is already allocated.

For instance I am using this code snippet:

self.wire_options.update({
    'auto_config': False,  # Ensure this is set to False
    'addr': '0.0.0.0',  # The address the proxy will listen on
    'port': 8087,
})
# default values
# comment the following proxy lines of code to test without rotating proxy
if proxy is not None:
    self.wire_options.update({
        'proxy': {
            'https': self.pxy.urls['https'],
            'http': self.pxy.urls['http']
        }
    })

self.options = webdriver.ChromeOptions()
self.options.add_argument('--proxy-server=vscode-remote-svc:8087')

# The hub URL must be passed as a string
self.driver = webdriver.Remote(command_executor='http://chrome:4444/wd/hub',
                               options=self.options,
                               # desired_capabilities=self.options.to_capabilities(),
                               seleniumwire_options=self.wire_options)

self.driver.implicitly_wait(30)

self.driver.request_interceptor = self.interceptor

This works marvellously were it not for the current bug.

wkeeling commented 2 years ago

@tjhgit I've not yet been able to resolve this one. You can specify a proxy using Selenium directly but that mechanism doesn't support authenticated proxies, so no good if your proxy requires a username and password.

Perhaps you could try running your code above but rather than use a fixed port number, try and select a free one dynamically. That may resolve the occasional issue you see with the port being in use. For example:

import socket
from contextlib import closing

def get_free_port():
    for port in range(8087, 8187):
        with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
            if sock.connect_ex(('0.0.0.0', port)) != 0:
                return port

port = get_free_port()  # Get an available port

self.wire_options.update({
    'auto_config': False,  # Ensure this is set to False
    'addr': '0.0.0.0',  # The address the proxy will listen on
    'port': port,
})
# default values
# comment the following proxy lines of code to test without rotating proxy
if proxy is not None:
    self.wire_options.update({
        'proxy': {
            'https': self.pxy.urls['https'],
            'http': self.pxy.urls['http']
        }
    })

self.options = webdriver.ChromeOptions()
self.options.add_argument(f'--proxy-server=vscode-remote-svc:{port}')

# The hub URL must be passed as a string
self.driver = webdriver.Remote(command_executor='http://chrome:4444/wd/hub',
                               options=self.options,
                               # desired_capabilities=self.options.to_capabilities(),
                               seleniumwire_options=self.wire_options)

self.driver.implicitly_wait(30)

self.driver.request_interceptor = self.interceptor
tjhgit commented 2 years ago

@wkeeling marvellous idea - thanks so much. Now I do not get the proxy server port in use error anymore.
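An alternative sketch for picking a port, assuming any free port is acceptable rather than one in the 8087 to 8186 range: binding a socket to port 0 asks the operating system to choose an unused port. This reuses the name `get_free_port` from the snippet above but is a different technique from the `connect_ex` probe. Note that the port is released before the proxy starts, so another process could still claim it in between.

```python
import socket


def get_free_port(host="0.0.0.0"):
    """Ask the OS for a currently unused TCP port on the given interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 means "let the OS pick any free port".
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]


port = get_free_port("127.0.0.1")
print(f"Selected port {port}")
```

The returned value can be used in place of `port` in both `wire_options` and the `--proxy-server` argument shown above.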

mikroelektro commented 2 years ago

@tjhgit thanks for the additional info.

One question: if you wait for all requests to complete before switching the proxy - e.g. add a time.sleep() before making the proxy change - do you still see the problem?

Waiting for 40 to 50 s solves the problem. The new proxy needs around 50 s to become effective, so it looks like mitmproxy is not picking up the new configuration immediately after the proxy attribute is set.
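Rather than a fixed sleep, one workaround sketch is to poll until the observed IP actually differs from the previous one, with a timeout. The helper `wait_until_changed` and its parameters are hypothetical. `get_value` stands for whatever callable reads the current IP, for example by loading an IP-echo page as in the reproduction script earlier in this thread.

```python
import time


def wait_until_changed(get_value, old_value, timeout=60.0, interval=2.0):
    """Poll get_value() until it returns something other than old_value.

    Returns the new value, or raises TimeoutError once `timeout` seconds
    have passed without a change.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = get_value()
        if value != old_value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"value still {old_value!r} after {timeout} seconds"
            )
        time.sleep(interval)


# Stand-in for a real IP lookup: reports the old IP twice, then a new one.
observed = iter(["REAL_IP", "REAL_IP", "NEW_IP"])
new_ip = wait_until_changed(lambda: next(observed), "REAL_IP",
                            timeout=5.0, interval=0.01)
print(new_ip)
```

This only works around the delay described above; it does not make the new proxy take effect any sooner.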

wkeeling commented 2 years ago

A fix for this has been made in v4.6.3

vladsuxunun commented 2 years ago

The IP still does not change for me on the latest version of seleniumwire.

hengzhibs commented 1 year ago

v5.1.0 has the same problem. I used v5.1.0 and had to wait a very long time for the proxy IP to change. Does anyone know how to fix this?

HammadRafique29 commented 1 year ago

I have been waiting for an answer. I want to change the proxy and user agent without closing the browser, because restarting the browser takes time. Is there any other option to change the proxy in a different tab, or in the same tab before calling driver.get()?