wkeeling / selenium-wire

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.

Remote Webdriver seems not connect to Proxy with authentication #424

Open vinhliem opened 2 years ago

vinhliem commented 2 years ago

I followed the instructions on the selenium-wire main page. Both webdriver.Chrome and webdriver.Firefox work fine on my local machine with the proxy declared in seleniumwire_options:

opt = {
    'proxy': {
        'http': 'http://<user>:<password>@exampleproxy:3128',
        'https': 'https://<user>:<password>@exampleproxy:3128'
    }
}

But when I tried webdriver.Remote with the full set of options, like this:

from seleniumwire import webdriver  # Selenium Wire's drop-in replacement for selenium.webdriver

option = webdriver.ChromeOptions()
option.add_argument("enable-automation")
option.add_argument("--headless")
option.add_argument("--window-size=1366,768")
option.add_argument("--no-sandbox")
option.add_argument("--disable-extensions")
option.add_argument("--dns-prefetch-disable")
option.add_argument("--ignore-certificate-errors")
option.add_argument("--disable-gpu")
opt = {
    'auto_config': False,
    'addr': '0.0.0.0',
    'proxy': {
        'http': 'http://<user>:<password>@exampleproxy:3128',
        'https': 'https://<user>:<password>@exampleproxy:3128'
    }
}
option.add_argument("--proxy-server={}".format('exampleproxy:3128'))

# Using chromedriver on the local machine:
#driver = webdriver.Chrome('/home/tranvinhliem/selenium-grid/chromedriver_linux64/chromedriver', desired_capabilities=option.to_capabilities(), seleniumwire_options=opt)

# Using remote Chrome on the remote server:
driver = webdriver.Remote(command_executor='https://remoteseleniumhub/wd/hub', desired_capabilities=option.to_capabilities(), seleniumwire_options=opt)

When I set auto_config to True, the following error occurs, but only with the remote Selenium hub:

selenium.common.exceptions.WebDriverException: Message: unknown error: net::ERR_PROXY_CONNECTION_FAILED

When auto_config is set to False, the traffic no longer seems to pass through the proxy: the webdriver connects directly to the website. I know this because the IP I see for the connection is that of the Chrome node in the Selenium hub itself, not the proxy server's IP. This happens with Remote, Chrome, and Firefox alike. Am I missing something?

wkeeling commented 2 years ago

Thanks for raising this.

If you're running a remote webdriver instance then you have to use the --proxy-server argument to point Chrome back at Selenium Wire. Selenium Wire captures the requests before forwarding them to the upstream proxy specified in the proxy setting in the seleniumwire_options.

option.add_argument("--proxy-server={}".format('machine_running_selenium_wire:3128'))

This setup will only work if the machine that is running Selenium Wire is accessible from the machine running the Chrome browser. I'm assuming that your local machine is probably not accessible from Selenium Hub?
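A minimal sketch of the arrangement described above, for illustration only. The helper names (build_wire_options, chrome_proxy_argument, create_remote_driver) and the listen port are assumptions, not part of Selenium Wire; the option keys are the ones already used in this thread. The point is that two different addresses are involved: the upstream proxy goes into seleniumwire_options, while --proxy-server carries the address of the machine running Selenium Wire.

```python
def build_wire_options(user, password, upstream_host, upstream_port, listen_port):
    """Selenium Wire options: where it listens and which upstream proxy it forwards to."""
    upstream = f"{user}:{password}@{upstream_host}:{upstream_port}"
    return {
        "auto_config": False,  # Chrome is pointed at Selenium Wire manually (see below)
        "addr": "0.0.0.0",     # listen on all interfaces so a remote browser can connect
        "port": listen_port,   # hypothetical fixed port, so the browser knows where to connect
        "proxy": {
            "http": f"http://{upstream}",
            "https": f"https://{upstream}",
        },
    }


def chrome_proxy_argument(wire_host, listen_port):
    """Chrome flag pointing the remote browser back at Selenium Wire, not at the upstream proxy."""
    return f"--proxy-server={wire_host}:{listen_port}"


def create_remote_driver(hub_url, wire_host, wire_options):
    """Start a remote session; wire_host must be reachable from the browser's machine."""
    # Imported here so the helpers above can be tried without selenium-wire installed.
    from selenium.webdriver import ChromeOptions
    from seleniumwire import webdriver

    chrome_options = ChromeOptions()
    chrome_options.add_argument(chrome_proxy_argument(wire_host, wire_options["port"]))
    chrome_options.add_argument("--ignore-certificate-errors")
    return webdriver.Remote(
        command_executor=hub_url,
        options=chrome_options,
        seleniumwire_options=wire_options,
    )
```

With these helpers, wire_host would be the IP or host name of the machine running the script as seen from the hub, and exampleproxy:3128 would appear only in the upstream proxy setting.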

vinhliem commented 2 years ago

To make it clear: I have a Selenium hub located in the US, a proxy in Singapore that the Selenium hub should go through, and my local machine in Vietnam, which I use for testing. Following the instructions, I set the Selenium Wire options with auto_config: False and the proxy declaration pointing at my proxy. As for your suggestion about --proxy-server, the address in

option.add_argument("--proxy-server={}".format('local-machine-public-ip:3128'))

should be the IP/domain of my local machine. Is that right?

wkeeling commented 2 years ago

Yes, that should be the IP/domain of your local machine running Selenium Wire. The browser running in Selenium Hub will then send traffic back to Selenium Wire for capture.
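As a rough aid for finding that address, the sketch below asks the operating system which local IP it would use for traffic towards the hub. The function name local_ip_towards and the default port are hypothetical. Connecting a UDP socket only selects a route and sends no packets. Note the assumption: this returns the address of the local network interface, so behind NAT it will be a private address that a hub in another network cannot reach without a public IP or port forwarding.

```python
import socket


def local_ip_towards(remote_host, remote_port=4444):
    """Return the local IP the OS would use as the source address for remote_host."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket only picks a route; nothing is transmitted
        sock.connect((remote_host, remote_port))
        return sock.getsockname()[0]
```

The returned value is a candidate for the host part of --proxy-server, provided the hub can actually route to it.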

vinhliem commented 2 years ago

@wkeeling Thanks for pointing that out. I will run a test and confirm later.

sanjeevtrz commented 2 years ago

@vinhliem Did it work for you? I have a similar setup, but it doesn't work.

vinhliem commented 2 years ago

I did try, but it doesn't seem to work on my side. The main idea is that we need a neutral node between our proxy and the hub, reachable from both sides, so that the hub and the proxy can work together. If you set up your hub on your local machine and configure the connection to go through localhost, it should work. In my case, though, I didn't want to grant access to my proxies every time they change IP. And even supposing we could get the hub to connect to the proxy successfully with password authentication, we would still have to grant access or make neutral proxies available on both sides, so I don't think it solves my problem here. @sanjeevtrz

vinhliem commented 2 years ago

@sanjeevtrz See @wkeeling's explanation above of how Selenium Wire works with a remote webdriver.

sanjeevtrz commented 2 years ago

Thanks @vinhliem. I'm using a similar configuration. I'm running the Selenium grid and node (all in local Docker containers) and Selenium Wire on my local machine.

# Assumed imports; username, password, host and port are defined elsewhere
from selenium import webdriver
from seleniumwire import webdriver as wiredriver

wire_options = {
    'proxy': {
        'http': f'http://{username}:{password}@{host}:{port}',
        'https': f'https://{username}:{password}@{host}:{port}'
    },
    'suppress_connection_errors': False,
    'auto_config': False,
    'addr': '0.0.0.0',
    'port': 8087,
}
options = webdriver.ChromeOptions()

options.add_argument("--proxy-server=localhost:8087")
options.add_argument('--ignore-certificate-errors')
proxy_driver = wiredriver.Remote(
    command_executor='http://localhost:4444/wd/hub',
    options=options,
    seleniumwire_options=wire_options)
proxy_driver.get("https://api.ipify.org?format=json")

The failure is the same:

Message: unknown error: net::ERR_PROXY_CONNECTION_FAILED

vinhliem commented 2 years ago

In my opinion, if you run your grid as a pod (container) on k8s, the browser there has to talk to the proxies and send traffic back to Selenium Wire on your local machine, so you have to expose the grid service so that they can communicate.

opt = {
    'auto_config': False,
    'addr': '0.0.0.0',
    'proxy': {
        'http': 'http://<user>:<password>@exampleproxy:3128',
        'https': 'https://<user>:<password>@exampleproxy:3128'
    }
}
option.add_argument("--proxy-server={}".format('machine_running_selenium_wire:3128'))

I'm not sure, but I think the proxy setting here should never be localhost. It should be the machine running Selenium Wire, as seen from outside your local network.

Your local machine also needs permission to access the grid for this to work. @sanjeevtrz

sanjeevtrz commented 2 years ago

Thanks. The container was not able to reach Selenium Wire when I used localhost, because Selenium Wire is running on the host machine. In my case it worked once I put the right host, like options.add_argument(f"--proxy-server=host.docker.internal:8098")

h4r5h1t commented 2 years ago

@sanjeevtrz @wkeeling @vinhliem Hey guys, I also want to run selenium-wire with a Selenium grid and node (all in local Docker containers), but I don't want to run Selenium Wire on my local machine. I want it inside Docker as well, so that the node and hub can communicate with selenium-wire. Is there any configuration for that? If yes, can you please share the docker-compose file and a small example of how to configure it for creating a remote webdriver?

mirisr commented 2 years ago

@h4r5h1t-hrs Did you ever figure this out?

sanjeevtrz commented 2 years ago

@h4r5h1t You need to use the right host names. In my case the grid was running on my local machine and my script inside a Docker container, so I had to use the custom host name above, which is reserved for the host machine.

Your case is simple: if you are communicating between two Docker containers, launch them with Docker Compose and use the host name you specify in the host attribute.

mirisr commented 2 years ago

So if I'm running my script in a docker image inside a google compute engine... and the external ip for the compute engine is 35.206.XX.XXX.

Then should I have my Firefox (or Chrome) options say:

options.add_argument("--proxy-server={}".format('35.206.XX.XXXX:4444'))

and my selenium-wire options say:

selenium_options = {
    'auto_config': False,
    'proxy': {
        'http': 'http://'+username+':'+password+'@'+ip_assignment["ip-address"]+':'+str(ip_assignment["port"]),
        'https': 'https://'+username+':'+password+'@'+ip_assignment["ip-address"]+':'+str(ip_assignment["port"]),
        'no_proxy': 'localhost,127.0.0.1'  # excludes
    },
    'addr': '35.206.XX.XXXX',
    'port': 4444
}

with my remote driver using a node in the Selenium grid:

driver = webdriver.Remote(
    command_executor=selenium_connection,  # for prod
    desired_capabilities=DesiredCapabilities.FIREFOX,
    options=options,
    keep_alive=True,
    browser_profile=firefox_profile,
    seleniumwire_options=selenium_options
)

Is this the right setup?

sanjeevtrz commented 2 years ago

When you use port 4444, are you referring to the grid port?

Your configuration looks right. Make sure the grid container can reach the script's external URL, and that the script can reach the grid in turn.

mirisr commented 2 years ago

It turns out 4444 is my proxy port, and I suppose the grid uses that port too.

What do you mean by your last comment? How can I do that?

sanjeevtrz commented 2 years ago

When you set the selenium-wire (script) URL as the proxy and start a session on the grid, the browser will use selenium-wire's mitmproxy as its proxy URL. So it is important that the machine running the script has a public IP, or a private IP that can be accessed by the machine running the grid.
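One way to check this reachability is a plain TCP connection test, run from the machine or container where the browser lives while the script is running. This is only a sketch: can_reach is a hypothetical helper, and the host and port you pass are whatever you configured for Selenium Wire's addr and port. A successful connection shows that the port is open from that side, not that the proxy itself is working.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and host names that do not resolve
        return False
```

For example, calling can_reach('host.docker.internal', 8098) inside the grid node container tests the address that worked earlier in this thread. If it returns False, the browser will most likely fail with net::ERR_PROXY_CONNECTION_FAILED.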

Akash-nykaa commented 2 years ago

Remote

Selenium Wire has limited support for using the remote webdriver client. When you create an instance of the remote webdriver, you need to specify the hostname or IP address of the machine (or container) running Selenium Wire. This allows the remote instance to communicate back to Selenium Wire with its requests and responses.

options = {
    'addr': 'hostname_or_ip',  # Machine IP on which Docker is hosted
}
driver = webdriver.Remote(
    command_executor='http://www.example.com',
    seleniumwire_options=options
)

This is working fine for me. No proxy required.

Example:

desired_cap = {'browserName': 'chrome'}
swoptions = {'addr': '172.x.x.x'}
driver = swdriver.Remote(command_executor='http://172.x.x.x:4444/wd/hub', desired_capabilities=desired_cap, seleniumwire_options=swoptions)

illustratumq commented 1 year ago

Hello there. Looks like I found the problem.

The proxies will work if you use the Remote method to launch the browser, and set auto_config=True in seleniumwire_options

gecoool commented 1 year ago

Hey illustratumq, would you be so kind as to share some more details on what exactly you did? Thanks!

danztensai commented 11 months ago

@illustratumq can you share your code here? what did you do to solve the problem?