wkeeling / selenium-wire

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
MIT License
1.9k stars, 254 forks

Does selenium-wire work with https proxies with auth? #252

Open rileymd88 opened 3 years ago

rileymd88 commented 3 years ago

Hi there,

I have been using a set of HTTP proxies in the following way without any issues:

from seleniumwire import webdriver

wire_options = {
    'proxy': {
        'https': f"https://{rand_proxy['username']}:{rand_proxy['password']}@{rand_proxy['ip']}:{rand_proxy['port']}"
    }
}
driver = webdriver.Chrome(desired_capabilities=caps, executable_path='./chromedriver.exe', chrome_options=chrome_options, seleniumwire_options=wire_options)
driver.get(link)

If I try to use another proxy which is using HTTPS, then it does not work and my driver is not able to open any links (no connection). Are https proxies supported or do I need to change the wire_options?

Thanks

wkeeling commented 3 years ago

Thanks for this. Yes both HTTP and HTTPS proxies are supported, and the config you've posted above looks like it is formatted correctly. Are you seeing any error message in the script output when you run with the proxy that doesn't seem to work?
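
For readers comparing their own setup, here is a minimal sketch of how such a config could be assembled. The helper name `build_proxy_options` is hypothetical (it is not part of selenium-wire), the record keys mirror the `rand_proxy` dict used above, and the address is a placeholder. The `http`, `https` and `no_proxy` keys are the ones selenium-wire's `proxy` option accepts.

```python
def build_proxy_options(proxy, scheme="https"):
    """Build a seleniumwire_options dict from a proxy record (hypothetical helper).

    `proxy` is assumed to be a dict with 'username', 'password', 'ip'
    and 'port' keys, like `rand_proxy` in the snippet above.
    """
    proxy_url = (
        f"{scheme}://{proxy['username']}:{proxy['password']}"
        f"@{proxy['ip']}:{proxy['port']}"
    )
    return {
        "proxy": {
            "http": proxy_url,    # upstream proxy for plain HTTP requests
            "https": proxy_url,   # upstream proxy for HTTPS requests
            "no_proxy": "localhost,127.0.0.1",  # hosts that bypass the proxy
        }
    }


# Placeholder credentials and address, for illustration only.
example = {"username": "user", "password": "secret", "ip": "203.0.113.10", "port": 8080}
wire_options = build_proxy_options(example)
print(wire_options["proxy"]["https"])
```

The resulting dict would be passed to the driver as `seleniumwire_options=wire_options`, as in the original snippet.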

voxvici commented 3 years ago

They work, and it also works with a rotating proxy, which is a blessing.

rileymd88 commented 3 years ago

Thank you both for looking at my issue. I put together a sample which shows how it is working with the requests library but not selenium-wire. Here is my sample code:

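import ast
import random

import requests
from requests.auth import HTTPProxyAuth
from selenium.webdriver.chrome.options import Options
from seleniumwire import webdriver as sw_webdriver

# Each line of the "proxies" file is parsed as a Python dict literal
# with 'username', 'password', 'ip' and 'port' keys.
with open("proxies") as f:
    proxy_lines = f.read().split('\n')
random_int = random.randint(1, len(proxy_lines) - 1)
rand_proxy = ast.literal_eval(proxy_lines[random_int])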

s = requests.Session()
s.trust_env=False

proxies = {
  "http": f"http://{rand_proxy['username']}:{rand_proxy['password']}@{rand_proxy['ip']}:{rand_proxy['port']}",
  "https": f"https://{rand_proxy['username']}:{rand_proxy['password']}@{rand_proxy['ip']}:{rand_proxy['port']}"
}

auth = HTTPProxyAuth(rand_proxy['username'], rand_proxy['password'])
s.proxies = proxies
s.auth = auth

my_ip = s.get('https://icanhazip.com')
print(my_ip.text)

wire_options = {
    'proxy': {'https':f"https://{rand_proxy['username']}:{rand_proxy['password']}@{rand_proxy['ip']}:{rand_proxy['port']}"}
}
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
driver = sw_webdriver.Chrome(executable_path='./chromedriver.exe', chrome_options=chrome_options, seleniumwire_options=wire_options)
try:
    driver.get("https://icanhazip.com")
except Exception as e:
    print(e)

The exception I got was the following:

127.0.0.1:54898: request
  -> HTTP protocol error in client request: Server disconnected
Message: unknown error: net::ERR_TUNNEL_CONNECTION_FAILED
  (Session info: headless chrome=89.0.4389.90)

wkeeling commented 3 years ago

I can't see anything wrong with the above code. Are you able to share the details of a proxy that gives the error? I can try and reproduce my side and debug the issue locally.

eqMFqfFd commented 3 years ago

> I can't see anything wrong with the above code. Are you able to share the details of a proxy that gives the error? I can try and reproduce my side and debug the issue locally.

Same issue here. Do you at least know what might contribute to such an issue?

wkeeling commented 3 years ago

It seems that there is a bug with proxy auth that was recently introduced - see https://github.com/wkeeling/selenium-wire/issues/264. That bug causes proxy auth to fail when the disable_capture option is set to True. However, it doesn't look as though the above code uses disable_capture so this may not be the cause of this issue.

eqMFqfFd commented 3 years ago

> It seems that there is a bug with proxy auth that was recently introduced - see #264. That bug causes proxy auth to fail when the disable_capture option is set to True. However, it doesn't look as though the above code uses disable_capture so this may not be the cause of this issue.

It does not happen solely with this flag. With an SSL proxy and seleniumwire_options.disable_capture set to False, the error persists.

wkeeling commented 3 years ago

Does the upstream HTTPS proxy that causes the problem use a self-signed certificate?
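
One rough way to answer this from a script is to attempt a TLS handshake with the proxy endpoint directly. The sketch below uses only the standard library. The function name `probe_proxy_tls` and its return labels are made up for illustration, and the result is only a hint: a verification failure can have causes other than a self-signed certificate.

```python
import socket
import ssl


def probe_proxy_tls(host, port, timeout=5.0):
    """Classify how host:port responds to a TLS handshake (illustrative helper).

    Returns:
      'trusted'               handshake succeeded with certificate verification
      'untrusted-cert'        TLS is spoken, but the certificate failed
                              verification (self-signed, wrong host, expired, ...)
      'no-tls-or-unreachable' no TLS handshake could be completed
    """
    context = ssl.create_default_context()  # verifies against system CAs
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "trusted"
    except ssl.SSLCertVerificationError:
        return "untrusted-cert"
    except OSError:
        # Covers refused connections, timeouts and non-TLS responses.
        return "no-tls-or-unreachable"
```

For example, `probe_proxy_tls(rand_proxy['ip'], int(rand_proxy['port']))` would report 'untrusted-cert' for a proxy whose certificate does not verify, and 'no-tls-or-unreachable' if the endpoint only accepts plain HTTP connections.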

wkeeling commented 3 years ago

Something else worth trying here is to add the following option to your seleniumwire_options:

'mitm_http2': False

I've noticed that certain websites don't seem to load when both HTTP/2 and an upstream HTTPS proxy are in use.
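
In context, the combined options might look like the sketch below. The proxy URL is a placeholder, and only the `proxy` and `mitm_http2` keys mentioned in this thread are used.

```python
# Placeholder proxy URL; substitute real credentials, host and port.
proxy_url = "https://username:password@proxy-host:8080"

seleniumwire_options = {
    "proxy": {"https": proxy_url},
    # Turn off HTTP/2 in selenium-wire, as suggested above.
    "mitm_http2": False,
}

# Passed to the driver in the same way as the earlier snippets, e.g.:
# driver = webdriver.Chrome(seleniumwire_options=seleniumwire_options)
```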

eqMFqfFd commented 3 years ago

I have no proxies to test with atm. I will update in a couple of weeks. Please reopen.

valerino commented 2 years ago

Any news? I stumbled into this today, and I get the exact same issue.

colbyhill21 commented 2 years ago

Is there any update on this issue? I'm hitting this same exact problem right now.

vivri commented 2 years ago

+1 seeing the same issue now with the latest version (4.6.0)

valerino commented 2 years ago

I mentioned it again months ago in #441 with detailed code, but there is still no answer.

kinoute commented 2 years ago

This is a really nasty bug. Not being able to rotate proxies on each get is really annoying.

Chetan11-dev commented 9 months ago

The Botasaurus framework supports SSL with an authenticated proxy, such as http://username:password@proxy-provider-domain:port.

Installation

pip install botasaurus

Example

from botasaurus import *

@browser(proxy="http://username:password@proxy-provider-domain:port") # TODO: Replace with your own proxy 
def visit_ipinfo(driver: AntiDetectDriver, data):
    driver.get("https://ipinfo.io/")
    driver.prompt()

visit_ipinfo()

You can learn about Botasaurus Here.