seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License

Request headers not changed if UC mode and Cloudflare #2717

Closed ivanovevgeny closed 2 months ago

ivanovevgeny commented 2 months ago

Hi!

I want to use a different proxy for each request, so I developed a plugin for a MITM proxy that reads a custom header: "X-Upstream-Proxy": "http://user:pass@ip:port".

If I open a simple site without Cloudflare, my custom header is used. Otherwise, the request goes to my proxy but without the custom header.

Code:

from seleniumbase import Driver

driver = Driver(
    uc=True,
    headless=True,
    proxy="http://127.0.0.1:8080",
    multi_proxy=True,
    # Raw string so the backslashes in the Windows path are not treated as escapes
    binary_location=r"c:\cloudparser\Chrome\chrome.exe",
    uc_cdp_events=True)
try:
    driver.execute_cdp_cmd("Network.enable", {})
    headers = {'headers': {"X-Upstream-Proxy": "http://user:pass@194.28.210.32:9940"}}
    driver.execute_cdp_cmd('Network.setExtraHTTPHeaders', headers)

    driver.sleep(3)
    driver.get('https://myip.ru/')
    driver.sleep(3)
    driver.save_screenshot('screen.png')
finally:
    driver.quit()

With driver.get('https://myip.ru/'), I get the proxy IP.

[screenshot: sc1]

But for driver.get('https://radar.cloudflare.com/ip'), the result is my local IP.

[screenshot: sc2]

And if I debug my MITM proxy plugin, I don't see my custom header.

Any ideas on how to resolve this?

mdmintz commented 2 months ago

It looks like MITM proxies aren't compatible with UC Mode because MITM's TLS fingerprint doesn't match typical web browsers, which would lead to getting detected: https://github.com/mitmproxy/mitmproxy/issues/4575

You could try using driver.uc_open(url) to open a URL in UC Mode in the same tab, because otherwise driver.get(url) defaults to using driver.uc_open_with_reconnect(url), which opens the URL in a new tab, and will undo any execute_cdp_cmd() changes. Also note that disconnecting the driver from Chrome might also undo those settings, which is what UC Mode does to remain undetected during page loads and clicks.
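
A minimal sketch of that suggestion, reusing the CDP calls from the original post: set the extra headers, then load the page with driver.uc_open(url) instead of driver.get(url). The helper name `open_with_extra_headers` is hypothetical, and because UC Mode may still disconnect from Chrome during the load, the headers are not guaranteed to survive.

```python
def open_with_extra_headers(driver, url, extra_headers):
    """Set extra request headers over CDP, then open `url` in the current tab.

    `driver` is assumed to be a SeleniumBase UC Mode driver, e.g. one created
    with Driver(uc=True, uc_cdp_events=True). Hypothetical helper, not part
    of SeleniumBase.
    """
    driver.execute_cdp_cmd("Network.enable", {})
    driver.execute_cdp_cmd(
        "Network.setExtraHTTPHeaders", {"headers": dict(extra_headers)}
    )
    # uc_open() stays in the same tab; driver.get() in UC Mode would go
    # through uc_open_with_reconnect(), which opens a new tab.
    driver.uc_open(url)


# Possible usage (requires seleniumbase and Chrome):
#
#   from seleniumbase import Driver
#   driver = Driver(uc=True, proxy="http://127.0.0.1:8080", uc_cdp_events=True)
#   try:
#       open_with_extra_headers(
#           driver,
#           "https://radar.cloudflare.com/ip",
#           {"X-Upstream-Proxy": "http://user:pass@ip:port"},
#       )
#   finally:
#       driver.quit()
```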

Might be easier to just use a regular proxy via the proxy arg.

ivanovevgeny commented 2 months ago

Thanks for the clarification and such a quick answer!

> Might be easier to just use a regular proxy via the proxy arg.

But if I want to change the proxy for every request, I'll have to recreate a Driver every time, which I suppose is a very resource-intensive operation.
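
For reference, the "recreate a Driver per proxy" approach could look roughly like the sketch below. The names `driver_for_proxy`, `fetch_titles`, and `make_driver` are hypothetical. `make_driver` is assumed to be a callable such as `lambda proxy: Driver(uc=True, headless=True, proxy=proxy)`. Each request pays the full browser start-up cost, which is exactly the overhead in question.

```python
from contextlib import contextmanager


@contextmanager
def driver_for_proxy(proxy, make_driver):
    """Create a driver bound to one proxy and always quit it afterwards."""
    driver = make_driver(proxy)
    try:
        yield driver
    finally:
        driver.quit()


def fetch_titles(jobs, make_driver):
    """Visit each (url, proxy) pair with a fresh driver; return (url, title) pairs."""
    results = []
    for url, proxy in jobs:
        # A new browser is started for every job, so the proxy can differ each time.
        with driver_for_proxy(proxy, make_driver) as driver:
            driver.get(url)
            results.append((url, driver.title))
    return results


# Possible usage (requires seleniumbase and Chrome):
#
#   from seleniumbase import Driver
#   make_driver = lambda proxy: Driver(uc=True, headless=True, proxy=proxy)
#   fetch_titles([("https://myip.ru/", "user:pass@ip:port")], make_driver)
```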

Do you think it’s worth using a solution like https://github.com/refraction-networking/utls that replaces the fingerprint, or is it better to write an extension for Chrome that can set a proxy for each request?

Maybe there is a better solution for the "undetected + proxy per request" task?

mdmintz commented 2 months ago

I'll let you figure out the best approach. Don't forget to Star SeleniumBase on GitHub. ⭐