seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License
4.46k stars 909 forks

CloudFlare verification not working under VPN #2793

Closed bjornkarlsson closed 1 month ago

bjornkarlsson commented 1 month ago

The following code works on a direct connection (no verification is requested). However, when using a VPN or an HTTP proxy (NordVPN in my case), clicking the verification box doesn't let the verification go through:


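    from selenium.webdriver.common.by import By
    from seleniumbase import SB

    URL = 'https://rateyourmusic.com/artist/pink-floyd/'
    with SB(uc=True) as sb:
        sb.driver.uc_open(URL)
        # The verification checkbox is rendered inside a challenge iframe
        if sb.is_element_visible('iframe[src*="challenge"]'):
            iframe = sb.find_element('iframe[src*="challenge"]')
            sb.driver.switch_to.frame(iframe)
            # Plain Selenium click on the checkbox input
            confirm_input = sb.driver.find_element(By.CSS_SELECTOR, 'input')
            confirm_input.click()
            sb.sleep(2)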

I have tried other libraries with undetected capabilities through the VPN, and while most didn't get through, I was able to get past the verification box with https://github.com/kaliiiiiiiiii/Selenium-Driverless.

Has anyone else experienced similar behaviour?

mdmintz commented 1 month ago

I used the script below and didn't encounter any CAPTCHAs:

    from seleniumbase import SB

    with SB(uc=True, ad_block_on=True) as sb:
        url = "https://rateyourmusic.com/artist/pink-floyd/"
        sb.driver.uc_open_with_reconnect(url, 8)

Also, whenever you need to click in UC Mode, use sb.driver.uc_click(selector) to remain undetected. (That'll work unless they already detected you in a previous step.)
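The advice above can be sketched as a pair of small helpers. This is a minimal sketch, not SeleniumBase's own code: the helper names (`open_undetected`, `click_undetected`, `run`) are hypothetical, and the only SeleniumBase calls used are the ones already shown in this thread (`uc_open_with_reconnect`, `uc_click`, `is_element_visible`). The challenge selector is the one from the original report and may not match every challenge page.

```python
# Selector taken from the original report; an assumption for other sites.
CHALLENGE_IFRAME = 'iframe[src*="challenge"]'


def open_undetected(sb, url, reconnect_time=8):
    """Open url in UC Mode, then report whether a challenge iframe is visible."""
    sb.driver.uc_open_with_reconnect(url, reconnect_time)
    return sb.is_element_visible(CHALLENGE_IFRAME)


def click_undetected(sb, selector):
    """Click via UC Mode's uc_click() rather than a plain element.click()."""
    sb.driver.uc_click(selector)


def run(url):
    """Real-browser entry point; needs seleniumbase and Chrome installed."""
    # Imported lazily so the helpers above can be used without seleniumbase.
    from seleniumbase import SB

    with SB(uc=True) as sb:
        challenged = open_undetected(sb, url)
        print("challenge iframe visible:", challenged)
        return challenged
```

Calling `run("https://rateyourmusic.com/artist/pink-floyd/")` would open the page from the report. Keeping `sb` as a parameter of the helpers means they can be exercised with a stand-in object before a real browser is involved.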

bjornkarlsson commented 1 month ago

Unfortunately, this does not solve the issue. The problem occurs only when using a VPN connection or an HTTPS proxy, so it needs to be tested under a VPN (my provider is NordVPN). There is no issue on a direct connection.

On a VPN:

The verification box expects a click from the user. After the driver clicks the checkbox input, it attempts verification for a few seconds (spinning wheel), but the checkbox ends up unchecked again, waiting for the process to be retried.

mdmintz commented 1 month ago

NordVPN isn't supported. It can cause regular browsers to fail CAPTCHAs too.

Even the original undetected-chromedriver had that issue. (See https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/1441#issuecomment-1671905938 and https://github.com/ultrafunkamsterdam/undetected-chromedriver/discussions/1387#discussioncomment-6552952)

bjornkarlsson commented 1 month ago

Thanks for looking into this! I only wanted to see if anything had changed with SeleniumBase, as otherwise I would need to rewrite everything in async mode :D using selenium-driverless, which seems to work.

mdmintz commented 1 month ago

Possibly it was related to the part where the VPN hides your IP:

[Screenshot 2024-05-21 at 3:54:22 PM]

--

If the VPN replaces your IP with another IP that was already flagged as a bot, then a site may flag you as a bot.
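One way to see which address a site receives with and without the VPN is to query an IP echo service before and after connecting. The sketch below assumes the public `api.ipify.org` JSON endpoint; any similar service would do, and the function names are hypothetical. It only shows the address in use. It cannot tell you whether that address has been flagged.

```python
import json
from urllib.request import urlopen

# Assumption: an IP echo service that returns JSON like {"ip": "..."}.
IP_ECHO_URL = "https://api.ipify.org?format=json"


def fetch_public_ip(opener=urlopen, url=IP_ECHO_URL, timeout=10):
    """Return the public IP address that remote sites see for this machine."""
    with opener(url, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["ip"]


def ip_changed(before, after):
    """True when the VPN (or proxy) replaced the address sites will see."""
    return before != after
```

Run `fetch_public_ip()` once on the direct connection and once under the VPN, then compare the two values with `ip_changed`. The `opener` parameter exists only so the network call can be swapped out.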

bjornkarlsson commented 1 month ago

What confuses me is that the GitLab login verification passes under the VPN after clicking the CAPTCHA using SeleniumBase, and nowsecure.nl does not even ask for a CAPTCHA.

Not sure whether Cloudflare bot detection can be configured per site, or whether there are different levels of detection.

mdmintz commented 1 month ago

I'm not sure about Cloudflare configuration on a site-by-site basis, but I am familiar with how to bypass it. 🙂

bjornkarlsson commented 1 month ago

Are you aware of any resource or documentation that explains how to bypass Cloudflare verification, or would it be better to dig into the source code?

Since both the regular browser and driverless Chrome escape the checks (NordVPN also has HTTPS proxies that can be used as a Chrome plugin, which work fine on that site as well), I feel it's the chromedriver getting caught rather than a global block on the VPN's IPs. Being a stubborn engineer, I would rather dig in the dirt for a bit than rewrite my bots with async selenium-driverless, so I could give it a chance 😄

Greetings!

mdmintz commented 1 month ago

To figure out why you were detected, run this script: SeleniumBase/examples/raw_pixelscan.py


To learn more about how UC Mode works for bypassing Cloudflare, check out these videos:

(Watch the 1st UC Mode tutorial on YouTube! ▶️)

--

(Watch the 2nd UC Mode tutorial on YouTube! ▶️)

bjornkarlsson commented 1 month ago

Watched both videos. Very helpful, and I even enjoyed them ;)