seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License

Major updates have arrived in `4.28.0` (mostly for UC Mode) #2865

Open mdmintz opened 1 week ago

mdmintz commented 1 week ago

For anyone that hasn't been following https://github.com/seleniumbase/SeleniumBase/issues/2842, CF pushed an update that prevented UC Mode from easily bypassing CAPTCHA Turnstiles on Linux servers. Additionally, uc_click() was rendered ineffective for clicking Turnstile CAPTCHA checkboxes when clicking the checkbox was required. I've been working on solutions to these situations.

As I mentioned earlier in https://github.com/seleniumbase/SeleniumBase/issues/2842#issuecomment-2176310108, if CF detects either Selenium in the browser or JavaScript involvement in clicking the CAPTCHA, then they don't let the click through. (The JS-detection part is new.) I read online that CF employees borrowed ideas from https://github.com/kaliiiiiiiiii/brotector (a Selenium detector) in order to improve their CAPTCHA. Naturally, I was skeptical at first, but I have confirmed that the two algorithms do appear to get similar results. (Brotector was released 6 weeks ago, while the Cloudflare update happened 2 weeks ago.)

The solution to bypassing the improved CAPTCHAs requires using pyautogui to stay undetected. There was also the matter of how to make pyautogui work well on headless Linux servers. (Thanks to some ideas by @EnmeiRyuuDev in https://github.com/seleniumbase/SeleniumBase/issues/2842#issuecomment-2168829685, that problem was overcome by setting pyautogui._pyautogui_x11._display to Xlib.display.Display(os.environ['DISPLAY']) on Linux in order to sync up pyautogui with the X11 virtual display.)
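A minimal sketch of that X11 sync step, assuming pyautogui and python-xlib are installed and that an X11 (virtual) display is already running. The helper name `sync_pyautogui_with_x11` and its parameters are hypothetical; only the `_display` assignment comes from the description above.

```python
import os
import sys


def sync_pyautogui_with_x11(environ=None, platform=None):
    """Point PyAutoGUI at the X11 display named by $DISPLAY (Linux only).

    Returns True if the display was synced, or False if nothing was done.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if "linux" not in platform or not environ.get("DISPLAY"):
        return False  # Not Linux, or no X11 display is available
    # Imported lazily because both packages expect an X11 display on Linux.
    import pyautogui
    import Xlib.display

    # The assignment described above: reuse the display named by $DISPLAY.
    pyautogui._pyautogui_x11._display = Xlib.display.Display(environ["DISPLAY"])
    return True
```

Because `_pyautogui_x11._display` is a private attribute, this may break with other pyautogui versions. With the SB() format, this kind of setup is handled for you.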

The improved SeleniumBase UC Mode will have these new methods:

driver.uc_gui_press_key(key)  # Use PyAutoGUI to press the keyboard key

driver.uc_gui_press_keys(keys)  # Use PyAutoGUI to press a list of keys

driver.uc_gui_write(text)  # Similar to uc_gui_press_keys(), but faster

driver.uc_gui_handle_cf(frame="iframe")  # PyAutoGUI click CF Turnstile

It'll probably be easier to understand how those work via examples. Here's one for uc_gui_handle_cf based on the example in https://github.com/seleniumbase/SeleniumBase/issues/2842#issuecomment-2159004018:

import sys
from seleniumbase import SB

agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/126.0.0.0"
if "linux" in sys.platform:
    agent = None  # Use the default UserAgent

with SB(uc=True, test=True, rtf=True, agent=agent) as sb:
    url = "https://www.virtualmanager.com/en/login"
    sb.uc_open_with_reconnect(url, 4)
    sb.uc_gui_handle_cf()  # Ready if needed!
    sb.assert_element('input[name*="email"]')
    sb.assert_element('input[name*="login"]')
    sb.set_messenger_theme(location="bottom_center")
    sb.post_message("SeleniumBase wasn't detected!")

Above, I deliberately gave it an incomplete UserAgent so that CAPTCHA-clicking is required to advance. On macOS and Windows, the default UserAgent that SeleniumBase gives you is already enough to bypass the CAPTCHA screen entirely. The uc_gui_handle_cf() method is designed so that if there's no CAPTCHA to click on the current page, nothing happens. Therefore, you can add the line anywhere you might encounter a CAPTCHA, whether or not one actually appears. If a website has more than one iframe, you can pass the CSS Selector of the iframe as an arg when calling uc_gui_handle_cf(). There will be new examples in the SeleniumBase/examples/ folder for all the new UC Mode methods. To sum up, you may need to use the newer uc_gui_* methods in order to get past some CAPTCHAs on Linux where uc_click() worked previously.
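As a rough sketch of passing an iframe selector, the helper below wraps the two calls from the example above. The function name `open_and_handle_turnstile` is hypothetical, and so is the selector in the usage comment; only `uc_open_with_reconnect()`, `uc_gui_handle_cf(frame=...)`, and `get_page_title()` come from this thread.

```python
def open_and_handle_turnstile(sb, url, frame="iframe", reconnect_time=4):
    """Open `url` in UC Mode, then click the Turnstile checkbox if one shows up.

    `sb` is the object yielded by `with SB(uc=True) as sb:`.
    `frame` is the CSS Selector of the iframe that holds the checkbox.
    """
    sb.uc_open_with_reconnect(url, reconnect_time)
    sb.uc_gui_handle_cf(frame=frame)  # Does nothing if there's no CAPTCHA
    return sb.get_page_title()


# Usage (requires seleniumbase; the selector below is a made-up example):
# from seleniumbase import SB
# with SB(uc=True, test=True) as sb:
#     url = "https://www.virtualmanager.com/en/login"
#     print(open_and_handle_turnstile(sb, url, frame='iframe[src*="challenge"]'))
```

Taking `sb` as a parameter keeps the helper independent of how the browser was launched.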

On the topic of Brotector (the open source bot-detector library that CF borrowed ideas from), there is a huge opportunity: now that effective bot-detection software is available to the general public (all the code is open source!), anyone can build their own CAPTCHA services (or just add CAPTCHAs to sites without the "service" part). I've already jumped on this with the Brotector CAPTCHA: https://seleniumbase.io/apps/brotector. I've also created a few test sites that utilize it:

I did make some improvements to the original Brotector algorithm in order to be suitable for CAPTCHAs: I needed a definite Allow/Block answer, rather than a number between 0 and 1 determining the likelihood of a bot, etc. I've been using these new test sites for testing the improved UC Mode.
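To illustrate the difference between a likelihood score and a definite Allow/Block answer, here is a small sketch. It is not the Brotector CAPTCHA's actual logic: the `Verdict` type, the `decide()` function, and the 0.5 cutoff are all assumptions made up for this example.

```python
from dataclasses import dataclass

BLOCK_THRESHOLD = 0.5  # Hypothetical cutoff, not taken from Brotector


@dataclass(frozen=True)
class Verdict:
    allowed: bool  # The definite Allow/Block answer
    score: float   # The original bot-likelihood score, kept for logging


def decide(bot_score: float, threshold: float = BLOCK_THRESHOLD) -> Verdict:
    """Collapse a bot-likelihood score in [0, 1] into a definite Allow/Block."""
    if not 0.0 <= bot_score <= 1.0:
        raise ValueError("bot_score must be between 0 and 1")
    # Scores at or above the threshold are blocked.
    return Verdict(allowed=bot_score < threshold, score=bot_score)
```

A CAPTCHA can only pass or fail a visitor, so some rule like this has to turn the score into a yes/no decision.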

That covers the major updates from 4.28.0 (with the exception of Brotector CAPTCHA test sites, which were already available to the public at the URLs listed above).

There will also be some other improvements:

Now, when using UC Mode on Linux, the default setting is NOT using headless mode. If for some reason you decide to use UC Mode and Headless Mode together, note that although Chrome will launch, you'll definitely be detected by anti-bots, and on top of that, pyautogui methods won't work. Use xvfb=True / --xvfb in order to be sure that the improved X11 virtual display on Linux activates. You'll need that for the uc_gui_* methods to work properly.
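One way to apply that advice is to build the SB() arguments per platform, as sketched below. The helper name `uc_mode_kwargs` is hypothetical; `uc=True` and `xvfb=True` are the options named above.

```python
import sys


def uc_mode_kwargs(platform=None):
    """Build keyword arguments for SB() that suit the uc_gui_* methods."""
    platform = sys.platform if platform is None else platform
    kwargs = {"uc": True}  # UC Mode; headless is deliberately left unset
    if "linux" in platform:
        kwargs["xvfb"] = True  # Activate the X11 virtual display on Linux
    return kwargs


# Usage (requires seleniumbase):
# from seleniumbase import SB
# with SB(**uc_mode_kwargs()) as sb:
#     sb.uc_open_with_reconnect("https://www.virtualmanager.com/en/login", 4)
#     sb.uc_gui_handle_cf()
```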

Much of that will get covered in the 3rd UC Mode video tutorial on YouTube (expected sometime in the near future).

In case anyone has forgotten, SeleniumBase is still a Test Automation Framework at heart (which includes an extremely popular feature for stealth called "UC Mode"). UC Mode has gathered a lot of the attention, but SeleniumBase is more than just that.

mdmintz commented 1 week ago

4.28.0 has been released: https://github.com/seleniumbase/SeleniumBase/releases/tag/v4.28.0

The pyautogui example for a Cloudflare page with UC Mode:

Examples of bypassing the Brotector CAPTCHA with UC Mode:

Examples of how the Brotector CAPTCHA detects regular Selenium:

mdmintz commented 1 week ago

Here's an example script for Linux to prove it's working:

from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://www.virtualmanager.com/en/login"
    sb.uc_open_with_reconnect(url, 4)
    print(sb.get_page_title())
    sb.uc_gui_handle_cf()  # Ready if needed!
    print(sb.get_page_title())
    sb.assert_element('input[name*="email"]')
    sb.assert_element('input[name*="login"]')
    sb.set_messenger_theme(location="bottom_center")
    sb.post_message("SeleniumBase wasn't detected!")

The second print() should show "Virtual Manager", which means that the automation was able to get past the Turnstile.

vmolostvov commented 1 week ago

@mdmintz Same problem here on a Linux VDS (Ubuntu without a GPU): SeleniumBase became unable to bypass the Cloudflare challenge. I'm using the latest sb version. On local macOS and Windows, it keeps working without any problem.

I can confirm that my issue on headless Linux Ubuntu was solved by 4.28.0


Appreciate your work sir @mdmintz

SSujitX commented 1 week ago

The issue has been resolved after restarting my PC, but I didn't understand why this error happened.

It seems that after the latest update, the script has not opened any websites. This is the first time this issue has happened to me. The driver opens successfully but can not access the provided URL.

I updated PyPI and SeleniumBase. I even created a fresh virtualenv, but nothing changed. Is this an issue with Chrome at the ip:port 127.0.0.1:9222?


error:

=============================================== {Login_Test_all.py:3:SB} starts ===============================================
Traceback (most recent call last):
  File "c:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\Login_Test_all.py", line 3, in <module>
    with SB(uc=True, test=True, rtf=True) as sb:
  File "C:\Users\ssuji\AppData\Local\Programs\Python\Python312\Lib\contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\seleniumbase\plugins\sb_manager.py", line 949, in SB
    sb.setUp()
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\seleniumbase\fixtures\base_case.py", line 14838, in setUp
    self.driver = self.get_new_driver(
                  ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\seleniumbase\fixtures\base_case.py", line 4037, in get_new_driver
    new_driver = browser_launcher.get_driver(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\seleniumbase\core\browser_launcher.py", line 1841, in get_driver
    return get_local_driver(
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\seleniumbase\core\browser_launcher.py", line 3784, in get_local_driver
    driver = undetected.Chrome(
             ^^^^^^^^^^^^^^^^^^
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\seleniumbase\undetected\__init__.py", line 312, in __init__
    super().__init__(options=options, service=service_)
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 45, in __init__
    super().__init__(
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\selenium\webdriver\chromium\webdriver.py", line 66, in __init__
    super().__init__(command_executor=executor, options=options)
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 212, in __init__
    self.start_session(capabilities)
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\seleniumbase\undetected\__init__.py", line 475, in start_session
    super().start_session(capabilities)
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 299, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 354, in execute
    self.error_handler.check_response(response)
  File "C:\Users\ssuji\OneDrive\Desktop\All Codes\Python Development\VFX Tool Telegram 6.0\.venv\Lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: cannot connect to chrome at 127.0.0.1:9222
from chrome not reachable

goldananas commented 6 days ago

Hey @mdmintz, do you think you will be working on making the uc_gui_handle_cf method compatible with the returnable Driver? It works just fine on headless Linux with SB, but I can't find a way to do the same with the returnable Driver instead.

mdmintz commented 6 days ago

@goldananas If using the Driver() format instead of SB(), you'll need to spin up the special X11 virtual display yourself before launching the driver. (See https://github.com/seleniumbase/SeleniumBase/issues/2842#issuecomment-2168392303.)

With the SB() format, SB(uc=True, xvfb=True) does all that for you when running on Linux.

NCLnclNCL commented 2 days ago

I think they detect when we switch to the tickbox frame

mdmintz commented 1 day ago

Windows users should upgrade to 4.28.3 or newer (Fixes https://github.com/seleniumbase/SeleniumBase/issues/2889 on 4.28.2)

JimKarvo commented 23 hours ago

It seems that CF has detected the new way of bypassing.

Sometimes the click works (not always): [screenshot]

but after that, the checkbox fails: [screenshot]

mdmintz commented 22 hours ago

macOS: ✅
Windows: ✅
Linux with natural GUI on residential IP: ✅
Linux without GUI on non-residential IP: ❌
Linux without GUI on residential IP: ⚠️ / ❓

So much for the free pass on GitHub Actions CAPTCHA bypassing. 😄 I didn't expect that loophole to last long.

JimKarvo commented 22 hours ago

@mdmintz I forgot to mention that I am running an Ubuntu server, no GUI

mdmintz commented 21 hours ago

@JimKarvo Residential IP or non-residential?

OpsecGuy commented 20 hours ago

@mdmintz In my case, I also have some issues with the bypass. This is with SeleniumBase 4.28.3. On Windows there are no issues. However, on my Linux (Ubuntu 20) VM with a GUI, it doesn't work. I'm using https://github.com/seleniumbase/SeleniumBase/blob/master/examples/raw_pyautogui.py, and in that script I only change the URL to a website that shows the Cloudflare CAPTCHA on first connect. The same IP that successfully bypasses the CAPTCHA on Windows doesn't work on Linux with a GUI. On the bare-metal server where I have Ubuntu 22 installed, I'm also stuck on the CF CAPTCHA page, and experimenting with the reconnect timeout doesn't solve my issues.

My user agent on both Linux machines is: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36
Google Chrome version is 126.0.6478.126.
I tested with my residential IP and with other residential proxies.

// I tested the original code from raw_pyautogui.py and it looks like it worked, but on any other website I test, I get this: [screenshot]

mdmintz commented 19 hours ago

The last successful GitHub Actions run for bypassing Cloudflare's Turnstile was https://github.com/mdmintz/undetected-testing/actions/runs/9748457978/job/26903480495 8 hours ago. Likely their QA Team did not initially catch that their Turnstiles were getting bypassed on GitHub Actions until they came over to the SeleniumBase repo and read the notes.

gabrielsim commented 19 hours ago

Linux without GUI on residential IP: ⚠️ / ❓

@mdmintz fyi, Linux without GUI on residential IPs still works for me

mdmintz commented 17 hours ago

@gabrielsim That's good news: it means the algorithm works right now when the IP address hasn't already been blocked. When it worked earlier on GitHub Actions, it was due to a bug on Cloudflare's end, where they forgot to check IP ranges for known non-residential server addresses. They finally fixed it, likely after reading this thread and learning about the loophole.

No changes are needed for UC Mode at this time. However, Brotector still has some bot-checks that Cloudflare hasn't picked up yet. Those checks would allow them to detect switching into an iframe, as well as the JavaScript for making an element the active one. There's already a plan in place for that scenario, involving pyautogui for more things, and not just clicking the active element.

OpsecGuy commented 12 hours ago

That's very strange; however, on any website that I test now, the bypass seems to be working properly... I don't know what the CF team is doing, but I believe they are preparing another big update for us.

enricodvn commented 7 hours ago

Hey guys, just to add to this discussion, CF is now detecting residential proxies with ML:

https://blog.cloudflare.com/residential-proxy-bot-detection-using-machine-learning

The residential proxies I used became pretty much useless since mid June :/

amberbor commented 4 hours ago

Hey guys

Can somebody help me? I'm trying on macOS to open dexscreener and bypass Cloudflare, but it doesn't work.

import time

from seleniumbase import SB

with SB(uc=True, xvfb=True) as sb:
    url = "https://dexscreener.com"
    sb.uc_open_with_reconnect(url, 4)
    print(sb.get_page_title())
    sb.uc_gui_handle_cf()  # Ready if needed!
    print(sb.get_page_title())

time.sleep(70)
JimKarvo commented 4 hours ago

Updates: Nothing to do with blocked IP or proxies.

I have scripts running on Windows machines (headed) on my home IP. I forward all traffic from the Ubuntu server through my home IP. The first 3-5 pages are OK. After those pages, CF appears in my browser. SeleniumBase still can't bypass it there, while on my Windows PC I can bypass it with no problems.

amberbor commented 2 hours ago

https://github.com/seleniumbase/SeleniumBase/assets/47393618/4b6f266d-bcb9-4083-a68a-f4b5b8d3346d

mdmintz commented 2 hours ago

@amberbor The best user-agent to use is the default one that SeleniumBase sets for you automatically.

amberbor commented 1 hour ago

@amberbor The best user-agent to use is the default one that SeleniumBase sets for you automatically.

@mdmintz Thanks for your reply. The first time I ran the code, it was without a user agent, but the problem is that Chrome doesn't show the Cloudflare checkbox when I run this code. I provided the video so you can see that it doesn't show the checkbox; it just keeps loading.

Here is another example, which gives the same output as in the video I sent:

from seleniumbase import SB

with SB(uc=True, incognito=True) as sb:
    url = "https://dexscreener.com"
    sb.uc_open_with_reconnect(url, 10)
    print(sb.get_page_title())
    sb.uc_gui_handle_cf()
    print(sb.get_page_title())

digicodexx commented 1 hour ago

@amberbor, add sb.sleep(5) before the sb.uc_gui_handle_cf() line so it doesn't click the checkbox instantly.

amberbor commented 1 hour ago

@amberbor, add sb.sleep(5) before the sb.uc_gui_handle_cf() line so it doesn't check the checkbox instantly.

@digicodexx Still the same issue: even if I open incognito mode, it doesn't show the checkbox. Here is the example with sb.sleep(5). Even if I set sb.sleep(10), it still doesn't show the checkbox.

https://github.com/seleniumbase/SeleniumBase/assets/47393618/551a027c-2fb9-4bc0-87c1-93504c80d2be