Closed mdmintz closed 2 months ago
For discussion, come join us on Discord: https://discord.com/invite/HDk5wYvzEZ.
Still trying to figure it out. I even tried nodriver, but that didn't bypass the CAPTCHA on Linux either:
import nodriver
import time
from sbvirtualdisplay.display import Display

async def main():
    browser = await nodriver.start()
    page = await browser.get("https://gitlab.com/users/sign_in")
    time.sleep(4)
    print(await page.evaluate("document.title"))
    await page.save_screenshot("screenshot.png")

if __name__ == "__main__":
    disp = Display(
        visible=True, size=(1366, 768), backend="xvfb", use_xauth=True
    )
    disp.start()
    nodriver.loop().run_until_complete(main())
    disp.stop()
Similar to SeleniumBase, it also bypasses the CAPTCHA on macOS / Windows.
Maybe CF blocked all Linux access? Or they figured out how to do fingerprinting well (and can now determine the difference between a Desktop Linux machine with a GUI versus a GUI-less Linux Server). Will probably sleep on it. Ideas are welcome. At least automation can still bypass CAPTCHAs on macOS / Windows, meaning that web-scraping servers will need to run there now if the situation isn't handled.
Looks like we're just dealing with good old-fashioned IP-Address-blocking. GitHub Actions IP Addresses are now known by CF, and their Turnstiles won't let you through if they spot browsers coming from those IPs (or other known server IPs).
The solution is to change proxy settings to a "safe" IP Address via the proxy arg. Maybe that means using residential proxies, or "special" server IP Addresses that aren't on some block list.
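A minimal sketch of routing UC Mode through a proxy with the proxy arg. The host, port, and credentials are placeholders, and `build_proxy_string` and `open_with_proxy` are hypothetical helper names, not part of SeleniumBase. The proxy string follows the `SERVER:PORT` / `USER:PASS@SERVER:PORT` forms that SeleniumBase accepts.

```python
def build_proxy_string(host, port, username=None, password=None):
    """Return "HOST:PORT" or "USER:PASS@HOST:PORT" (hypothetical helper)."""
    server = f"{host}:{port}"
    if username and password:
        return f"{username}:{password}@{server}"
    return server


def open_with_proxy(url, proxy_string):
    """Open url in UC Mode through the given proxy and return the page title."""
    # Imported here so build_proxy_string() works without SeleniumBase installed.
    from seleniumbase import SB

    with SB(uc=True, test=True, proxy=proxy_string) as sb:
        sb.uc_open_with_reconnect(url, 4)
        return sb.get_page_title()


# Example call (placeholder address, replace with a proxy you control):
# proxy = build_proxy_string("proxy.example.com", 8080)
# print(open_with_proxy("https://gitlab.com/users/sign_in", proxy))
```

Whether a given proxy IP counts as "safe" can only be found out by trying it.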
As it turns out, CF isn't fully blocking on IP Addresses. They're just making you do more work to click the CAPTCHAs.
This worked in GitHub Actions: (Coordinates will be different depending on the site and the environment.)
from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    import pyautogui
    url = "https://www.virtualmanager.com/en/login"
    sb.uc_open_with_disconnect(url)
    sb.sleep(6)
    pyautogui.moveTo(228, 387, 1.05, pyautogui.easeOutQuad)
    sb.sleep(0.056)
    pyautogui.click()
    sb.sleep(3)
    sb.reconnect()
    print(sb.get_page_title())
Which means you need to either click the CAPTCHA with PyAutoGUI (as above), or use an IP Address that isn't flagged. (Use the proxy arg to change it.)
Here's a way to do it without knowing the coordinates in advance:
from seleniumbase import SB
from seleniumbase import config as sb_config

with SB(uc=True, test=True) as sb:
    import pyautogui
    url = "https://www.virtualmanager.com/en/login"
    sb.uc_open_with_reconnect(url, 6)
    print(sb.get_page_title())
    sb.uc_gui_click_captcha()
    print(sb.get_page_title())
    if (
        "Just a moment" in sb.get_page_title()
        and hasattr(sb_config, "_saved_cf_x_y")
    ):
        sb.uc_open_with_disconnect(url)
        sb.sleep(4)
        pyautogui.click(sb_config._saved_cf_x_y)
        sb.sleep(3)
        sb.reconnect()
        print(sb.get_page_title())
Just swap the URL for the one you need, e.g. https://gitlab.com/users/sign_in
from seleniumbase import SB
from seleniumbase import config as sb_config

with SB(uc=True, test=True) as sb:
    import pyautogui
    url = "https://gitlab.com/users/sign_in"
    sb.uc_open_with_reconnect(url, 6)
    print(sb.get_page_title())
    sb.uc_gui_click_captcha()
    print(sb.get_page_title())
    if (
        "Just a moment" in sb.get_page_title()
        and hasattr(sb_config, "_saved_cf_x_y")
    ):
        sb.uc_open_with_disconnect(url)
        sb.sleep(4)
        pyautogui.click(sb_config._saved_cf_x_y)
        sb.sleep(3)
        sb.reconnect()
        print(sb.get_page_title())
Limitations: This doesn't work with multithreaded scripts where more than one window is automated at the same time.
Otherwise, Linux users can use this on the current version of SeleniumBase. (Quite possibly, this will need to be used soon on macOS and Windows too.)
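As noted earlier, the click coordinates depend on the site and the environment. A rough illustration of why: PyAutoGUI clicks at absolute screen positions, so a point on the page has to be shifted by the window's position and by the browser UI above the page. The function below is a hypothetical sketch of that arithmetic, not SeleniumBase's actual implementation, and it ignores display scaling.

```python
def element_center_on_screen(window_x, window_y, viewport_offset_y, rect):
    """Map an element's viewport rect to an absolute screen point.

    window_x, window_y: top-left corner of the browser window on screen.
    viewport_offset_y: assumed height of the browser UI above the page.
    rect: dict with "x", "y", "width", "height" in viewport pixels.
    """
    x = window_x + rect["x"] + rect["width"] / 2
    y = window_y + viewport_offset_y + rect["y"] + rect["height"] / 2
    return (x, y)


# Example with made-up numbers for a 30x30 checkbox:
# element_center_on_screen(10, 20, 100, {"x": 200, "y": 250, "width": 30, "height": 30})
```

Any change in window position, toolbar height, or page layout moves the result, which is why hard-coded coordinates don't transfer between machines.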
This script is all you need to bypass CF on GitHub Actions:
from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://gitlab.com/users/sign_in"
    sb.uc_open_with_reconnect(url, 4)
    sb.uc_gui_click_captcha()
    print(sb.get_page_title())
Swap the URL above for the one you need.
Also, SeleniumBase 4.30.4 is here. (You'll see some Linux improvements.)
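The scripts in this thread check for "Just a moment" in the page title to tell whether the CAPTCHA page is still showing. Here is a small sketch of that check as a reusable helper. The retry loop and the names `is_challenge_title` and `click_until_clear` are assumptions for illustration, not part of SeleniumBase.

```python
CHALLENGE_TITLE = "Just a moment"  # title text checked in the scripts above


def is_challenge_title(title):
    """Return True if the page title looks like the CF interstitial."""
    return CHALLENGE_TITLE in title


def click_until_clear(get_title, click_captcha, max_attempts=3):
    """Call click_captcha() until the title no longer looks like a challenge.

    get_title and click_captcha are plain callables, for example
    sb.get_page_title and sb.uc_gui_click_captcha.
    Returns True if the challenge page is gone, False otherwise.
    """
    for _ in range(max_attempts):
        if not is_challenge_title(get_title()):
            return True
        click_captcha()
    return not is_challenge_title(get_title())
```

Inside a `with SB(uc=True, test=True) as sb:` block, this could be called as `click_until_clear(sb.get_page_title, sb.uc_gui_click_captcha)`.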
The CF CAPTCHAs changed again (on Linux)
CI started failing:
This is how it normally looks when passing:
(PyAutoGUI clicks the CAPTCHA successfully, and then takes you to the real page.)
I'm looking into what changed. Changes come frequently, as you may have seen in UC Mode Video 3: https://www.youtube.com/watch?v=-EpZlhGWo9k, where I talked about "The Great CAPTCHA Duel".
If you figure out what changed before I do, let me know.