seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License

UC Mode users should upgrade to 4.30.8 (or newer if available) #3130

Closed mdmintz closed 1 week ago

mdmintz commented 1 month ago

UC Mode users should upgrade to 4.30.8 (or newer if available)

In case you missed https://github.com/seleniumbase/SeleniumBase/issues/3128 and https://github.com/seleniumbase/SeleniumBase/issues/3131, CF made changes to their CAPTCHAs, and new UC Mode updates were needed to continue clicking them successfully (when they force you to click the CAPTCHA to bypass it).

This includes fixes for the following methods:

These CAPTCHA changes shouldn't be surprising to anyone who's already watched: https://www.youtube.com/watch?v=-EpZlhGWo9k (my latest UC Mode video tutorial: "Revenge of the CAPTCHAs").

Upgrade to SeleniumBase 4.30.8 (or newer if available) to continue bypassing CAPTCHAs as usual.

Selenium is used to calculate the location of the CAPTCHA (or the number of times to press Tab before pressing Spacebar). Because Selenium gets detected while calculating locations / Tab presses, the page then needs to reload before those actions are performed. After the page refreshes, PyAutoGUI saves the day by performing the actions in a stealthy way.

ProtocolNebula commented 1 month ago

Stopped working around 3 hours ago, upgraded, still not working :(

kevtruong170 commented 1 month ago

> Stopped working around 3 hours ago, upgraded, still not working :(

Version 4.30.1 would cause the cursor to click at the top left corner of the web browser. I would double-check that the upgrade took effect, and check your other components, because it is working fine for me.

ProtocolNebula commented 1 month ago

> > Stopped working around 3 hours ago, upgraded, still not working :(

> Version 4.30.1 would cause the cursor to click at the top left corner of the web browser. I would double-check that the upgrade took effect, and check your other components, because it is working fine for me.

.venv/bin/pip freeze | grep "base"

seleniumbase==4.30.7

I will check, because maybe it is resolving the dependency from another folder...

Any tips? I'm still getting used to Python.
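One way to rule out a mismatched environment is to ask the interpreter that runs the script which version it actually sees, rather than relying on a `pip` that may belong to a different folder. The following is a minimal sketch using only the standard library. The helper names `parse_version` and `is_at_least` are illustrative and not part of SeleniumBase, the parser assumes plain numeric versions such as `4.30.8`, and the required version is the one named in the issue title.

```python
from importlib import metadata


def parse_version(text):
    """Turn a plain numeric version such as '4.30.8' into (4, 30, 8)."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, required):
    """Return True if the installed version is the required one or newer."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        # Reports the version visible to *this* interpreter.
        installed = metadata.version("seleniumbase")
    except metadata.PackageNotFoundError:
        print("seleniumbase is not installed for this interpreter")
    else:
        status = "OK" if is_at_least(installed, "4.30.8") else "too old"
        print(f"seleniumbase {installed}: {status}")
```

Running it with the same interpreter as the bot (for example `.venv/bin/python`, followed by whatever file name the sketch is saved under) shows the version that the bot itself would import.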

kevtruong170 commented 1 month ago

It's hard to say without actually seeing the code. My best suggestion is to test 4.30.7 in a separate environment to ensure it's working properly, and then slowly break down the code until you notice a change in behaviour.

ProtocolNebula commented 1 month ago

From what I can see in headed mode, it's solving the Cloudflare challenge, but it's still recognized as a bot (the screen stays white, which means a bot was detected).

I tested reconnect times of 7, 10, and 15, reinstalled the dependencies, and even installed from GitHub; it's still not working.

I will do more tests in a few minutes, but I'm running out of ideas.

vmolostvov commented 1 month ago

It seems like sb.uc_gui_click_captcha() is working correctly, but sb.uc_gui_handle_cf() keeps crashing. Can you check, please, @mdmintz?

ProtocolNebula commented 1 month ago

As additional information, here's my code to solve the Cloudflare challenge:

sb.uc_click('[type="submit"]', reconnect_time=7)

try:
    # with lockCloudflare:
    sb.post_message("Solving cloudflare...")
    sb.uc_gui_click_captcha()  # Only if needed
except Exception:
    logger.error("ERROR: Cloudflare challenge error")
    votesSummary["totalCloudflareError"] += 1
    raise  # Re-raise the original exception with its traceback

mdmintz commented 1 month ago

> It seems like sb.uc_gui_click_captcha() is working correctly, but sb.uc_gui_handle_cf() keeps crashing. Can you check, please, @mdmintz?

Looks like it. In the meantime, people should use sb.uc_gui_click_captcha() if they can, which works in all environments.

vmolostvov commented 1 month ago

> > It seems like sb.uc_gui_click_captcha() is working correctly, but sb.uc_gui_handle_cf() keeps crashing. Can you check, please, @mdmintz?

> Looks like it. In the meantime, people should use sb.uc_gui_click_captcha() if they can, which works in all environments.

Yep, but I can't use it on my remote Ubuntu Linux server with chromium_arg='--force-device-scale-factor=2.0'. I use this argument for high-quality screenshots, and with it set, sb.uc_gui_click_captcha() can't bypass the CAPTCHA. If I disable the argument, sb.uc_gui_click_captcha() gets through, so I have to use sb.uc_gui_handle_cf().

ProtocolNebula commented 1 month ago

From the tests I've done (in headed mode), Cloudflare is detecting that Selenium is attached, because the screen turns white before the cursor starts moving.

@mdmintz

mdmintz commented 1 month ago

Updated title: "UC Mode users should upgrade to 4.30.8 (or newer if available)"

ProtocolNebula commented 1 month ago

> From the tests I've done (in headed mode), Cloudflare is detecting that Selenium is attached, because the screen turns white before the cursor starts moving.

I upgraded, and I get the same error here.

Tested with both uc_gui_handle_captcha and uc_gui_click_captcha.

Any clues @mdmintz?

mdmintz commented 1 month ago

These scripts are all working for me:

from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://gitlab.com/users/sign_in"
    sb.uc_open_with_reconnect(url, 4)
    sb.uc_gui_click_captcha()  # Only if needed
    sb.assert_element('label[for="user_login"]')
    sb.assert_element('input[data-testid*="username"]')
    sb.assert_element('input[data-testid*="password"]')
    sb.set_messenger_theme(location="bottom_center")
    sb.post_message("SeleniumBase wasn't detected!")

And:

from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://gitlab.com/users/sign_in"
    sb.uc_open_with_reconnect(url, 4)
    sb.uc_gui_handle_captcha()  # Only if needed
    sb.assert_element('label[for="user_login"]')
    sb.assert_element('input[data-testid*="username"]')
    sb.assert_element('input[data-testid*="password"]')
    sb.set_messenger_theme(location="bottom_center")
    sb.post_message("SeleniumBase wasn't detected!")

And:

from seleniumbase import SB

with SB(uc=True, test=True, locale_code="en") as sb:
    url = "https://www.cloudflare.com/login"
    sb.uc_open_with_reconnect(url, 5.5)
    sb.uc_gui_handle_captcha()  # PyAutoGUI press Tab and Spacebar
    sb.sleep(2.5)

with SB(uc=True, test=True, locale_code="en") as sb:
    url = "https://www.cloudflare.com/login"
    sb.uc_open_with_reconnect(url, 5.5)
    sb.uc_gui_click_captcha()  # PyAutoGUI click. (Linux needs it)
    sb.sleep(2.5)

If they aren't working for you right now, then your IP Address or proxy may already be blocked.

And note that on Linux, you may need to use the uc_gui_click_captcha() version.

And if using Windows, you may need to set the scaling factor to 100%. (On higher scaling factors, there may be problems calculating coordinates, so just set scaling to 100% to get around that.)
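To illustrate why a scaling factor other than 100% can throw off a GUI click, here is a simplified sketch. It assumes (this is not taken from SeleniumBase's source) that the browser reports an element's position in CSS pixels while the GUI automation layer clicks in physical screen pixels, so the two differ by the scale factor. The function name and its parameters are hypothetical.

```python
def css_to_screen(css_x, css_y, window_x=0, window_y=0, scale=1.0):
    """Map a point in CSS pixels to physical screen pixels.

    Simplified model: screen = window offset + CSS offset * scale.
    Real browsers also add toolbar and border offsets.
    """
    return (window_x + css_x * scale, window_y + css_y * scale)


# A checkbox reported at CSS position (200, 300):
at_100 = css_to_screen(200, 300, scale=1.0)  # (200.0, 300.0)
at_200 = css_to_screen(200, 300, scale=2.0)  # (400.0, 600.0)
print(at_100, at_200)
```

Under this model, a click computed without accounting for the scale would land at (200, 300) while the element is drawn at (400, 600), which is consistent with the advice to set scaling to 100%.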

ProtocolNebula commented 1 month ago

I'm on Linux, and I tried with and without a proxy.

In any case, if I do the click manually, everything works fine. It only fails when Selenium attaches to the browser, so the issue is not related to the IP.

Video attached (again, if I do the click manually, it works fine): https://github.com/user-attachments/assets/e0f0daf1-35cc-4214-b1d7-9c0afb9a0253

EDIT: It's not clear in the video, but this happens when the Cloudflare tab is reopened (when it's attached to Selenium).

mdmintz commented 1 month ago

Things are working for me right now in GitHub Actions:

https://github.com/mdmintz/undetected-testing/actions/runs/10838173166/job/30075795632

[Image: Screenshot 2024-09-12 at 7.06.04 PM]

Here's the script that was run:

from seleniumbase import SB

with SB(uc=True, test=True, rtf=True) as sb:
    url = "https://seleniumbase.io/hobbit/login"
    sb.uc_open_with_disconnect(url, 3)
    sb.uc_gui_press_keys("\t ")
    sb.reconnect(3)
    print(sb.get_current_url())
    sb.assert_text("Welcome to Middle Earth!", "h1")
    sb.set_messenger_theme(location="bottom_center")
    sb.post_message("SeleniumBase wasn't detected!")

with SB(uc=True, test=True) as sb:
    url = "https://www.virtualmanager.com/en/login"
    sb.uc_open_with_reconnect(url, 4)
    sb.uc_gui_click_captcha()
    print(sb.get_page_title())
    sb.assert_element('input[name*="email"]')
    sb.assert_element('input[name*="login"]')
    sb.set_messenger_theme(location="bottom_center")
    sb.post_message("SeleniumBase wasn't detected!")

The second test in the file clicks a CAPTCHA. (The CAPTCHA page only appears on Linux.)

Right before I shipped the fixes, it was failing like this:

[Image: Screenshot 2024-09-12 at 7.10.48 PM]

But on 4.30.8 things are working correctly.

ProtocolNebula commented 1 month ago

After a lot of retries (I even tried Ubuntu instead of Debian; not Windows, because Python was failing there), I noticed that it is related to the website.

I noticed some time ago that if you wait too long, the website turns white, but SeleniumBase was working fine until today.

Now, with the following code, it just loses the content:

from seleniumbase import SB

with SB(uc=True, test=True, locale_code="en") as sb:
    url = "https://www.xtremetop100.com/in.php?site=1132357179"
    sb.uc_open_with_reconnect(url, 5.5)
    sb.type('input[name="captcha_code"]', "dummy")
    sb.uc_click('[type="submit"]', 5.5)
    sb.uc_gui_click_captcha()  # PyAutoGUI click. (Linux needs it)
    sb.sleep(9)

Is there any trick to avoid the "refresh"? (I noticed that it's not happening in the other examples, so maybe it is a Cloudflare protection?)

mdmintz commented 1 month ago

@ProtocolNebula It's because you can't go directly to https://www.xtremetop100.com/in-post.php?site=1132357179 in the browser because it's tied to the previous form action.

And because there's another CAPTCHA on the previous page (not a CF one) it won't let you through if that wasn't solved correctly.

Otherwise, it would be simple to get through with something like:

from seleniumbase import SB

with SB(uc=True, test=True, locale_code="en") as sb:
    url = "https://www.xtremetop100.com/in.php?site=1132357179"
    sb.uc_open_with_reconnect(url, 3)
    sb.type('input[name="captcha_code"]', "dummy")
    sb.uc_click('[type="submit"]', reconnect_time="disconnect")
    sb.sleep(3)
    sb.uc_gui_press_keys("\t ")
    sb.reconnect(6)

UC Mode works for the straightforward case of opening a URL that is already a CF Turnstile page. It won't solve your word puzzle.

Simple example of normal UC Mode usage:

from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://gitlab.com/users/sign_in"
    sb.uc_open_with_reconnect(url, 4)
    sb.uc_gui_click_captcha()  # Only if needed
    sb.assert_element('label[for="user_login"]')
    sb.assert_element('input[data-testid*="username"]')
    sb.assert_element('input[data-testid*="password"]')
    sb.set_messenger_theme(location="bottom_center")
    sb.post_message("SeleniumBase wasn't detected!")

ProtocolNebula commented 1 month ago

@mdmintz, first of all, thanks for the support.

I've been using my bot for 3 weeks, and it worked like a charm until today.

uc_gui_press_keys partially works: it tries to solve the challenge but ends up showing the blank page. Maybe using the keyboard is too obvious for Cloudflare (I have to run this on Linux; maybe this only works on Windows).

    sb.uc_gui_press_keys("\t ")

I've run out of ideas. As a stopgap, I'm trying to integrate an external service to solve these CAPTCHAs, but the goal is still to use this tool (those tools are too expensive).

PS: I solve the captcha after ensuring Cloudflare is skipped, so the dummy text in the example is the same one I'm using.

mdmintz commented 1 month ago

@ProtocolNebula Note that you can print the coordinates that you receive and/or save them for later reuse. E.g.:

from seleniumbase import SB
from seleniumbase import config as sb_config

with SB(uc=True, test=True, locale_code="en") as sb:
    url = "https://www.xtremetop100.com/in.php?site=1132357179"
    sb.uc_open_with_reconnect(url, 1)
    sb.type('input[name="captcha_code"]', "dummy")
    sb.uc_click('[type="submit"]', reconnect_time=3)
    sb.uc_gui_click_captcha()  # Expected to fail now
    x, y = sb_config._saved_cf_x_y
    print(x, y)
    sb.uc_open_with_reconnect(url, 1)
    sb.type('input[name="captcha_code"]', "dummy")
    sb.uc_click('[type="submit"]', reconnect_time="disconnect")
    sb.sleep(3)
    sb.uc_gui_click_x_y(x, y)
    sb.reconnect(6)

They are stored in sb_config._saved_cf_x_y. If the CAPTCHA is always in the same place on a page, you can jump to sb.uc_gui_click_x_y(x, y) while disconnected, and avoid getting detected when trying to find the CAPTCHA coordinates.
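If the coordinates should survive between runs, they could be written to disk and read back later. The following is a minimal sketch using a JSON file. The file name `cf_coords.json` and the helpers `save_coords` / `load_coords` are hypothetical and not part of SeleniumBase, and the approach only helps under the condition stated above, namely that the CAPTCHA stays in the same place on the page.

```python
import json
from pathlib import Path


def save_coords(path, url, x, y):
    """Store the (x, y) pair for a URL in a JSON file, keeping other entries."""
    path = Path(path)
    data = json.loads(path.read_text()) if path.exists() else {}
    data[url] = [x, y]
    path.write_text(json.dumps(data, indent=2))


def load_coords(path, url):
    """Return the saved (x, y) tuple for a URL, or None if nothing is saved."""
    path = Path(path)
    if not path.exists():
        return None
    pair = json.loads(path.read_text()).get(url)
    return tuple(pair) if pair else None
```

After `x, y = sb_config._saved_cf_x_y`, a call such as `save_coords("cf_coords.json", url, x, y)` would record the pair. A later run could call `load_coords("cf_coords.json", url)` and, if it returns a pair, go straight to `sb.uc_gui_click_x_y(x, y)` while disconnected.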

ProtocolNebula commented 1 month ago

Oh, that method is so cool.

I guess they enabled "Under Attack" mode or something, because it doesn't work even with this code.

I tried moving to a random position first and waiting for 1 second, but it is still blank!! I can't understand it; it's so frustrating.

What I don't know is how other people are solving this part; the only way I can think of is using external paid services.

I'll try again tomorrow; I hope it magically resolves itself.

Again, thanks for all the support. Of course, I'll be glad if someone finds a way to get past this.

mdmintz commented 1 month ago

[Image: Screenshot 2024-09-13 at 12.18.07 AM]

@ebsawyer This script works in all the environments:

from seleniumbase import SB
from seleniumbase import config as sb_config

with SB(uc=True, test=True, locale_code="en") as sb:
    url = "https://www.indeed.com/jobs?q=%27team+member%27&l=New+York%2C+NY&sort=date&fromage=1&filter=0"
    sb.uc_open_with_reconnect(url, 4)
    sb.uc_gui_click_captcha()
    if hasattr(sb_config, "_saved_cf_x_y"):
        x, y = sb_config._saved_cf_x_y
        sb.uc_open_with_disconnect(url)
        sb.sleep(4)
        sb.uc_gui_click_x_y(x, y)
    sb.sleep(4)

mdmintz commented 1 month ago

[Image: Screenshot 2024-09-13 at 12.17.16 AM]

@ebsawyer You have to reconnect before you can perform actions again. E.g., sb.reconnect(4) waits 4 seconds, then reconnects the driver to the browser. You have to wait a bit after calling sb.uc_gui_click_x_y(x, y) because CF keeps scanning for a few seconds after you click the CAPTCHA. Here's the full script:

from seleniumbase import SB
from seleniumbase import config as sb_config

with SB(uc=True, test=True, locale_code="en") as sb:
    url = "https://www.indeed.com/jobs?q=%27team+member%27&l=New+York%2C+NY&sort=date&fromage=1&filter=0"
    sb.uc_open_with_reconnect(url, 4)
    sb.uc_gui_click_captcha()
    if hasattr(sb_config, "_saved_cf_x_y"):
        x, y = sb_config._saved_cf_x_y
        sb.uc_open_with_disconnect(url)
        sb.sleep(4)
        sb.uc_gui_click_x_y(x, y)
    sb.reconnect(4)

    breakpoint()

mdmintz commented 1 month ago

No matter the URL, use something like this for bypassing CF CAPTCHAs:

from seleniumbase import SB
from seleniumbase import config as sb_config

with SB(uc=True, test=True, locale_code="en") as sb:
    url = "URL"
    sb.uc_open_with_reconnect(url, 4)
    sb.uc_gui_click_captcha()
    if hasattr(sb_config, "_saved_cf_x_y"):
        x, y = sb_config._saved_cf_x_y
        sb.uc_open_with_disconnect(url)
        sb.sleep(4)
        sb.uc_gui_click_x_y(x, y)
        sb.reconnect(4)

    breakpoint()

bjornkarlsson commented 1 month ago

I have tried the snippet above with the latest changes, but I have run into a few issues.

The following change: https://github.com/seleniumbase/SeleniumBase/blob/master/seleniumbase/core/browser_launcher.py#L986

triggers a new refresh of the page and a sleep of 3.8 seconds, followed by a click on the input box. The issues I see are:

1) During that time lapse, the input box hasn't rendered.
2) Sometimes the refresh alone lets the verification go through, but it's still followed by _uc_gui_click_x_y, which can result in a random link being clicked on the target webpage.

Then the following: https://github.com/seleniumbase/SeleniumBase/blob/master/seleniumbase/core/browser_launcher.py#L1004

What does the blind variable stand for?

The issue I have noticed here is the following:

1) Assuming that verification has gone through but the connection is slow (as in my case, using a proxy), the Ray footer is still being shown while we are being redirected to the target page. This results in another refresh of the page (I see at least three refreshes of the page with the new changes) because of this:

https://github.com/seleniumbase/SeleniumBase/blob/master/seleniumbase/core/browser_launcher.py#L1004

Here we run into the same issue: it assumes that 3.8 seconds suffice to render the page with the input box fully, and then relies on PyAutoGUI to click the input box. I had this process stalling.

Could there be a wait in between that pauses execution until the input box is present and can be clicked? Or can the timeout value for uc_open_with_disconnect be configured?

Previously the flow was simpler, as there were no extra page refreshes from the new conditions. I would call uc_open_with_disconnect with a longer timeout, and the input box would always be rendered and clicked. It still had timing assumptions in place, but it worked 99.9% of the time.
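A generic way to replace a fixed sleep with a bounded wait is to poll a condition until it holds or a timeout expires. The sketch below is plain Python, not a SeleniumBase API; `wait_until` and its parameters are hypothetical names. One caveat: any check that queries the page through Selenium requires the driver to be connected, which, according to the explanation at the top of this thread, is when detection can happen, so the predicate would need to rely on something observable without the driver.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value or the timeout expires.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The predicate is always checked at least once, so a condition that already holds returns immediately, and the total wait never exceeds the timeout by more than one polling interval.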

mdmintz commented 1 month ago

There will be an extra page refresh because Selenium gets detected when calculating the CAPTCHA coordinates.

Adjust your script as needed in https://github.com/seleniumbase/SeleniumBase/issues/3130#issuecomment-2348840356.

mat-shur commented 1 month ago

I'm on the newest version, but these functions don't work: sb.uc_gui_click_captcha() and sb.uc_gui_handle_captcha().

Nothing happens at all...

mdmintz commented 1 week ago

Closing this in favor of https://github.com/seleniumbase/SeleniumBase/issues/3236.