seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License

Why Recorder Mode can't bypass captcha? #3099

Closed: FraneCal closed this issue 1 week ago

FraneCal commented 1 week ago

Hi,

I used Recorder Mode to record my actions on this site.

The script should click Dashboard, after which a Cloudflare CAPTCHA appears. But when I click it, the page just keeps refreshing.

I know how to solve it in code, but is it possible to do it while recording?

Thanks.

mdmintz commented 1 week ago

Recorder Mode is for creating scripts. UC Mode is for bypassing CAPTCHAs. You can combine the two though. Launch the Recorder Desktop App like this:

sbase recorder --uc

After starting the recording using the "https://scrape.do/" URL, I clicked the Dashboard link, clicked the CF CAPTCHA, typed a username & password, and then ended the recording on the command-line by typing c and pressing Enter. Here's the script that was generated:

from seleniumbase import BaseCase
BaseCase.main(__name__, __file__)

class RecorderTest(BaseCase):
    def test_recording(self):
        self.open("https://scrape.do/")
        self.click('a[href="https://dashboard.scrape.do"]')
        self.open_if_not_url("https://dashboard.scrape.do/login?ReturnUrl=/")
        self.open("https://dashboard.scrape.do/login?ReturnUrl=/")
        self.type("input#username", "username")
        self.type("input#userpassword", "pass")

Because of some redirects, a few extra lines were added that you can remove manually. The recorded script also won't include the UC Mode methods, which you need to add yourself, such as uc_open_with_reconnect(url, reconnect_time), uc_click(selector), and uc_gui_click_captcha(). Finally, recorded scripts are saved in Syntax Format 1, which runs with pytest, so you have to add the --uc command-line option to enable UC Mode. The modified script would look like this:

from seleniumbase import BaseCase
BaseCase.main(__name__, __file__, "--uc")

class RecorderTest(BaseCase):
    def test_recording(self):
        self.uc_open_with_reconnect("https://scrape.do/", 3)
        self.uc_click('a[href="https://dashboard.scrape.do"]', 4)
        self.uc_gui_click_captcha()
        self.type("input#username", "username")
        self.type("input#userpassword", "pass")

After a few find/replace edits, you can convert the modified script to the SB() format:

from seleniumbase import SB

with SB(uc=True) as sb:
    sb.uc_open_with_reconnect("https://scrape.do/", 3)
    sb.uc_click('a[href="https://dashboard.scrape.do"]', 4)
    sb.uc_gui_click_captcha()
    sb.type("input#username", "username")
    sb.type("input#userpassword", "pass")

The real problem you'll face is the reCAPTCHA on the last page. Although SeleniumBase can easily solve Cloudflare Turnstile CAPTCHAs, Google reCAPTCHA is more advanced: to get past it with a bot, you need to solve its audio challenge. (reCAPTCHA may present a challenge even when it doesn't detect a bot at that moment.)

To solve the Google reCAPTCHA audio challenge, you'll need external code, such as one of the solvers found by this GitHub code search: https://github.com/search?q=pydub.AudioSegment.from_mp3+recaptcha+solver+language%3APython&type=code. Then combine that solution with the rest of your script.

As for the UC Mode parts, make sure you've read the UC Mode documentation so that you know when to use the special methods. For example, use uc_click(selector, 4) when a click leads to a page with a CAPTCHA, and call uc_gui_click_captcha() whenever you reach a CAPTCHA to make sure it gets clicked.