seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
MIT License

the script is detected as bot #3059

Closed. zqxyus closed this issue 2 weeks ago.

zqxyus commented 2 weeks ago

I first used code similar to the code below to open a website that I need to crawl, but access was blocked. So I used the following code to visit a bot-detection test page, and the results show that this code is detected as a bot.

from seleniumbase import SB

with SB(uc=True, incognito=True, test=True) as sb:
    url = ""
    server = ""  # no trailing comma: "x", would make this a tuple
    username = "00007-zone-custom-region-DE-sessid-NkivelA2-sessTime-15"  # scrapeops
    password = "tHx19d0nTan"
    sb.set_wire_proxy(f"{username}:{password}@{server}")
    sb.uc_open_with_reconnect(url, 21)
    sb.sleep(93)

The results are listed below. Legend:
Consistent: The scanner did not detect any anomaly.
Unsure: The scanner considers that the attributes tested could indicate the presence of a bot, but there is still a chance that it is a human.
Inconsistent: The scanner considers that the attributes tested indicate the presence of a bot.

Test | Result | Data

PHANTOM_UA | Consistent | {"userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36"}

PHANTOM_PROPERTIES | Consistent | {"attributesFound":[false,false,false]}

PHANTOM_ETSL | Consistent | {"etsl":33}

PHANTOM_LANGUAGE | Consistent | {"languages":["en-US"]}

PHANTOM_WEBSOCKET | Consistent | {}

MQ_SCREEN | Consistent | {}

PHANTOM_OVERFLOW | Consistent | {"depth":9649,"errorMessage":"Maximum call stack size exceeded","errorName":"RangeError","errorStacklength":711}

PHANTOM_WINDOW_HEIGHT | Consistent | {"wInnerHeight":709,"wOuterHeight":840,"wOuterWidth":1280,"wInnerWidth":1236,"wScreenX":80,"wPageXOffset":0,"wPageYOffset":0,"cWidth":1221,"cHeight":812,"sWidth":1920,"sHeight":1080,"sAvailWidth":1850,"sAvailHeight":1053,"sColorDepth":24,"sPixelDepth":24,"wDevicePixelRatio":1}

HEADCHR_UA | Consistent | {"userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36"}

WEBDRIVER | Inconsistent | {}

HEADCHR_CHROME_OBJ | Consistent | {}

HEADCHR_PLUGINS | Consistent | {"plugins":["PDF Viewer::Portable Document Format::internal-pdf-viewer::application/pdf~pdf~Portable Document Format,text/pdf~pdf~Portable Document Format","Chrome PDF Viewer::Portable Document Format::internal-pdf-viewer::application/pdf~pdf~Portable Document Format,text/pdf~pdf~Portable Document Format","Chromium PDF Viewer::Portable Document Format::internal-pdf-viewer::application/pdf~pdf~Portable Document Format,text/pdf~pdf~Portable Document Format","Microsoft Edge PDF Viewer::Portable Document Format::internal-pdf-viewer::application/pdf~pdf~Portable Document Format,text/pdf~pdf~Portable Document Format","WebKit built-in PDF::Portable Document Format::internal-pdf-viewer::__application/pdf~pdf~Portable Document Format,text/pdf~pdf~Portable Document Format"]}

HEADCHR_IFRAME | Consistent | {}

CHR_DEBUG_TOOLS | Consistent | {}

SELENIUM_DRIVER | Consistent | {"attributesFound":[false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false]}

CHR_BATTERY | Consistent | {}

CHR_MEMORY | Consistent | {}

TRANSPARENT_PIXEL | Consistent | {"0":0,"1":0,"2":0,"3":0}

SEQUENTUM | Consistent | {}

VIDEO_CODECS | Consistent | {"h264":"probably"}

How can I get around this? Thanks!

mdmintz commented 2 weeks ago

Duplicate of

When I run the following script, I get the same result as when using a regular Chrome browser, so the Inconsistent value there isn't accurate.

from seleniumbase import SB

with SB(uc=True, incognito=True, test=True) as sb:
    url = ""
    sb.uc_open_with_reconnect(url, 8)
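One way to double-check the WEBDRIVER row is to read `navigator.webdriver` yourself in UC Mode and compare it with a regular Chrome DevTools console, where it normally evaluates to `false`. This is a minimal sketch, assuming the scanner's WEBDRIVER test is based on that property (the thread does not say which property it inspects); `describe_webdriver_flag` and `check_webdriver_flag` are hypothetical helper names, and `url` is whichever test page you are checking.

```python
def describe_webdriver_flag(value):
    """Classify the value returned for `navigator.webdriver` (hypothetical helper)."""
    if value is True:
        return "exposed"  # automation flag is visible to page scripts
    if value is False:
        return "false"  # same value a regular Chrome session reports
    return "undefined"  # property missing (Selenium maps JS undefined to None)


def check_webdriver_flag(url):
    """Open `url` in UC Mode and report what page scripts see for navigator.webdriver."""
    # Imported here so the helper above can be used without SeleniumBase installed.
    from seleniumbase import SB

    with SB(uc=True, incognito=True, test=True) as sb:
        sb.uc_open_with_reconnect(url, 8)
        value = sb.execute_script("return navigator.webdriver")
        return describe_webdriver_flag(value)
```

Calling `print(check_webdriver_flag(url))` and getting "false" would mean page scripts see the same value as in a normal browser, so a scanner that still flags it is likely flagging regular Chrome too.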

The website is a better test for bots. SeleniumBase UC Mode goes undetected.

from seleniumbase import SB

with SB(uc=True, incognito=True, test=True) as sb:
    url = ""
    sb.uc_open_with_reconnect(url, 10)
    sb.remove_elements("jdiv")  # Remove chat widgets
    sb.assert_text("No automation framework detected", "pxlscn-bot-detection")
    not_masking = "You are not masking your fingerprint"
    sb.assert_text(not_masking, "pxlscn-fingerprint-masking")
    sb.highlight("span.text-success", loops=8)
    sb.highlight("pxlscn-fingerprint-masking div", loops=9, scroll=False)
    sb.highlight("", loops=10, scroll=False)

zqxyus commented 2 weeks ago

I used the following code, and access was blocked.

from seleniumbase import SB

with SB(uc=True, incognito=True, test=True) as sb:
    url = ""
    sb.uc_open_with_reconnect(url, 10)

mdmintz commented 2 weeks ago

That page blocked me in my regular Chrome browser (no Selenium). Also, that's not a Cloudflare page. UC Mode is currently designed specifically for bypassing Cloudflare and some other anti-bot services.

zqxyus commented 2 weeks ago

@mdmintz How can I get past it? Could you give me any ideas or guidelines? Thanks!

mdmintz commented 2 weeks ago

You can try changing your proxy settings, but otherwise there's not much that can be done if it blocks regular Chrome browsers.
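If the site blocks by IP reputation or region, trying several proxies in turn is one way to act on that advice. This is a sketch under assumptions: the proxy strings use the `SERVER:PORT` or `USER:PASS@SERVER:PORT` format that SeleniumBase's `proxy` argument accepts, and a blocked page is recognized by the hypothetical marker phrases in `BLOCK_MARKERS`, which you would replace with whatever your target actually shows. `looks_blocked` and `first_working_proxy` are illustrative names, not SeleniumBase APIs.

```python
# Assumed phrases that indicate a block page; adjust for the real target site.
BLOCK_MARKERS = ("access denied", "you have been blocked")


def looks_blocked(page_text, markers=BLOCK_MARKERS):
    """Return True if any (lowercase) marker phrase appears in the page text."""
    text = page_text.lower()
    return any(marker in text for marker in markers)


def first_working_proxy(url, proxies):
    """Try each proxy in UC Mode; return the first one whose page is not blocked."""
    # Imported here so looks_blocked() works without SeleniumBase installed.
    from seleniumbase import SB

    for proxy in proxies:
        with SB(uc=True, proxy=proxy) as sb:
            sb.uc_open_with_reconnect(url, 4)
            if not looks_blocked(sb.get_text("body")):
                return proxy
    return None  # every proxy hit a block page
```

Each attempt launches a fresh browser so cookies from a blocked session do not carry over; as noted above, none of this helps if the site also blocks regular Chrome from every IP you try.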

zqxyus commented 2 weeks ago

@mdmintz Thank you! I have another question: I found little information about proxy server settings in the SeleniumBase documentation. The following code is a demo of proxy server setup based on Selenium. If I use SeleniumBase, how do I set the proxy server?

from selenium import webdriver

def setup_driver():
    # ScrapeOps Proxy setup
    proxy_url = ""
    api_key = "YOUR_API_KEY"  # Replace this with your ScrapeOps API key
    target_url = ""
    bypass_level = "generic_level_1"  # Choose the appropriate bypass level

    # Build the ScrapeOps proxy string
    proxy = f"http://{api_key}:{proxy_url}/?target_url={target_url}&bypass={bypass_level}"
    chrome_options = webdriver.ChromeOptions()
    # NOTE: `proxy` is built above but never added to chrome_options,
    # so this demo does not actually route traffic through the proxy.

    # Initialize the WebDriver
    driver = webdriver.Chrome(options=chrome_options)
    return driver

def main():
    driver = setup_driver()
    try:
        # E.g., extract the title of the webpage
        print("Page title:", driver.title)
    finally:
        driver.quit()

if __name__ == '__main__':
    main()

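For comparison, here is a sketch of the SeleniumBase side, assuming your provider gives you a standard proxy endpoint (host and port plus an optional username and password), like the `server`/`username`/`password` values in the first snippet of this issue. It passes the proxy through the `proxy` argument of `SB()`, which accepts `SERVER:PORT` or `USER:PASS@SERVER:PORT`; an API-style URL with query parameters, as in the Selenium demo, is not in that format. `make_proxy_string` and `open_with_proxy` are hypothetical helper names.

```python
def make_proxy_string(server, username=None, password=None):
    """Build the value for SeleniumBase's `proxy` arg: SERVER:PORT or USER:PASS@SERVER:PORT."""
    if username and password:
        return f"{username}:{password}@{server}"
    return server  # no credentials: plain SERVER:PORT


def open_with_proxy(url, proxy):
    """Open `url` in UC Mode through `proxy` and print the page title."""
    # Imported here so make_proxy_string() works without SeleniumBase installed.
    from seleniumbase import SB

    with SB(uc=True, proxy=proxy) as sb:
        sb.uc_open_with_reconnect(url, 4)
        print("Page title:", sb.get_page_title())
```

Usage, with placeholder values: `open_with_proxy(url, make_proxy_string("HOST:PORT", "USER", "PASS"))`. Passing the proxy at launch avoids the separate `set_wire_proxy()` call used in the first snippet.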
mdmintz commented 2 weeks ago

Set the proxy arg: