seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License

Can't pass bot detection on dexscreener.com on Linux #2683

Closed: vmolostvov closed this 5 months ago

vmolostvov commented 5 months ago

Hello @mdmintz! First of all, thank you for your amazing work on SeleniumBase! It's a really cool tool built around Selenium and ChromeDriver!

So, I have a problem getting past Cloudflare's bot detection on dexscreener.com.

My code:

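from seleniumbase import SB

# proxy, user_agent and dexscreener_pair_url are defined elsewhere in my script
def get_scr_hp_ss(pair, chain):
    with SB(uc=True, proxy=proxy, agent=user_agent, xvfb=True) as sb:
        sb.driver.uc_open_with_reconnect(dexscreener_pair_url.format(chain, pair), reconnect_time=15)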

This code works perfectly on my local macOS machine, but on my Linux VDS, Cloudflare doesn't let me in. I'm using the same proxy and the same user agent as on my local PC. I've tried various approaches: saving the user-data-dir on my local PC and reusing it on the VDS, opening the page in a new tab with a timeout, running with and without user_agent/proxy, etc.

On my local PC, with the code above, dexscreener's Cloudflare check first verifies the connection and passes within 5-7 seconds without any captcha. On my server, it asks me to click the captcha, and I'm trying to click it with this code:

import random
import time
import traceback

def cloudflare_captcha(sb):
    try:
        frame = sb.driver.wait_for_element_visible('iframe', timeout=15)
        print('Found frame! Switching...')
        sb.driver.switch_to.frame(frame)
        print('Switched! Trying to click checkbox...')
        time.sleep(random.randint(3, 6))
        sb.driver.uc_click(selector='label[class="ctp-checkbox-label"]', reconnect_time=15)
        print('Clicked!')
        sb.driver.switch_to.default_content()
    except Exception:
        print(traceback.format_exc())

After that, it just loops forever...

So something on the server is triggering bot detection. What could it be? Could it be detecting that I'm using Xvfb?

mdmintz commented 5 months ago

Changing the User Agent manually may get you detected, as SeleniumBase already chooses the optimal one to use. Also, you may want to add headed=True when using xvfb=True on Linux so that you override the default headless mode on Linux. That should be enough to get past your issue, assuming they haven't already blocked your Linux IP address.
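The advice above (no manual agent, and headed=True alongside xvfb=True on Linux) can be captured in one place so the same script runs unchanged on macOS and on a Linux server. This is a minimal sketch: build_sb_kwargs and open_page are hypothetical helpers, not SeleniumBase APIs; only the SB() options uc, proxy, xvfb and headed and the uc_open_with_reconnect() call come from this thread.

```python
import sys
from typing import Any, Dict, Optional


def build_sb_kwargs(proxy: Optional[str] = None, platform: str = sys.platform) -> Dict[str, Any]:
    """Hypothetical helper: SB() keyword arguments following the advice above."""
    kwargs: Dict[str, Any] = {"uc": True}  # UC Mode for stealth
    # No "agent" key on purpose: let SeleniumBase choose the user agent itself.
    if platform.startswith("linux"):
        # On Linux, run a headed browser inside a virtual display (Xvfb)
        # instead of falling back to the default headless mode.
        kwargs.update(xvfb=True, headed=True)
    if proxy:
        kwargs["proxy"] = proxy
    return kwargs


def open_page(url: str, proxy: Optional[str] = None) -> None:
    """Open url in UC Mode; requires the seleniumbase package and Chrome."""
    # Imported lazily so build_sb_kwargs() works without seleniumbase installed.
    from seleniumbase import SB

    with SB(**build_sb_kwargs(proxy)) as sb:
        sb.driver.uc_open_with_reconnect(url, reconnect_time=15)
```

With the original script, the call would be open_page(dexscreener_pair_url.format(chain, pair)), where dexscreener_pair_url is the poster's own URL template. Keeping the platform check in a pure function makes it easy to confirm which options each machine actually uses.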

vmolostvov commented 5 months ago

I just tested this code on my second server, which has never connected to dexscreener, and unfortunately it has the same problem...

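from seleniumbase import SB

def get_scr_hp_ss(pair, chain):
    # No proxy or custom agent; headed=True overrides the default headless mode on Linux
    with SB(uc=True, xvfb=True, headed=True) as sb:
        sb.driver.uc_open_with_reconnect(dexscreener_pair_url.format(chain, pair), reconnect_time=15)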

I don't think my first server's IP is blocked, because I have code running on that server that sends requests to the dexscreener API, and it works without any problems.

mdmintz commented 5 months ago

Can you run a script to see what your User Agent is when on your Linux machine? That may lead to a solution.

vmolostvov commented 5 months ago
# Run inside the "with SB(...) as sb:" block on the Linux server
ua = sb.driver.execute_script("return navigator.userAgent;")
print(ua)

Output:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36
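To read the parts of that string that matter here, a small parser can pull out the OS token, the Chrome major version, and whether the browser identifies itself as headless. This is a hypothetical helper, not part of SeleniumBase; it assumes the usual Chrome UA layout and that Chrome's headless mode normally reports a "HeadlessChrome" token, so a missing token is consistent with headed=True taking effect.

```python
import re
from typing import Optional


def describe_user_agent(ua: str) -> dict:
    """Hypothetical helper: extract the user-agent details discussed in this thread."""
    # The OS token is the first parenthesized group, e.g. "X11; Linux x86_64".
    start, end = ua.find("("), ua.find(")")
    platform = ua[start + 1:end] if 0 <= start < end else ""
    # Matches both "Chrome/123" and "HeadlessChrome/123".
    match = re.search(r"Chrome/(\d+)", ua)
    chrome_major: Optional[int] = int(match.group(1)) if match else None
    return {
        "platform": platform,
        # Chrome's headless mode normally advertises "HeadlessChrome" in its UA.
        "headless": "HeadlessChrome" in ua,
        "chrome_major": chrome_major,
    }
```

For the string printed above, describe_user_agent(ua) returns {'platform': 'X11; Linux x86_64', 'headless': False, 'chrome_major': 123}. Since the same user agent passed on the Mac, the UA string by itself does not explain the detection.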

I tried this user agent on my local Mac (agent=ua) and it passed...

mdmintz commented 5 months ago

To determine exactly why you were detected on your Linux machine (but not on your Mac), you can run your script against https://pixelscan.net/ and see how they're detecting you:

[Screenshot: pixelscan.net results]

vmolostvov commented 5 months ago

This check doesn't help: it says "very likely you are using a proxy" even though I'm not using one. Moreover, I can use a proxy on my local Mac and it doesn't detect me. I hope someone who can test this website on a Linux VDS sees this thread!

vmolostvov commented 5 months ago

Good news! I got past dexscreener on my Linux machine. The problem was in my original cloudflare_captcha() function, the one that located the iframe with sb.driver.wait_for_element_visible(), switched into it with sb.driver.switch_to.frame(), and then called sb.driver.uc_click() on label[class="ctp-checkbox-label"].

Working one:

import traceback

def cloudflare_captcha(sb):
    try:
        # Switch into the Cloudflare iframe, then click its checkbox in UC Mode
        sb.switch_to_frame("iframe")
        sb.driver.uc_click("span.mark")
        sb.driver.switch_to.default_content()
    except Exception:
        print(traceback.format_exc())

And it works with a proxy and with a manually changed user agent. Dexscreener still asks me to click the captcha, but this code solves it easily. Thanks again, @mdmintz, for these amazing functions!
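Because the challenge still appears and earlier attempts ended in an endless loop, it may help to cap the number of captcha clicks instead of retrying indefinitely. A minimal sketch, assuming you supply two callables: one that clicks the checkbox (for example, the working cloudflare_captcha(sb) above) and one that reports whether the page is still the challenge. pass_challenge is a hypothetical helper, not a SeleniumBase API.

```python
import time
from typing import Callable


def pass_challenge(
    click_captcha: Callable[[], None],
    on_challenge_page: Callable[[], bool],
    max_attempts: int = 3,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: click at most max_attempts times; True once the challenge is gone."""
    for _ in range(max_attempts):
        if not on_challenge_page():
            return True  # already through, no click needed
        click_captcha()
        sleep(wait_seconds)  # give the challenge time to verify and redirect
    # Final check after the last click instead of looping forever.
    return not on_challenge_page()
```

Inside the SB block, this could be called as pass_challenge(lambda: cloudflare_captcha(sb), lambda: "Just a moment" in sb.get_title()). Detecting the challenge by Cloudflare's "Just a moment..." page title is an assumption; adjust the check to whatever the page actually shows. The sleep parameter is injectable only so the helper can be exercised without real waiting.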