ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / Datadome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0

CF Detecting Selenium #1497

Open vyanezz opened 10 months ago

vyanezz commented 10 months ago

CF is detecting Selenium.

Any help? It is detected from the very first driver.get call.

I'm using:

Python 3.8, Selenium 11.2, Chrome 116, UC with jdholtz PR

Thanks.

YOTYTeaM commented 10 months ago

Hello, use the code from https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/1496#issue-1859296239. It works for CF (it worked for me).

vyanezz commented 10 months ago

Not working for me. Any other solution? It was working correctly yesterday.

Thanks.

mdmintz commented 10 months ago

CF may have pushed another update yesterday. I had to make updates and release seleniumbase 4.17.9 last night for some existing scripts to bypass detection more frequently. Try pip install -U seleniumbase, and then run the following script with python: (Using https://github.com/seleniumbase/SeleniumBase)

from seleniumbase import Driver
import time

driver = Driver(uc=True)
driver.get("https://nowsecure.nl/#relax")
time.sleep(6)
driver.quit()

Navigating to https://hmaker.github.io/selenium-detector/ also bypasses detection with the latest version.

YOTYTeaM commented 10 months ago

That works for me on https://nowsecure.nl/#relax, but it can't bypass https://hmaker.github.io/selenium-detector/.

I also tested it on accounts.google.com, and Google detects that I am using Selenium or some kind of automation (I only created the driver and then did the work manually in the browser it opened).

mdmintz commented 10 months ago

@YOTYTeaM If you're using regular driver commands, of course you'll be detected. Follow the documentation to write scripts correctly. For example, a Gmail login:

from seleniumbase import SB

with SB(uc=True) as sb:
    sb.open("https://www.google.com/gmail/about/")
    sb.click('a[data-action="sign in"]')
    sb.type('input[type="email"]', "test123@gmail.com")
    sb.click('button:contains("Next")')
    import pdb; pdb.set_trace()
    # sb.type('input[type="password"]', PASSWORD)
    # sb.click('button:contains("Next")')

dsekz commented 10 months ago

I ran pip install seleniumbase, but for some reason I get this error:

ModuleNotFoundError: No module named 'seleniumbase'

mdmintz commented 10 months ago

@6echs Maybe you installed it into a different virtual environment, or you have different Python versions on your machine.
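
One way to check for that kind of mismatch is to ask the interpreter that actually runs your script where it lives and whether it can see the package. This is a minimal sketch using only the standard library; the helper name `describe_module` is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Return (interpreter_path, is_importable) for the running Python."""
    # find_spec returns None when the top-level module cannot be found
    # on this interpreter's import path.
    spec = importlib.util.find_spec(name)
    return sys.executable, spec is not None


if __name__ == "__main__":
    interpreter, found = describe_module("seleniumbase")
    print("Interpreter:", interpreter)
    print("seleniumbase importable:", found)
```

If the interpreter path printed here is not the one pip installed into, installing with that same interpreter's `-m pip` should line the two up.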

dsekz commented 10 months ago

@mdmintz,

I get the same error on a VPS, and other libraries do not give this error. It supports the latest version of Python, right?

mdmintz commented 10 months ago

@6echs Latest Python is supported.

Have you tried:

Mac/Linux:

python3 -m pip install -U seleniumbase

or Windows:

py -m pip install -U seleniumbase

YOTYTeaM commented 10 months ago

Wow, Google can't detect me. Thank you!

from fake_useragent import UserAgent
from seleniumbase import SB

user_agent = UserAgent().random  # user agent from fake_useragent
with SB(uc=True, agent=user_agent) as sb:
    import pdb; pdb.set_trace()

But the selenium detector still detects me, with the same results I reported above. Here is what I know (not sure):

These variables in window.document indicate that Selenium is in use (window-document-aux-vars): $chrome_asyncScriptInfo and any name matching ^\$cdc_[a-zA-Z0-9]{22}_$

Selenium uses Document.prototype.querySelector and Document.prototype.querySelectorAll to find elements. Websites can override these to detect Selenium.

As for the execute_script detection, I don't know how it works (I don't know JS). This is the code the site uses to detect execute_script:

class JSCallStackTest extends SeleniumDetectionTest {

    constructor(name, desc, callStack, stackSignatures) {
        super(name, desc);
        this._callStack = callStack;
        this._stackSignatures = stackSignatures;
    }

    test() {
        if (this._callStack === null) return false;
        for (let i = 1; i < this._callStack.length; i++) {
            if (this._stackSignatures.some(signature => signature.test(this._callStack[i])))
                return true;
        }
        return false;
    }
}

/**
 * see https://source.chromium.org/chromium/chromium/src/+/main:chrome/test/chromedriver/js/execute_script.js;l=13
 * https://source.chromium.org/chromium/chromium/src/+/main:chrome/test/chromedriver/js/call_function.js;l=426
 */
class ExecuteScriptTest extends JSCallStackTest {

    constructor(window, name, desc) {
        super(name, desc, null, [/ executeScript /, / callFunction /]);
        this._createToken(window);
    }

    _createToken(window) {
        this.token = Math.random().toString().substring(2);
        const self = this;
        window.Object.defineProperty(window, 'token', {
            configurable: false,
            enumerable: false,
            get: function() {
                try {
                    null[0];
                } catch(e) {
                    if (self._callStack === null)
                        self._callStack = e.stack.split('\n');
                }
                return self.token;
            }
        });
    }
}

source: https://hmaker.github.io/selenium-detector/chromedriver.js
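
For readers who don't know JS, here is a rough Python rendering of two of the checks mentioned above: the `$cdc_` property-name pattern and the call-stack signature test. It is only a sketch of what the quoted detector code appears to do, not the site's actual implementation; the function names and the sample stack lines are invented for illustration.

```python
import re

# Pattern quoted above for chromedriver's document variables.
CDC_PATTERN = re.compile(r"^\$cdc_[a-zA-Z0-9]{22}_$")

# Same signatures as the JS: / executeScript / and / callFunction /.
STACK_SIGNATURES = [re.compile(r" executeScript "), re.compile(r" callFunction ")]


def has_driver_variable(property_names):
    """True if a name is $chrome_asyncScriptInfo or matches the $cdc_ pattern."""
    return any(
        name == "$chrome_asyncScriptInfo" or CDC_PATTERN.match(name) is not None
        for name in property_names
    )


def stack_is_flagged(call_stack):
    """Mirror JSCallStackTest.test(): skip line 0, search the remaining lines."""
    if call_stack is None:
        return False
    return any(
        sig.search(line) is not None
        for line in call_stack[1:]
        for sig in STACK_SIGNATURES
    )


if __name__ == "__main__":
    # Hypothetical stack, shaped like what e.stack.split('\n') might yield.
    sample_stack = [
        "TypeError: Cannot read properties of null",
        "    at get (<anonymous>:1:1)",
        "    at executeScript (<anonymous>:2:2)",
    ]
    print(stack_is_flagged(sample_stack))
    print(has_driver_variable(["title", "$chrome_asyncScriptInfo"]))
```

The getter on `window.token` in the JS is what captures the stack: reading the token throws and catches an error, so the stack records who asked for the value.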

mdmintz commented 10 months ago

@YOTYTeaM The tasks asked by https://hmaker.github.io/selenium-detector/ are not realistic for regular sites. No regular site is going to ask you to run commands directly in the console to get a value, as it says for you to run execute_script.

The anti-detection measures of SeleniumBase work for regular sites, especially page loads before you've clicked or entered text somewhere.

hansalemaos commented 10 months ago

Works perfectly! Thx!

dsekz commented 10 months ago

@mdmintz ,

I got it, bro, it works like a charm. I respect your work.

vyanezz commented 10 months ago

It's not working when I call driver.get on the page I want to scrape. What am I missing?

mdmintz commented 10 months ago

@vyanezz Is the page you're trying to scrape not returning a 403? Eg.

>>> import requests
>>> requests.get("https://nowsecure.nl/#relax").status_code
403

For 403s and some others, SeleniumBase modifies driver.get() so that the modified chromedriver disconnects from Chrome for a few seconds to avoid detection. This can be a noticeable slowdown, so it's only set if requests.get() is blocked (returns 403), which probably means that Selenium will be blocked too. If that's not the case, use the special driver.uc_open_with_tab(url) to get that same result, even if not a 403 (or similar).

driver.uc_open_with_tab(url)  # Use this instead of driver.get(url)

(Note that only SeleniumBase tests have that new driver.uc_open_with_tab(url) command for UC Mode.)
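
Put as code, the advice above amounts to calling uc_open_with_tab(url) wherever you would have called get(url). A small sketch follows; the wrapper `open_page` is hypothetical (it is not part of SeleniumBase) and falls back to get(url) for drivers that lack the UC Mode method.

```python
def open_page(driver, url):
    """Open url with uc_open_with_tab() if the driver has it, else get()."""
    opener = getattr(driver, "uc_open_with_tab", None)
    if callable(opener):
        opener(url)
        return "uc_open_with_tab"
    # Plain Selenium drivers only offer get().
    driver.get(url)
    return "get"
```

With `driver = Driver(uc=True)` from seleniumbase, `open_page(driver, url)` would take the first branch; the return value only reports which method was used.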

vyanezz commented 10 months ago

I've tried driver.uc_open_with_tab(url) and it is still being detected. What I want to do is open a URL and then loop through others, finding elements.

Thanks for all

hansalemaos commented 10 months ago

```shell
py -m pip install -U seleniumbase
```

Amazing stuff - your library is the best library that I have seen so far, works even with bet365 - I have just tested it:

https://github.com/hansalemaos/bet365_web_scraping/blob/main/betscrape2.py

ultrafunkamsterdam commented 10 months ago

@mdmintz ,

I get the same error on VPS and other libraries do not give this error, it supports the latest version of Python, right?

A datacenter IP will get a captcha regardless, even with a normal Chrome browser. Nothing will fix that.

lizfischer commented 10 months ago

This method isn't working for me on discogs.com -- any advice @mdmintz?

from seleniumbase import SB

with SB(uc=True) as sb:
    sb.open("https://www.discogs.com/")
    sb.click('#log_in_link')
    sb.type('#username', username)
    sb.type('#password', password)
    # sb.submit('#password')

mdmintz commented 10 months ago

@lizfischer After you've gotten here...

from seleniumbase import SB

with SB(uc=True) as sb:
    sb.open("https://discogs.com")
    # ...

...then use https://github.com/asweigart/pyautogui to type text and click without invoking selenium commands in the browser, which would get you detected. SeleniumBase gets you through the front door to any site looking like a human. After that, if you need to perform actions while avoiding detection, use a tool like pyautogui to perform those actions from outside a web browser. (I'm not the pyautogui expert, so perhaps see that documentation and examples for using that.)

hansalemaos commented 10 months ago

@lizfischer You might want to try this. It works on my PC, as you can see here: https://www.youtube.com/watch?v=0H-BYbU8Gkg

import random
from time import sleep

from a_selenium_better_sendkeys import send_keys_alternative
from seleniumbase import Driver
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.wait import WebDriverWait
from a_selenium2df import get_df
from PrettyColorPrinter import add_printer

from myusernameandpass import user, password

def get_dataframe():
    df = pd.DataFrame()
    while df.empty:
        df = get_df(
            driver,
            By,
            WebDriverWait,
            expected_conditions,
            queryselector="*",
            with_methods=True,
        )
    return df

add_printer(1)
driver = Driver(uc=True)
driver.get("https://www.discogs.com/")
df = get_dataframe()
df.loc[df.aa_innerHTML.str.contains("log_in_link", na=False, regex=True)].iloc[
    -1
].se_click()
df = get_dataframe()
df2 = df.loc[df.aa_localName.str.contains("input")]
for q in user:
    send_keys_alternative(driver, df2.element.iloc[0], q)
    sleep(random.uniform(0.05, 0.1))
for q in password:
    send_keys_alternative(driver, df2.element.iloc[1], q)
    sleep(random.uniform(0.05, 0.1))

df.loc[df.aa_innerText.str.contains("Log in", na=False)].iloc[-1].se_click()

I am using SeleniumBase and 3 modules that I wrote:
https://github.com/hansalemaos/a_selenium_better_sendkeys (faster, more reliable send_keys)
https://github.com/hansalemaos/a_selenium2df (gets all webelements and their attributes/properties in one go [I hate using selectors - pandas is much better hahaha])
https://github.com/hansalemaos/PrettyColorPrinter (makes the DataFrame prettier)
If it is not working, your IP might be blacklisted.

NourEssalam commented 10 months ago

@mdmintz Your Driver(uc=True) script for https://nowsecure.nl/#relax gives me this error:

File "/home/benslimane/callcenterEPR/django/.venv/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot connect to chrome at 127.0.0.1:9222
from chrome not reachable
Stacktrace:

0 0x55a8eeaede23

1 0x55a8ee8165f6

mdmintz commented 10 months ago

@NourEssalam Looks like either Chrome isn't installed, or you ran a different script. That stack trace is coming from raw selenium, not seleniumbase. (I would need to see a seleniumbase stack trace to help.)

msaidztrk commented 10 months ago

@mdmintz Thanks for the Driver(uc=True) example, but I have a problem with it. If I run the script without a VPN connection, it works and bypasses Cloudflare. With the VPN enabled, it can't bypass it. Is there a solution for this? By the way, I use the same VPN connections in my local browsers, and they cause no problems with Cloudflare.

mdmintz commented 10 months ago

@msaidztrk If the VPN is causing issues, you can change proxy settings to get around that.

There's a proxy option:

proxy=None,  # Use proxy. Format: "SERVER:PORT" or "USER:PASS@SERVER:PORT".

https://github.com/seleniumbase/SeleniumBase/blob/101bd16e37e7b34bc5481cb17e152dce8b43b0a7/seleniumbase/plugins/driver_manager.py#L74
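
For reference, the two formats named in that comment can be assembled like this. `build_proxy` is a hypothetical helper written for this example, not a SeleniumBase function; its result is the string you would pass as `proxy=`.

```python
def build_proxy(server, port, user=None, password=None):
    """Return "SERVER:PORT" or "USER:PASS@SERVER:PORT"."""
    address = f"{server}:{port}"
    if user is None and password is None:
        return address
    if user is None or password is None:
        # Credentials only make sense as a pair.
        raise ValueError("user and password must be given together")
    return f"{user}:{password}@{address}"
```

The string goes into the driver constructor, for example `Driver(uc=True, proxy=build_proxy("SERVER", 8080))`, with your own server and port substituted.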

msaidztrk commented 10 months ago

Thanks for your response, but can you help me with the line of code you gave me? Where in my main Python code should I put it? Also, I use VPNs via Chrome extensions: I installed the VPN extension in the driver's Chrome, enabled it, and tried again, and it won't bypass Cloudflare that way.

mdmintz commented 10 months ago

@msaidztrk For UC Mode with proxy:

from seleniumbase import Driver
import time

driver = Driver(uc=True, proxy="USER:PASS@IP:PORT")  # With connection details
driver.get("https://nowsecure.nl/#relax")
time.sleep(6)
driver.quit()

aavvaavvaa commented 10 months ago

@mdmintz Hello, is it possible to run multiple instances? I tried using threads, but it doesn't work.

from seleniumbase import SB

with SB(uc=True) as sb:
    sb.open("https://discogs.com")
    # ...

mdmintz commented 10 months ago

@aavvaavvaa See https://github.com/seleniumbase/SeleniumBase/issues/2006#issuecomment-1679789785

msaidztrk commented 10 months ago

@mdmintz Thanks for your reply again, but I guess it won't work in my case. I don't set a proxy explicitly. After the driver opens Chrome, I install a VPN extension from the Chrome store, and it automatically connects to a proxy provided by the VPN plugin. So I don't know, and can't know, the details of the proxy I'll be connected to. And with any VPN connection, I can't bypass any Cloudflare protection.

msaidztrk commented 10 months ago

@mdmintz Is there no solution for the case in my last response?