ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0

Cloudflare protection #37

Closed by hedior03 4 years ago

hedior03 commented 4 years ago

The driver is still detected by Cloudflare protection. Platzi is a website I'm trying to scrape, and I haven't been able to.

sla-te commented 4 years ago

mc-market.org seems to work with 1.5.0 (it didn't with the previous version), but I noticed that as soon as I activate https://antcpt.com/eng/download/google-chrome-options.html it gets blocked again.

There is a minor error in the code at https://github.com/ultrafunkamsterdam/undetected-chromedriver/blob/e8d4050f3cbd92979bf51683d7af1e370951ffb7/undetected_chromedriver/__init__.py#L152: it must be self, not self_. Does this maybe fix your issue?

czoins commented 4 years ago

Apparently passing your own Options() overrides the options set in __init__.py. You should add them manually until he fixes the issue:

instance.add_argument("start-maximized")
instance.add_experimental_option("excludeSwitches", ["enable-automation"])
instance.add_argument("--disable-blink-features=AutomationControlled")
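
A minimal sketch of re-applying these flags to an options object you construct yourself. `apply_stealth_defaults` is a hypothetical helper name, not part of undetected-chromedriver; it assumes only that the object exposes `add_argument` and `add_experimental_option`, as Selenium's `ChromeOptions` does.

```python
def apply_stealth_defaults(options):
    """Re-apply the three flags quoted above to a caller-supplied options object.

    `options` is assumed to behave like Selenium's ChromeOptions, i.e. to
    expose add_argument() and add_experimental_option(). Hypothetical helper.
    """
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_argument("--disable-blink-features=AutomationControlled")
    return options
```

With Selenium installed, this would be called on your own options instance before handing it to the driver.
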
sla-te commented 4 years ago

In my fork I took care of this one. It's not a very good solution, but it works: https://github.com/chwba/undetected-chromedriver/blob/master/undetected_chromedriver/__init__.py

Still, as mentioned above, when I activate that captcha solver plugin it still seems to be detected. It would be great to have a workaround for that, because if we can't solve captchas anymore it will be difficult to access some sites.

ultrafunkamsterdam commented 4 years ago

The _ after self is intentional.

ultrafunkamsterdam commented 4 years ago

instance.add_argument("start-maximized")
instance.add_experimental_option("excludeSwitches", ["enable-automation"])
instance.add_argument("--disable-blink-features=AutomationControlled")

These are the defaults, so there is no need to override them. Anything non-standard is unsupported.

Regarding Platzi, try another IP, since yours might be flagged. I cannot reproduce the issue:

In [1]: import undetected_chromedriver as uc

In [2]: driver = uc.Chrome()
Selenium patched. Safe to import Chrome / ChromeOptions
Selenium patched. Safe to import Chrome / ChromeOptions

DevTools listening on ws://127.0.0.1:19576/devtools/browser/642a3b15-d112-42bf-b222-9b89ae83649b
In [3]: driver.get('https://platzi.com/')

In [4]: driver.save_screenshot('platzi.png')
Out[4]: True

[screenshot: platzi]

> mc-market.org seems to work with 1.5.0 (it didn't with the previous version), but I noticed that as soon as I activate https://antcpt.com/eng/download/google-chrome-options.html it gets blocked again.
>
> There is a minor error in the code at https://github.com/ultrafunkamsterdam/undetected-chromedriver/blob/e8d4050f3cbd92979bf51683d7af1e370951ffb7/undetected_chromedriver/__init__.py#L152: it must be self, not self_. Does this maybe fix your issue?

No, it doesn't have to be; in Python it can be whatever you want 👍 Of course an anti-captcha plugin is detected, since JavaScript can simply check which plugins are active; this is by design. I guess an anti-captcha service and Cloudflare are by definition incompatible. I've had good results in the past using 2captcha. Another way to do it is using this library: https://github.com/Anorov/cloudflare-scrape
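
On the `self_` point: Python passes the instance as the first positional argument of a method, and the name `self` is only a convention. A small illustration (the class and method names are made up for this example):

```python
class Counter:
    def __init__(self_, start=0):
        # The first parameter receives the instance regardless of its name.
        self_.value = start

    def bump(this):
        # Any valid identifier works; "self" is convention, not syntax.
        this.value += 1
        return this.value


c = Counter(41)
print(c.bump())  # 42
```
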

sla-te commented 4 years ago

I see. In my fork the self_ had caused issues; that is probably due to my changes, then.

Hmm, but the plugin worked perfectly fine for a long time before Cloudflare patched their procedure. Isn't there maybe a script we could inject to return a fake list of plugins to bypass this?

Yes, I am aware of https://github.com/Anorov/cloudflare-scrape and had already tried to implement it, but if there is a captcha to be solved, invisible or visible, during the browser check, this library will fail even if he fixes it (it is currently broken). The only way I can imagine to make the plugin work is to not load it when first starting the Chrome instance and to load it after the initial check is completed, but sadly it's not possible (afaik) to load a plugin after the Chrome instance has already started.

Regarding 2captcha, I could not figure out how to use it to solve captchas and inject the solution while the browser is running. I do know how to use 2captcha with requests directly, but that doesn't help me in this scenario.
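
One commonly described way to combine an external solving service with a running browser is to obtain the token out of band (for example with the requests-based flow mentioned above) and then write it into the page's hidden response field with `execute_script`. The sketch below only builds that JavaScript. The field names `g-recaptcha-response` and `h-captcha-response` are assumptions about the target page, `build_token_injection_script` is a hypothetical helper, and nothing in this thread confirms that a given site accepts a token injected this way.

```python
import json


def build_token_injection_script(
    token, field_names=("g-recaptcha-response", "h-captcha-response")
):
    """Return JavaScript that writes `token` into any matching response fields.

    The field names are assumptions; inspect the target page for the real ones.
    json.dumps is used so the token and names are safely quoted as JS literals.
    """
    return (
        "var token = {token};\n"
        "var names = {names};\n"
        "var filled = 0;\n"
        "names.forEach(function (name) {{\n"
        "  document.getElementsByName(name).forEach(function (el) {{\n"
        "    el.value = token;\n"
        "    filled += 1;\n"
        "  }});\n"
        "}});\n"
        "return filled;"
    ).format(token=json.dumps(token), names=json.dumps(list(field_names)))
```

With a live driver, `driver.execute_script(build_token_injection_script(token))` would return the number of fields filled. Submitting the form or invoking the page's callback is site-specific and is left out here.
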

EDIT: Also, the nice part about the anti-captcha plugin is that it will simply solve any captcha that appears anywhere on the site, which is a huge convenience if configured correctly.

EDIT2: I found https://intoli.com/blog/making-chrome-headless-undetectable/ and am trying to implement the plugins and languages parts, but I'm not a JS/Selenium pro.

I tried:

if instance.execute_script("return navigator.languages"):
    instance.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source": """
            Object.defineProperty(navigator, 'languages', {
                get: function() {
                    return ['en-US', 'en'];
                },
            });"""
    })

if instance.execute_script("return navigator.plugins"):
    instance.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source": """
            Object.defineProperty(navigator, 'plugins', {
                get: function () {
                    return [1, 2, 3, 4, 5];
                },
            });"""
    })

That gives me a 'circular reference' error when starting.
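
The circular reference error most likely comes from the guard rather than from the injected script: `execute_script("return navigator.plugins")` asks the driver to serialize a `PluginArray`, whose plugin and MIME-type objects point back at each other. A hedged sketch that builds the CDP payload without returning that object follows. `build_navigator_override` is a hypothetical helper, and the override values are simply the ones from the attempt above, not a recommendation; a plain array of numbers is easy for a site to tell apart from a real plugin list.

```python
import json


def build_navigator_override(prop, value):
    """Build the params dict for Page.addScriptToEvaluateOnNewDocument.

    `prop` is a navigator property name and `value` any JSON-serializable
    value the getter should return. Hypothetical helper for illustration.
    """
    source = (
        "Object.defineProperty(navigator, {prop}, {{\n"
        "  get: function () {{ return {value}; }}\n"
        "}});"
    ).format(prop=json.dumps(prop), value=json.dumps(value))
    return {"source": source}


languages_params = build_navigator_override("languages", ["en-US", "en"])
plugins_params = build_navigator_override("plugins", [1, 2, 3, 4, 5])
```

With a live driver these would be passed as `instance.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", plugins_params)`. If a guard is still wanted, returning a primitive such as `navigator.plugins.length` avoids serializing the plugin objects.
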