ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / Datadome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0
9.96k stars 1.16k forks

Website to test functionality #596

Open MacMarde opened 2 years ago

MacMarde commented 2 years ago

Until now I have used https://pypi.org/project/selenium-stealth/ which didn't help me.

Now I am wondering whether I could combine uc with selenium-stealth.

I guess this question is hard to answer, so is there a simple way to test the stealth techniques applied by uc? That way I can play around a bit and see whether it is still working.

sebdelsol commented 2 years ago

It would help for headless mode, which has barely any evasions (and from this package's code it seems to be a bunch of JS to evade headless detection?). I might check a little more later...

If you don't use headless, undetected-chromedriver spawns a regular Chrome before the driver and patches the driver, so it's very hard to detect (I don't know of ways to detect it, but I'm sure there are some).

MacMarde commented 2 years ago

@sebdelsol Honestly, I do not know what this package does in detail. To use it you just need to pass the Python webdriver object to it. That's all. But I do not know how it interferes with the uc webdriver object.

If you don't use headless, undetected-chromedriver spawns a regular Chrome before the driver and patches the driver, so it's very hard to detect (I don't know of ways to detect it, but I'm sure there are some).

I do not use headless chrome, but I do not understand what you mean.

MacMarde commented 2 years ago

@sebdelsol Can you please explain?

sebdelsol commented 2 years ago

If you don't use headless Chrome, then selenium-stealth's JavaScript tricks, which mimic a non-headless Chrome, are not that useful (but I need to read its code more thoroughly to be sure).

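Undetected-chromedriver relies on two simple, effective tricks to hide that it's Selenium-based:

it spawns Chrome as a detached process so it behaves like your regular Chrome as much as possible.
it patches the driver so that there are no detectable variables left.
no need for further javascript injection !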

But you don't want to use the headless option: headless Chrome behaves very differently from regular Chrome, and there are many ways to detect it with simple client-side JavaScript. Anyway, selenium-stealth has some evasion techniques I was not aware of. But this package is 2 years old and anti-bot companies keep finding new techniques. This is an endless cat & mouse game.

tldr : Non-headless undetected-chromedriver is a better strategy than relying on headless Chrome evasion techniques based on a mere chromedriver spawned by Selenium.
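
A minimal sketch of how such client-side signals could be checked from your own script, assuming only that the driver object exposes Selenium's `execute_script`. The names `PROBES`, `probe_browser` and `suspicious_signals` are made up for illustration, and the three heuristics are commonly cited headless checks, not a complete or current list:

```python
# Hypothetical helpers: they work with any object that has Selenium's
# WebDriver.execute_script(script) method.
PROBES = {
    "navigator.webdriver": "return navigator.webdriver",
    "userAgent": "return navigator.userAgent",
    "plugins": "return navigator.plugins.length",
}


def probe_browser(driver):
    """Run each JavaScript probe in the page and return the raw values."""
    return {name: driver.execute_script(js) for name, js in PROBES.items()}


def suspicious_signals(results):
    """Return the names of probes whose values look automated or headless."""
    flagged = []
    # The WebDriver spec sets navigator.webdriver to true under automation.
    if results.get("navigator.webdriver"):
        flagged.append("navigator.webdriver")
    # Assumption: an unmodified headless Chrome advertises "HeadlessChrome".
    if "HeadlessChrome" in (results.get("userAgent") or ""):
        flagged.append("userAgent")
    # Assumption: an empty plugin list is treated as a headless hint.
    if results.get("plugins") == 0:
        flagged.append("plugins")
    return flagged
```

With undetected-chromedriver this would presumably be called as `suspicious_signals(probe_browser(driver))` after `driver = uc.Chrome()` has loaded a page. An empty list only means these three checks passed, not that the browser is undetectable.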

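EDIT : the new game is not about having a stealth driver (undetected-chromedriver still works well), but it's all about

fingerprinting to allow rate limitation : when I can ID you, I can limit your usage of the site to what a human needs.
The detection of bot behavior vs regular human behavior : If you don't behave like a human I serve you a captcha or worse...
Some techniques to prevent easy scraping (shadow-root, obfuscated DOM xpath)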

MacMarde commented 2 years ago

@sebdelsol Thank you very much for your detailed answer.

Undetected-chromedriver relies on two simple, effective tricks to hide that it's Selenium-based:

it spawns Chrome as a detached process so it behaves like your regular Chrome as much as possible.
it patches the driver so that there are no detectable variables left.
no need for further javascript injection !

Thank you for that. I was not aware of this.

But you don't want to use the headless option: headless Chrome behaves very differently from regular Chrome, and there are many ways to detect it with simple client-side JavaScript. Anyway, selenium-stealth has some evasion techniques I was not aware of. But this package is 2 years old and anti-bot companies keep finding new techniques. This is a cat & mouse game.

I am not using headless Chrome, but you are right, selenium-stealth is basically about hiding headless Chrome as described here: https://intoli.com/blog/making-chrome-headless-undetectable/ But anyway, there are some features that may also be important for non-headless Chrome. You can test it here and also here. Anyway, I am not sure how it works. But we cannot let the cats win ;-)

EDIT : the new game is not about having a stealth driver (undetected-chromedriver still works well), but it's all about

fingerprinting to allow rate limitation : when I can ID you, I can limit your usage of the site to what a human needs.
The detection of bot behavior vs regular human behavior : If you don't behave like a human I serve you a captcha or worse...
Some techniques to prevent easy scraping (shadow-root, obfuscated DOM xpath)

I am aware of these techniques and am trying to work around them as well.
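
To make the "fingerprinting to allow rate limitation" point concrete, here is a rough sketch of what the site side could look like: a sliding-window limiter keyed by whatever fingerprint ID the site computes. The class name, the default limits and the injectable `clock` are all assumptions for illustration, not how any particular anti-bot product works:

```python
import time
from collections import defaultdict, deque


class FingerprintRateLimiter:
    """Hypothetical sliding-window limiter keyed by a browser fingerprint ID."""

    def __init__(self, max_requests=30, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self._hits = defaultdict(deque)  # fingerprint -> request timestamps

    def allow(self, fingerprint):
        """Return True if this fingerprint is still under its request budget."""
        now = self.clock()
        hits = self._hits[fingerprint]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over what a human would plausibly need
        hits.append(now)
        return True
```

Once the same fingerprint exceeds the budget, the site can serve a captcha or block, no matter how stealthy the driver itself is.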

sebdelsol commented 2 years ago

I'm just an amateur (and find this game fascinating). I have my own pet project that (fairly) scrapes some sites for fun.

Anyway, I've had to extend the Selenium ActionChains class to add some basic "human-like" actions: random pauses based on actual human reaction time, keys sent one by one, and mouse moves that take time to reach a point (useful for sliders). I've even seen some people using Bézier curves + random noise to make their mouse movements even more human... this is an endless endeavor... good luck with your project(s).
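
A rough sketch of the Bézier-plus-noise idea, not taken from any project mentioned in this thread. The function names, the control-point `spread`, the `jitter` and the pause parameters (a mean of about 0.25 s, loosely in the range of human reaction time) are all illustrative assumptions:

```python
import random


def bezier_path(start, end, steps=25, spread=80.0, jitter=1.5, rng=None):
    """Return points along a noisy cubic Bezier curve from start to end."""
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    # Two random control points bend the path away from a straight line.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-spread, spread)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-spread, spread)
    points = []
    for i in range(steps + 1):
        t = i / steps
        a, b, c, d = (1 - t) ** 3, 3 * (1 - t) ** 2 * t, 3 * (1 - t) * t ** 2, t ** 3
        x = a * x0 + b * x1 + c * x2 + d * x3
        y = a * y0 + b * y1 + c * y2 + d * y3
        if 0 < i < steps:  # keep both endpoints exact
            x += rng.gauss(0, jitter)
            y += rng.gauss(0, jitter)
        points.append((x, y))
    return points


def to_offsets(points):
    """Convert absolute points into integer (dx, dy) steps between them."""
    rounded = [(round(x), round(y)) for x, y in points]
    return [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(rounded, rounded[1:])]


def human_pause(rng=None, mean=0.25, sd=0.08, floor=0.05):
    """Draw a pause in seconds around an assumed human reaction time."""
    rng = rng or random.Random()
    return max(floor, rng.gauss(mean, sd))
```

Each `(dx, dy)` pair could then be fed to Selenium's `ActionChains.move_by_offset(dx, dy)`, followed by `ActionChains.pause(...)` with a short delay, ending with a single `perform()`. Rounding the points before taking differences keeps the integer steps summing to the full displacement.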

MacMarde commented 2 years ago

I do not know much about web scraping and what it is good for.

But I have used some bots in the past to make money and for other things. At least it was working for some time. As you said it is a cat&mouse game. Atm I have no more working bots.