ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0
9.54k stars, 1.14k forks

Are rotating the user agent, changing the viewport, etc. still relevant with undetected-chromedriver? #861

Status: Open. Opened by ilgrank 1 year ago.

ilgrank commented 1 year ago

Hi, this is a request for clarification, not an issue report. If it doesn't belong here, please pardon me and I'll remove it immediately.

I'm using undetected-chromedriver, and while nowsecure.nl clearly shows undetected-chromedriver is working well, I still get detections/bans on some sites I'm scraping with multiple (serialized) requests.

With 'vanilla' chromedriver I followed well-known best practices, such as:

  - rotating the user agent
  - randomizing the viewport
  - using "--disable-blink-features=AutomationControlled" and "--disable-blink-features"
  - setting navigator.webdriver to false
  - etc.
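For comparison, most of the practices listed above could be wired into plain Selenium roughly as follows. This is a minimal sketch under stated assumptions, not code from either project: the helper names are hypothetical, and the navigator.webdriver override is injected with the Chrome DevTools Protocol command Page.addScriptToEvaluateOnNewDocument through Selenium's execute_cdp_cmd. Viewport randomization is not covered here.

```python
from typing import List

# Hypothetical helper: the Chrome switches mentioned above for plain Selenium.
# It is not part of Selenium or undetected-chromedriver.
def build_vanilla_chrome_args(user_agent: str) -> List[str]:
    return [
        f"--user-agent={user_agent}",
        "--disable-blink-features=AutomationControlled",
    ]

# JavaScript that makes navigator.webdriver report false on every new page.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => false});"
)

def make_vanilla_driver(user_agent: str):
    # Imported lazily so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_vanilla_chrome_args(user_agent):
        options.add_argument(arg)
    # Removes the "enable-automation" switch Chrome normally gets under automation.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    driver = webdriver.Chrome(options=options)
    # Registers the script so it runs before any page script on each new document.
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": HIDE_WEBDRIVER_JS}
    )
    return driver
```

With undetected-chromedriver most of this is reportedly handled for you, which is what the question below is asking about.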

Which of the above are still relevant, and which are unnecessary with undetected-chromedriver? For example, I've noticed that navigator.webdriver is already false without my explicitly modifying it with JS code. Any ideas? Thanks!

Chetan11-dev commented 1 year ago

Possible Issues:

  1. Incorrect user agents: e.g. if you run Chrome on a Linux device but your user agent says Firefox, or Chrome on Windows, Cloudflare catches you. These are valid user agents for Linux:
    
    _106 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"}]
    _105 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"}]
    _104_1 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.101 Safari/537.36"}]
    _104_2 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36"}]
    _103 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.53 Safari/537.36"}]
    _101 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951 Safari/537.36"}]
    _99 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844 Safari/537.36"}]
    _100 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896 Safari/537.36"}]
    _98 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758 Safari/537.36"}]
    _97 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692 Safari/537.36"}]
    _96 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664 Safari/537.36"}]
    _95 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638 Safari/537.36"}]
    _94 = [
        {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606 Safari/537.36"}]

  2. Your IP is blacklisted: PerimeterX does that. Just change your IP manually using ProtonVPN. If you want to do it programmatically, use https://github.com/tprasadtp/protonvpn-docker (you should know Docker well for this).

  3. PerimeterX does catch undetected-chromedriver, but if you can finish your task faster than PerimeterX can detect you, you will succeed.
If you need any help, feel free to ask :)
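The user-agent mismatch in point 1 can be checked before launching the browser. The sketch below is a hypothetical helper, not part of undetected-chromedriver: it compares the OS token in a Chrome UA string with the host OS reported by Python's platform.system(). The token patterns are a simplified assumption and do not cover every UA variant.

```python
import platform
import re
from typing import Optional

# Simplified mapping from platform.system() names to the OS token a Chrome UA
# string carries on that OS. An assumption, not an exhaustive list.
_UA_OS_PATTERNS = {
    "Linux": re.compile(r"\(X11; Linux"),
    "Windows": re.compile(r"\(Windows NT"),
    "Darwin": re.compile(r"\(Macintosh;"),
}

def ua_matches_host(user_agent: str, host_system: Optional[str] = None) -> bool:
    """Return True if the UA claims Chrome on the same OS family as the host."""
    host = host_system if host_system is not None else platform.system()
    pattern = _UA_OS_PATTERNS.get(host)
    # A Firefox UA has no "Chrome/" token, so it is rejected when driving Chrome.
    if pattern is None or "Chrome/" not in user_agent:
        return False
    return pattern.search(user_agent) is not None
```

For example, `[ua for ua in candidates if ua_matches_host(ua)]` would keep only the user agents consistent with the machine the scraper runs on.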
ilgrank commented 1 year ago

Thanks for your reply, @Chetan11-dev. My question, however, was more general: I get that rotating the user agent is still good practice, but what about everything else (disabling Blink features, randomizing the viewport size, and so on)? Are these still useful for avoiding detection, or does undetected-chromedriver make them obsolete?

nuclear-bean commented 1 year ago

Hi @ilgrank, the --disable-blink-features option is automatically added as an argument when undetected-chromedriver starts (see it here), same as ("excludeSwitches", ["enable-automation"]). The window is maximized by default; playing with its size can still be a good idea (though in my experience it isn't necessary).
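If you do want to vary the window size, one option is to override the default after startup with Selenium's set_window_size. This is a minimal sketch, assuming undetected-chromedriver is installed as `undetected_chromedriver`; the size bounds and helper names are hypothetical choices, not values recommended by the project.

```python
import random
from typing import Optional, Tuple

# Hypothetical bounds meant to resemble common desktop window sizes.
MIN_SIZE = (1024, 700)
MAX_SIZE = (1920, 1080)

def random_window_size(rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Pick a width and height uniformly within the bounds above."""
    if rng is None:
        rng = random.Random()
    width = rng.randint(MIN_SIZE[0], MAX_SIZE[0])
    height = rng.randint(MIN_SIZE[1], MAX_SIZE[1])
    return width, height

def open_with_random_window(url: str):
    # Imported lazily so random_window_size works without the package installed.
    import undetected_chromedriver as uc

    driver = uc.Chrome()
    width, height = random_window_size()
    # Replaces the default (maximized) window with a randomized size.
    driver.set_window_size(width, height)
    driver.get(url)
    return driver
```

Passing a seeded `random.Random` to `random_window_size` makes the chosen sizes reproducible, which can help when debugging a detection.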

BenjaPrograma commented 1 year ago

Hi @Chetan11-dev, I'm interested in how to do the 3rd point, as I'm not sure what you mean. I tried using undetected Selenium Wire, and also PyAutoGUI to solve captchas, but it seems I'm going in the wrong direction.

Is it that I should scrape one webpage with uc, grab the data, close uc, rotate my IP, and repeat? With this approach I think I would need a very large number of IPs, which can be very expensive depending on how many webpages I need to scrape.

Any help would be greatly appreciated. Thanks in advance!
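The per-page loop described above (open uc, scrape, close, switch IP) can be sketched with a fixed pool of proxies reused round-robin, so the pool size does not have to grow with the number of pages. This is a hypothetical sketch, not advice from the thread: the proxy addresses are placeholders, whether a site tolerates reusing an IP is unknown, and Chrome's --proxy-server switch does not accept username/password credentials, so it assumes unauthenticated proxies.

```python
from itertools import cycle
from typing import Iterable, List, Tuple

def assign_proxies(urls: Iterable[str], proxies: List[str]) -> List[Tuple[str, str]]:
    """Pair each URL with a proxy, cycling through the pool round-robin."""
    if not proxies:
        raise ValueError("need at least one proxy")
    pool = cycle(proxies)
    return [(url, next(pool)) for url in urls]

def scrape_one(url: str, proxy: str) -> str:
    # Imported lazily so assign_proxies works without the package installed.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    # Routes this browser session through the given proxy.
    options.add_argument(f"--proxy-server={proxy}")
    driver = uc.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Closes the browser so the next page starts a fresh session.
        driver.quit()
```

Usage would look like `for url, proxy in assign_proxies(urls, ["http://proxy1:8080", "http://proxy2:8080"]): html = scrape_one(url, proxy)`, with both proxy addresses being placeholders.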