ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0
9.82k stars 1.15k forks

Undetected chrome driver detected by a site #56

Closed sezonis closed 3 years ago

sezonis commented 3 years ago

I'm not sure what it does to detect it. But it seems to happen with the web driver only. Loading in normal chrome works perfectly fine.

The link can be found here. https://secure.runescape.com/m=weblogin/loginform?theme=runescape&mod=www&ssl=1&dest=community

It was updated yesterday to reflect this change.

Maybe you know what the problem is?

czoins commented 3 years ago

Loads fine for me. Is the issue related to logging in, or simply loading the page? Are you using the latest patch?

sezonis commented 3 years ago

I have made a new discovery. It loads fine for you because you haven't been on it before. When you use send keys to fill out the boxes and invoke the submit button, you will get the blocked message. After it's blocked, it permanently flags Selenium. If you load normal Chrome, it won't show the block error and lets you do things as intended.

So if you want to get blocked just load the page, instantly type garbage in both text fields and submit.

Clearly it has a way of knowing it's selenium.

Yes, I am using the latest patch. I made sure of that before posting.

To save you a few seconds

The IDs are login-username and login-password.

The submit button is du-login-submit (//button).

I am currently testing it again on a new IP but will fill it with javascript instead. It has already been tested with manual input (it works fine). It just knows it's an automated input.

Perhaps it detects whether the browser window has focus or not?

EDIT: I just manually typed it again and it blocked me. Weird.

czoins commented 3 years ago

I tested it using both send_keys() & click() and JavaScript, and wasn't blocked while the webdriver window was in focus. The moment I ran it in the background, I got a block the next time. I am unable to test my idea as I need a new IP, but you could try executing this script before doing anything on the page: driver.execute_script("window.onblur = function() { window.onfocus() }") If it works, it should be executed on every new page.
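A minimal sketch of the "execute on every new page" part, assuming a Chromium-based driver that exposes Selenium's execute_cdp_cmd. The CDP command Page.addScriptToEvaluateOnNewDocument registers a script that Chrome runs at the start of each new document, so it would not have to be re-injected by hand after every navigation. The helper name install_focus_shim is made up for illustration, and the guard around window.onfocus is an addition (the one-liner throws if the page never assigned an onfocus handler). Whether this avoids the block at all is untested.

```python
# Script from the comment above, with a guard added so it does not throw
# when the page has not assigned window.onfocus (an assumption, not from the thread).
FOCUS_SHIM = (
    "window.onblur = function () {"
    "  if (typeof window.onfocus === 'function') { window.onfocus(); }"
    "};"
)


def install_focus_shim(driver):
    """Register FOCUS_SHIM so Chrome runs it at the start of every new document.

    `driver` is any object exposing Selenium's execute_cdp_cmd(name, params),
    for example a Chrome WebDriver instance.
    """
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": FOCUS_SHIM},
    )
```

It would be called once, right after creating the driver and before the first driver.get(...). Note that the page's own scripts can still overwrite window.onblur later.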

ultrafunkamsterdam commented 3 years ago

They're probably layering elements on top of each other and checking whether the top element is clicked or moused over, or they use shadow DOM? Comparing requests would be a first step. And getting a VPN for a few bucks 😏

sezonis commented 3 years ago

I tested it using both send_keys() & click() and javascript, and wasn't blocked when the webdriver was on focus. The moment i ran it in background, i got a block the next time. I am unable to test my idea as i need a new IP, but you could try executing this script before doing anything on the page: driver.execute_script("window.onblur = function() { window.onfocus() }") If it worked, it should be executed on every new page.

Worked 1-2 times then banned again.

I don't think it matters though, as it unblocks when you load normal Chrome. There must be some sort of signature they are looking for to determine whether it's Selenium or not. You can use any legit browser and it unblocks, which indicates to me that they know it is Selenium.

I spoofed everything in navigator to make it look 1:1 to the original browser, it still detected it. All requests/headers are identical.

The only thing I can think of is the web driver executes javascript differently and fails at a certain point. Or can javascript detect processes/port numbers? Must be something very distinct if they accurately block people out with selenium only.

You can do it in the background with normal chrome perfectly fine using tampermonkey.

sezonis commented 3 years ago

They're probably layering elements on top each other and checking if the top element is clicked or mouseover, or they utilize shadowdom? Comparing requests would be a first step. And getting a vpn for a few bucks 😏 - sent from mobile

Refer to my previous reply. I've tested it with another person and he said he sometimes gets banned even when doing it manually. I think it detects Selenium and just decides to block you after a few times, just because it can.

ultrafunkamsterdam commented 3 years ago

What do you consider "manually"? Manually using a chromedriver-initialized Chrome window, or using the default Chrome installation with a user profile and so on?

Furthermore, this might be worth investigating : https://secure.runescape.com/Criciousand-meth-shake-Exit-be-till-in-ches-Shad

ultrafunkamsterdam commented 3 years ago

Old trick, but it often works, since site owners are scared to death of banning anything Google-related: pretend to be Google (they are phasing out browser user agents completely, so make use of it while you still can)

( Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36 , where W.X.Y.Z is a placeholder for the version of the Chrome browser used by that user agent)
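A small sketch of how the placeholder in that user-agent string could be filled in and handed to Chrome. The helper names and the example version number are made up for illustration; --user-agent is a standard Chrome command-line switch. Whether a site accepts or blocks a Googlebot user agent is site-specific.

```python
# Googlebot (desktop, Chrome-based) user-agent string quoted above,
# with the W.X.Y.Z placeholder turned into a format field.
GOOGLEBOT_UA_TEMPLATE = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Googlebot/2.1; +http://www.google.com/bot.html) "
    "Chrome/{version} Safari/537.36"
)


def googlebot_user_agent(chrome_version):
    """Return the Googlebot user agent for the given Chrome version string."""
    return GOOGLEBOT_UA_TEMPLATE.format(version=chrome_version)


def user_agent_argument(chrome_version):
    """Return the Chrome command-line switch that sets this user agent."""
    return "--user-agent=" + googlebot_user_agent(chrome_version)
```

With a Selenium-style options object this would be used as options.add_argument(user_agent_argument("87.0.4280.66")), where the version is an arbitrary example and should match the installed Chrome.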

zaksmithcomputing commented 3 years ago

What are you considering manually ? Manually using chromedriver initialized Chrome window, or using default chrome installation with user profile and so on?

Furthermore, this might be worth investigating : https://secure.runescape.com/Criciousand-meth-shake-Exit-be-till-in-ches-Shad

By manually I meant that I can load a blank chromedriver and do the process manually and still get blocked. I've tried a lot of things and still have no answer. I tried rotating user agents and rotating residential proxies, and I have checked that my navigator.webdriver is set to null. The only way I have had limited success is when loading the page on a chromedriver manually and then continuing the process manually. Just for some context, this page is only reached after a lengthy process, which first involves logging into Amazon and verifying an e-mail, and then clicking a link which gets you to this page.

sezonis commented 3 years ago

Old trick, yet often working since site owners are afraid to death to ban anything google related: pretend to be google (they are phasing out browser user-agents completely so make use of it while you still can)

( Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36 , W.X.Y.Z is actually a placeholder that represents the version of the Chrome browser used by that user agent)

User agents don't work. I'm not sure how they do it, but it certainly is a test for all other sites. I tried to read and decode obfuscated javascript code, I got a bit of progress but it's just too much to deal with atm. However, I'm not sure the JS even matters. It might or might not.

It may be better to know the limitations of what javascript can/can't see. There is something very distinct in the selenium driver that sparks the attention of Incapsula. You can do automated inputs in the normal browsers (even manually send a raw request, with all the cookies) and it will still let you through.

The weird thing though is that, even before the javascript executes (I blocked the URL), it blocks you. I have no idea how it knows because when I compared the requests they were identical. The only difference was, one of the responses replied back with the cookies while the other did not. Don't know how it's possible when the javascript didn't even execute.

I've attached a screenshot to show this

image

What I noticed was there are a few more CONNECTs in normal loading, compared to webdriver loading. (always 1 with web driver loading, always 2-5 with normal loading).

This is before the javascript executes (the very first web request). The one on the right is the legit browser, one on the left is the web driver

image

Notice how the cookies are given up front on the legit one, but not on the web driver. I'd have to assume this check happens within the handshake itself, in one of those CONNECTs, and it is something Selenium cannot seem to pick up. This is just my theory though. I know it probably sounds stupid, but I don't get how cookies are assigned in the legit one vs. none in the web driver. Now I should note, these cookies DO get set in the web driver, but not until after the JavaScript executes! (I'd assume they are bogus cookies that they use to detect whether you're a bot or not.) You end up getting blocked after 1-2 tries regardless of what you do.

It's the initial requests that seem to make the biggest difference. I'm not sure the JavaScript file plays a big role in this.
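The request comparison described above can be made mechanical. A minimal sketch, assuming the headers of the two captures (legit browser vs. web driver) have been exported into plain dicts; the function name and the sample values are made up for illustration. It only compares header names and values, so it cannot see header order or anything below HTTP, such as the TLS handshake.

```python
def diff_headers(legit, webdriver):
    """Compare two header mappings case-insensitively.

    Returns a dict: lowercased header name -> (legit value, webdriver value),
    containing only headers that are missing on one side or differ in value.
    """
    a = {k.lower(): v for k, v in legit.items()}
    b = {k.lower(): v for k, v in webdriver.items()}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Made-up sample values, not taken from the screenshots.
    legit = {"User-Agent": "X", "Set-Cookie": "visid=abc"}
    bot = {"user-agent": "X"}
    for name, (left, right) in diff_headers(legit, bot).items():
        print(f"{name}: legit={left!r} webdriver={right!r}")
```

Running the same function on the response headers would surface the cookie difference mentioned above without eyeballing two screenshots.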

ultrafunkamsterdam commented 3 years ago

In the car now so I'll look at the details later. But such heavily obfuscated JS code has something important to hide, that's for sure.

On Nov 27, 2020 00:39, sezonis notifications@github.com wrote:

Old trick, yet often working since site owners are afraid to death to ban anything google related:

pretend to be google (they are phasing out browser user-agents completely so make use of it while you still can)

( Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36 , W.X.Y.Z is actually a placeholder that represents the version of the Chrome browser used by that user agent)

User agents don't work. I'm not sure how they do it, but it certainly is a test for all other sites. I tried to read and decode obfuscated javascript code, I got a bit of progress but it's just too much to deal with atm. However, I'm not sure the JS even matters. It might or might not.

It may be better to know the limitations of what javascript can/can't see. There is something very distinct in the selenium driver that sparks the attention of Incapsula. You can do automated inputs in the normal browsers (even manually send a raw request, with all the cookies) and it will still let you through.

The weird thing though is that, even before the javascript executes (I blocked the URL), it blocks you. I have no idea how it knows because when I compared the requests they were identical. The only difference was, one of the responses replied back with the cookies while the other did not. Don't know how it's possible when the javascript didn't even execute.

I've attached a screenshot to show this

What I noticed was there are a few more CONNECTs in normal loading, compared to webdriver loading. (always 1 with web driver loading, always 2-5 with normal loading).

This is before the javascript executes (the very first web request). The one on the right is the legit browser, one on the left is the web driver

Notice how the cookies are given up front on the legit one, but on the web driver it doesn't. I'd have to assume this check is within the handshake itself, in one of those connects. That selenium cannot seem to pickup. This is just my theory though. I know it probably sounds stupid, but I don't get how cookies are assigned in the legit one vs none in the web driver. Now I should note, these cookies DO get set in the web driver, but not until after the javascript executes! (I'd assume they are bogus cookies, that they use to detect if ur a bot or not) and you end up getting blocked after 1-2 tries regardless of what you do.

It's between the initial requests that seem to make the biggest difference. I'm not sure the Javascript file plays a big role in this.


czoins commented 3 years ago

If your script is basically trying multiple user/password combinations and failing, I don't think the block has to do with the webdriver. I have tried inputting garbage several times manually on a regular browser and still get the captcha block after some attempts. Anyway, it seems that it sets cookies after an initial check, and then doesn't check again until you restart your browser (losing the session cookies). What it checks for, I have no idea. But since a regular browser might "last longer", you can use an existing Chrome session:

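import subprocess
import time

import undetected_chromedriver as uc
from selenium.webdriver.common.by import By

# launch Chrome with remote debugging enabled, and open your URL
subprocess.Popen(['C:/Program Files (x86)/Google/Chrome/Application/chrome.exe', '--remote-debugging-port=8862', 'secure.runescape.com/m=weblogin/loginform?theme=runescape&mod=www&ssl=1&dest=community'])
time.sleep(5)  # wait for Chrome to start (5 seconds is an arbitrary choice)
# attach the driver to the already running Chrome instead of starting a new one
chrome_options = uc.ChromeOptions()
chrome_options.add_experimental_option("debuggerAddress", "localhost:8862")
driver = uc.Chrome(options=chrome_options)
# your code
driver.find_element(By.ID, "login-username").send_keys("stuff")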
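The wait step in the snippet above could also poll the debugging port instead of pausing for a fixed time. A minimal sketch using only the standard library; the function name wait_for_port and its default timings are made up for illustration. It only checks that something accepts TCP connections on the port, not that the page has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.25):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or connect timed out): retry after a short pause.
            time.sleep(interval)
    return False
```

In the snippet above this would be called as wait_for_port("localhost", 8862) right after subprocess.Popen(...), attaching the driver only if it returns True.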

sezonis commented 3 years ago

If your script is basically trying multiple user/password combinations and failing, i don't think the block has to do with the webdriver. I have tried inputting garbage several times manually on a regular browser and still get the captcha block after some attempts. Anyways, it seems that it sets cookies after an inital check, and then doesn't check again until you restart your browser (losing the session cookies). What does it check for, i have no idea. But since a regular browser might "last longer" you can use an existing chrome session:

import subprocess
#launch chrome with remote debug, and open your url
subprocess.Popen(['C:/Program Files (x86)/Google/Chrome/Application/chrome.exe', '--remote-debugging-port=8862', 'secure.runescape.com/m=weblogin/loginform?theme=runescape&mod=www&ssl=1&dest=community'])
#wait
chrome_options.add_experimental_option("debuggerAddress","localhost:8862")
driver = uc.Chrome(options=chrome_options)
#your code
driver.find_element_by_id("login-username").send_keys("stuff")

The captcha check is fine. I can solve it easily with AI. It's just the blocking that isn't. I am just using CefSharp at the moment. CefSharp seems to work well, no blocks whatsoever. So clearly there is something that tells the website it is Selenium. With CefSharp, I also have the option to multi-thread.

czoins commented 3 years ago

Hmm, that's odd, I was never blocked using Selenium, even spamming it multiple times. All I got was several captcha checks. Maybe it has to do with your IP(s)? But it seems that CefSharp has resolved the issue.

sezonis commented 3 years ago

Hmm that's odd, i was never blocked using selenium, even spamming it multiple times. All i got was several captcha checks. Maybe it has to do with your IP(s)? But it seems that CEF sharp has resolved the issue.

It is not IP based. It is a detection. I say this because if you use the same IP with another browser (legit one), it unblocks it. I'm not sure how you didn't get blocked, that's some luck you have!

Yea, CEF solves it. The API isn't as good as selenium, but it will have to do (for now).

sezonis commented 3 years ago

Update: I knew I wasn't crazy.

CEF works well but gets blocked after a while. I was wondering why, but there seem to be tiers of protection.

Protection #1: (before any JS executes) it checks whether the language set in the browser corresponds to your locale. I'm not sure how it does it, but the second I changed it from "en-US" to "en-NZ" (my locale is NZ), it let me in and responded with a 200 OK rather than a 403.

User agent also plays a role: if you try Googlebot you get instantly blocked.

Then, there is some sort of protection inside the JS that determines whether or not your input is legit. On first block, you can just clear the cookies and it lets you in again. Second protection: No idea, still trying to figure this one out.

There seems to be some sort of input validation. It doesn't matter about the typing speed (Filled in the forms automatically and submitted it by pressing login after getting focus first). Worked like a charm.
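For the language check described under Protection #1, a rough sketch of how the browser language could be set from Python when launching Chrome. The helper names are made up for illustration; --lang is a Chrome command-line switch and intl.accept_languages is the Chrome preference behind the languages offered to sites. That these two settings are what the site actually inspects is an assumption.

```python
def accept_language_value(locale):
    """Build a language list with a base-language fallback, e.g. 'en-NZ' -> 'en-NZ,en'."""
    base = locale.split("-")[0]
    return locale if base == locale else f"{locale},{base}"


def locale_arguments(locale):
    """Chrome command-line switch selecting the browser UI language."""
    return [f"--lang={locale}"]


def locale_prefs(locale):
    """Chrome preference controlling the languages offered to websites."""
    return {"intl.accept_languages": accept_language_value(locale)}
```

With plain Selenium ChromeOptions this would look like options.add_argument(locale_arguments("en-NZ")[0]) followed by options.add_experimental_option("prefs", locale_prefs("en-NZ")). Whether a given undetected-chromedriver version accepts the "prefs" option is not something this sketch verifies.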

ultrafunkamsterdam commented 3 years ago

Don't forget about WebAssembly (wasm) as well. I did not check it for this particular site, but I've seen it a lot.

ultrafunkamsterdam commented 3 years ago

Update: I knew I wasn't crazy.

CEF works well but gets blocked after a while. I was wondering why, but there seems to be tiers of protection.

Protection #1: (before any JS executes) it checks to see if the language set on the browser, corresponds to your language locale. I'm not sure how it does it, but the second I changed it from "en-US" to "en-NZ (my locale is nz)" it let me in and responded with a 200 OK, rather than a 403.

User agent also plays a role, if you try google bot you get instantly blocked.

Then, there is some sort of protection inside the JS that determines whether or not your input is legit. On first block, you can just clear the cookies and it lets you in again. Second protection: No idea, still trying to figure this one out.

There seems to be some sort of input validation. It doesn't matter about the typing speed (Filled in the forms automatically and submitted it by pressing login after getting focus first). Worked like a charm.

Try evaluating chrome.app. It gives different results in different situations. Also, the cdc_asdjflasutopfhvcZLmcfl_Array / _Promise variables are still exposed in window (well, they are named differently when patched, but with some basic counting or regex they "could" still be detected).
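To illustrate the "basic counting or regex" point, a hypothetical sketch of what such a check could look like, written in Python over a list of window property names. The pattern only mirrors the shape of the variable name quoted above (three letters, an underscore, 22 alphanumeric characters, an underscore, then Array or Promise); it is not any vendor's actual detection rule.

```python
import re

# Hypothetical pattern modelled on the variable name quoted above:
# three letters, '_', 22 alphanumerics, '_', then 'Array' or 'Promise'.
SUSPICIOUS_GLOBAL = re.compile(r"^[A-Za-z]{3}_[A-Za-z0-9]{22}_(Array|Promise)$")


def find_suspicious_globals(names):
    """Return the property names that match SUSPICIOUS_GLOBAL, in input order."""
    return [name for name in names if SUSPICIOUS_GLOBAL.match(name)]
```

The input list could be collected with driver.execute_script("return Object.getOwnPropertyNames(window)") and passed to find_suspicious_globals to see whether a patched driver still leaves names of this shape behind.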