ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0
10.03k stars 1.16k forks

AWS EC2 Linux Server Detected Despite Manipulated Headers Using Undetected Chromedriver #1839

Open Daves17 opened 7 months ago

Daves17 commented 7 months ago

I am encountering an issue where my AWS EC2 Linux server is still detected as a Linux machine by web services, even though I try to mask its identity using Undetected Chromedriver. I set a custom User-Agent header and use other techniques to mimic a Windows client, but the server is still recognized as Linux, which impacts my testing and automation tasks.

Expected behavior: The web service should not be able to detect that the browser is being run from a Linux server on AWS EC2. It should identify it as a Windows client based on the manipulated headers.

Actual behavior: Despite the header manipulation, the server's Linux OS is detected. Inspection of the network requests reveals that some headers, particularly sec-ch-ua-platform, still explicitly mention Linux, which might be contributing to the detection:

"headers": {
    "Referer": "https://22bets.me/",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",
    "sec-ch-ua": "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Linux\""
}

Questions:

b-nnett commented 7 months ago

Updating just the user-agent isn't enough, because the user-agent isn't the only way a site can detect the browser's actual platform.

Consider overriding the 'sec-ch-ua' client hint headers as well.

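from selenium.webdriver.chrome.options import Options

# Example brand list only. Note: --sec-ch-ua does not appear to be a documented Chromium switch and may be ignored.
sec_ch_ua = '"Exemplary Browser"; v="73", ";Not?A.Brand"; v="27"'

options = Options()
options.add_argument(f'--sec-ch-ua={sec_ch_ua}')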
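
Another route for overriding the client hints is the Chrome DevTools Protocol command `Network.setUserAgentOverride`, whose `userAgentMetadata` field feeds the `sec-ch-ua*` headers. The following is a minimal sketch, assuming a Selenium Chromium driver (which exposes `execute_cdp_cmd`). The helper names `build_ua_override` and `apply_ua_override` are hypothetical, and the version and platform values are illustrative only.

```python
def build_ua_override(chrome_major: int = 123) -> dict:
    """Build parameters for the CDP command Network.setUserAgentOverride."""
    major = str(chrome_major)
    user_agent = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        f"(KHTML, like Gecko) Chrome/{major}.0.0.0 Safari/537.36"
    )
    return {
        "userAgent": user_agent,
        "platform": "Win32",  # value reported by navigator.platform
        "userAgentMetadata": {
            "brands": [
                {"brand": "Google Chrome", "version": major},
                {"brand": "Not:A-Brand", "version": "8"},
                {"brand": "Chromium", "version": major},
            ],
            "platform": "Windows",  # feeds the sec-ch-ua-platform header
            "platformVersion": "10.0.0",  # illustrative value
            "architecture": "x86",
            "model": "",
            "mobile": False,
        },
    }


def apply_ua_override(driver, params: dict) -> None:
    """Send the override over CDP; `driver` is assumed to be a Selenium Chromium driver."""
    driver.execute_cdp_cmd("Network.setUserAgentOverride", params)
```

The override would need to be applied before the first `driver.get(...)` call. It only changes what the browser reports about itself; the IP address and the TLS fingerprint stay the same.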

ultrafunkamsterdam commented 7 months ago

In a datacenter (which is what AWS is), you are detected by definition.

Daves17 commented 7 months ago

Thank you for your answers! I am still wondering what possibilities there are for the server not to be detected. I already use proxies.

bluemangofunk commented 7 months ago

Who is detecting you? They may detect your IP. They may also be doing TLS fingerprinting, which is happening more and more.

"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",

If this is your UA, then of course they are going to detect you: you are telling them you are using headless Chrome.
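
A minimal sketch of removing that marker, assuming the only problem in the string is the `HeadlessChrome` product token. The helper name `normalize_headless_ua` is hypothetical.

```python
def normalize_headless_ua(user_agent: str) -> str:
    """Replace the HeadlessChrome product token with the regular Chrome token."""
    return user_agent.replace("HeadlessChrome/", "Chrome/")


# User-Agent string taken from the request headers posted in this thread.
ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36"
)
print(normalize_headless_ua(ua))
```

The cleaned string could then be passed to Chrome with the `--user-agent=` switch, for example `options.add_argument(f"--user-agent={normalize_headless_ua(ua)}")`. This hides only the token in the header; other headless signals may remain.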

Daves17 commented 6 months ago

In general, you are right. But what surprises me is that I'm not detected on my own computer, even though I use the same configuration and proxies. The headers from my computer, for comparison:

"request": {
    "headers": {
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",
        "sec-ch-ua": "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "\"Windows\""
    }
}

My assumption is therefore that detecting Linux is the problem.
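
To make the comparison concrete, here is a small sketch that diffs the two header sets posted in this thread. The helper name `diff_headers` is hypothetical, and header names are compared case-sensitively for simplicity.

```python
def diff_headers(a: dict, b: dict) -> dict:
    """Return headers that are missing on one side or have different values."""
    diff = {}
    for name in sorted(set(a) | set(b)):
        left, right = a.get(name), b.get(name)
        if left != right:
            diff[name] = (left, right)
    return diff


SEC_CH_UA = '"Google Chrome";v="123", "Not:A-Brand";v="8", "Chromium";v="123"'

# Headers captured on the EC2 Linux server (from the first post).
server = {
    "Referer": "https://22bets.me/",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",
    "sec-ch-ua": SEC_CH_UA,
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Linux"',
}

# Headers captured on the local Windows computer.
local = {
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",
    "sec-ch-ua": SEC_CH_UA,
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
}

for name, (on_server, on_local) in diff_headers(server, local).items():
    print(f"{name}:\n  server: {on_server}\n  local:  {on_local}")
```

The `Referer` and `Upgrade-Insecure-Requests` differences may simply mean the two captures come from different requests. Apart from those, only the platform part of `User-Agent` and `sec-ch-ua-platform` differ, and both captures still contain `HeadlessChrome`.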

b-nnett commented 6 months ago

There are a thousand things they could be flagging you for, not just Linux.

As @usr741852 said, there's probably some level of fingerprinting at play, and if you're using a typical base EC2 machine, that's where you'd most easily run into blocks.

It's easy to test, though: just spin up a Windows instance.

Daves17 commented 6 months ago

Sounds like a good idea. I'll let you know as soon as I have the results

Daves17 commented 6 months ago

I have now run the bot on an AWS Windows server. Unfortunately, I had the same experience, although I started it once with and once without headless. It probably has nothing to do with the operating system, but with the fact that these are EC2 instances. Is there a way to bypass this?
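
One plausible way a site can tell that traffic comes from EC2 is by IP range: AWS publishes its address ranges at https://ip-ranges.amazonaws.com/ip-ranges.json. The sketch below assumes that file's documented layout (`prefixes[].ip_prefix` and `ipv6_prefixes[].ipv6_prefix`). The helper names are hypothetical, and the sample document uses reserved documentation ranges, not real AWS prefixes. Whether this particular site does such a lookup is not known.

```python
import ipaddress


def extract_prefixes(doc: dict) -> list:
    """Collect IPv4 and IPv6 CIDR strings from an ip-ranges.json style document."""
    v4 = [entry["ip_prefix"] for entry in doc.get("prefixes", [])]
    v6 = [entry["ipv6_prefix"] for entry in doc.get("ipv6_prefixes", [])]
    return v4 + v6


def in_published_ranges(ip: str, prefixes: list) -> bool:
    """Return True if `ip` falls inside any of the given CIDR prefixes."""
    addr = ipaddress.ip_address(ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in ipaddress.ip_network(prefix) for prefix in prefixes)


# Hypothetical sample in the same shape as ip-ranges.json (NOT real AWS data).
sample_doc = {
    "prefixes": [{"ip_prefix": "203.0.113.0/24", "service": "EC2"}],
    "ipv6_prefixes": [{"ipv6_prefix": "2001:db8::/32", "service": "EC2"}],
}

prefixes = extract_prefixes(sample_doc)
print(in_published_ranges("203.0.113.7", prefixes))
print(in_published_ranges("198.51.100.1", prefixes))
```

Running the same check against the real file with the exit IP the site actually sees (the proxy's address, if a proxy is in use) would show whether that address sits in a published datacenter range.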

Daves17 commented 6 months ago

To better understand the problem, here is some context: the bot scrapes the data for particular matches. When I run the bot on my computer, it finds all of them. When I run it on the server, the website returns only a limited subset of the matches. So I'm not blocked, only limited, because the bot is detected.