Open Daves17 opened 7 months ago
The User-Agent isn't the only thing a site can use to detect the browser's actual platform, so updating it alone isn't enough.
Consider using the 'sec' (sec-ch-ua) overrides as well.
from selenium.webdriver.chrome.options import Options

sec_ch_ua = '"Examplary Browser"; v="73", ";Not?A.Brand"; v="27"' # for example
options = Options()
options.add_argument(f'--sec-ch-ua={sec_ch_ua}')
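If you go this route, it may help to build the header value from brand/version pairs instead of typing the string by hand, so the quoting stays well-formed. A minimal sketch, assuming the same format as the sec-ch-ua header shown elsewhere in this thread; `format_sec_ch_ua` is a hypothetical helper and not part of Selenium or Undetected Chromedriver:

```python
def format_sec_ch_ua(brands):
    """Build a sec-ch-ua style value from (brand, major_version) pairs."""
    parts = []
    for brand, version in brands:
        # Escape backslashes and double quotes inside the quoted brand name.
        escaped = brand.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'"{escaped}";v="{version}"')
    return ", ".join(parts)


# Brand list copied from the headers posted in this thread.
sec_ch_ua = format_sec_ch_ua([
    ("Google Chrome", 123),
    ("Not:A-Brand", 8),
    ("Chromium", 123),
])
print(sec_ch_ua)
```

This only produces the string; whether the browser actually sends it depends on how the override is applied.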
In a datacenter (which is what AWS is), you are detected by definition.
Thank you for your answers! I am still wondering what possibilities there are for the server not to be detected. I already use proxies.
Who is detecting you? They may detect your IP. They may be doing TLS fingerprinting, which is happening more and more.
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",
If this is your UA, then of course they are going to detect you: you are telling them you are using headless Chrome.
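A quick way to check for this kind of giveaway is to dump the request headers the browser actually sends and scan them for obvious inconsistencies. A rough sketch, assuming the headers are available as a plain dict; `header_red_flags` is a hypothetical helper, and the OS mapping is deliberately simplified (for example, it does not distinguish Android from desktop Linux):

```python
def header_red_flags(headers):
    """Return a list of human-readable inconsistencies found in request headers."""
    flags = []
    ua = headers.get("User-Agent", "")
    # The client hint value is a quoted string, e.g. '"Windows"'.
    platform = headers.get("sec-ch-ua-platform", "").strip('"')

    if "HeadlessChrome" in ua:
        flags.append("User-Agent advertises HeadlessChrome")

    # Map the OS token in the User-Agent to the name used in client hints.
    if "Windows NT" in ua:
        ua_platform = "Windows"
    elif "Macintosh" in ua:
        ua_platform = "macOS"
    elif "Linux" in ua or "X11" in ua:
        ua_platform = "Linux"
    else:
        ua_platform = None

    if platform and ua_platform and platform != ua_platform:
        flags.append(
            f"User-Agent says {ua_platform} but sec-ch-ua-platform says {platform}"
        )
    return flags


example = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",
    "sec-ch-ua-platform": '"Linux"',
}
for flag in header_red_flags(example):
    print(flag)
```

An empty result only means these two checks passed; it says nothing about IP reputation or TLS fingerprinting.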
In general, you are right. But what surprises me is that I'm not detected with my computer, even though I use the same configs and proxies. My computer for comparison:
"request": {
  "headers": {
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/123.0.0.0 Safari/537.36",
    "sec-ch-ua": "\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\""
  }
}
My assumption is therefore that detecting Linux is the problem.
There are a thousand things they could be flagging you for, not just Linux.
As @usr741852 said, there's probably some level of fingerprinting at play, and if you're using a typical base EC2 machine, that's where you'd most easily run into blocks.
Easy to test, though: just spin up an instance with Windows on it.
Sounds like a good idea. I'll let you know as soon as I have the results.
I have now run the bot on an AWS Windows server. Unfortunately, I had the same experience, although I started it once with and once without headless. It probably has nothing to do with the operating system, but with the fact that these are EC2 instances. Is there a way to bypass this?
To better understand the problem, here is some context: the bot scrapes the data of particular matches. When I start the bot on my computer, it finds them all. When I start it on the server, the website returns only a limited subset of the matches. So I'm not blocked outright, only limited, because the bot is detected.
I am encountering an issue where my AWS EC2 Linux server is still being detected as such by web services, despite my efforts to mask its identity using Undetected Chromedriver. Despite setting custom User-Agent headers and other techniques to mimic a Windows client, the server is recognized as Linux, which impacts my testing and automation tasks.
Expected behavior: The web service should not be able to detect that the browser is being run from a Linux server on AWS EC2. It should identify it as a Windows client based on the manipulated headers.
Actual behavior: Despite the header manipulation, the server's Linux OS is detected. Inspection of the network requests reveals that some headers, particularly sec-ch-ua-platform, still explicitly mention Linux, which might be contributing to the detection:
Questions: