ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0
9.07k stars 1.09k forks

Using UC (headless) on Ubuntu Server #516

Open victornoleto opened 2 years ago

victornoleto commented 2 years ago

I developed a script to get data from a website: basic web scraping. The idea was to keep this script running 24/7 on a dedicated server (specifically an Ubuntu Server 20.04 instance on AWS Lightsail).

In my development environment I tested the script both WITH and WITHOUT headless mode, and it worked well. To my (sad) surprise, I ran into a problem when I moved it to the server: the site was detecting the bot and blocking me, which is pretty annoying because the block lasts one day and I can't test again until the next day.

I've tried everything, with the same result. I looked into some solutions using random proxies, but in vain: the site still detects me.

Googling, I see that headless mode doesn't work 100% as it should, so my guess is that the problem might be there. I was curious how a server without a graphical interface would handle a Selenium script with a webdriver that is not in headless mode. My curiosity was satisfied with the answer "this doesn't work". At least it didn't work for me.

When I run the script, the code freezes for a few minutes in the "get" method and then throws the following exception:

selenium.common.exceptions.WebDriverException: Message: unknown error: cannot connect to chrome at 127.0.0.1:40949

Both google-chrome and chromedriver binaries are installed and located in /usr/bin with their respective versions: ChromeDriver 97.0.4692.71 and Google Chrome 98.0.4758.102.

Below is the (short) code of the settings I use to instantiate the driver:

import undetected_chromedriver as uc

options = uc.ChromeOptions()
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
options.add_argument(f'user-agent={user_agent}')
options.add_argument('--user-data-dir=/tmp/profile')
options.add_argument('--no-first-run --no-service-autorun --password-store=basic')
options.add_argument("window-size=1280,720")

driver = uc.Chrome(
    options = options,
)
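
A side note on the settings above: Selenium's `add_argument` appends each string as a single argument, so the three switches passed together on the `--no-first-run ...` line likely reach Chrome as one malformed switch rather than three separate ones. Below is a minimal sketch of splitting them first, assuming none of the switch values contain spaces. `split_flags` is a hypothetical helper, not part of undetected_chromedriver or Selenium.

```python
import shlex


def split_flags(combined):
    """Split a space-separated string of Chrome switches into one switch per item.

    Hypothetical helper for illustration only.
    """
    return shlex.split(combined)


flags = split_flags('--no-first-run --no-service-autorun --password-store=basic')
print(flags)

# Intended use with the `options` object from the snippet above (not executed here):
# for flag in flags:
#     options.add_argument(flag)
```

Whether this has any effect on the detection or connection problem is untested; it only ensures that each switch is passed on its own.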

So, I ask for your help. I really don't know what else to do and what to look for. For starters, I don't even know if what I want is possible - but I really hope that there is light at the end of this tunnel.

For those who were curious, the site in question that I am trying to access to extract the data is Bet365.

jonardcaguioa commented 2 years ago

Same issue. Please share if you have figured this out. After some trial and error, it seems that the error is exclusive to Linux and to non-headless mode.

victornoleto commented 2 years ago

I know the problem is associated with non-headless mode, but the fact that it only happens on Linux intrigues me.

It's good to know that I'm not alone in this fight, but unfortunately my post didn't have any other answer besides yours.

jonardcaguioa commented 2 years ago

Yep. I'm using Linux too. undetected_chromedriver works perfectly fine on Windows. This issue has been bugging me for the past few days.

spacekomet commented 2 years ago

Have you already tried an Xvfb virtual display? https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/217#issuecomment-873541042 With it, it's possible to run uc without an X environment. There is one drawback, however: it's slower than true headless mode.
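
As a rough sketch of the Xvfb suggestion (not necessarily the approach in the linked comment), the script can be launched through the `xvfb-run` wrapper so that a non-headless Chrome has a virtual display to attach to. This assumes the `xvfb` package is installed on the server. `scraper.py` is a hypothetical script name, and the 1280x720 screen size is simply taken from the window size in the original post.

```python
import subprocess
import sys


def build_xvfb_command(script_path, width=1280, height=720, depth=24):
    """Build an xvfb-run command line that runs a Python script on a virtual display.

    -a picks a free X server number; -s passes arguments through to Xvfb itself.
    """
    screen = f"-screen 0 {width}x{height}x{depth}"
    return ["xvfb-run", "-a", "-s", screen, sys.executable, script_path]


def run_under_xvfb(script_path):
    """Run the script under Xvfb and return its exit code (requires xvfb-run on PATH)."""
    return subprocess.run(build_xvfb_command(script_path)).returncode


# "scraper.py" is a placeholder for the script that creates the uc.Chrome driver.
print(build_xvfb_command("scraper.py"))
```

Calling `run_under_xvfb("scraper.py")` would start the script with `DISPLAY` pointing at the virtual screen, so the driver can be created without any headless option.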