ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0
9.57k stars, 1.14k forks

Help Running in a Docker Container #191

Closed by ousooners2834 3 years ago

ousooners2834 commented 3 years ago

Hi there - thanks for creating this package. It is a really nice piece of work.

Is it possible to run it within a docker container by chance? I have tried a simple one but got an error. If anyone has had any success with this in the past, please let us know.

The error I currently get when running my Docker container locally: (unknown error: DevToolsActivePort file doesn't exist) (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

docker build . -t test-scraper-v1
docker run -it test-scraper-v1

My Dockerfile:

FROM python:3.8

COPY . /app
WORKDIR /app

RUN mkdir __logger

# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable

# install chromedriver
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/

# set display port to avoid crash
ENV DISPLAY=:99

RUN pip install --upgrade pip

RUN pip install -r requirements.txt

CMD ["python", "./app.py"]

My app.py, in the same folder as the Dockerfile:

import undetected_chromedriver as uc

def app() -> None:
    options = uc.ChromeOptions()
    options.headless=True
    options.add_argument('--headless')
    driver = uc.Chrome(options=options)
    driver.get('https://www.similarweb.com/website/ibm.com/')

    html = str(driver.page_source)
    rank_long_string = html.split('"GlobalRank":[')[1]
    similarweb_rank = int(rank_long_string.split(',')[0])
    driver.quit()
    print(similarweb_rank)

if __name__ == "__main__":
    app()

The requirements.txt file:

pandas==1.1.2
prompt_toolkit==1.0.14
regex==2020.11.13
PyInquirer==1.0.3
selenium==3.141.0
undetected_chromedriver==3.0.0

saintanger commented 3 years ago

Did you find a solution to this after all?

ousooners2834 commented 3 years ago

I did not. I have seen a few comments in the issues saying that this is not supported, so I finally gave up. I'm not a full-time developer, so this is pretty advanced for me. In theory it should be possible, since the container gets assigned an IP and we can open a virtual browser in headless mode.

I have started to deploy sections of my bots on local machines and push that data up to a Firebase instance which works flawlessly with this driver. Then I integrate that data into another bot that is running in the cloud without needing Selenium - just API calls.

ousooners2834 commented 3 years ago

However, if someone who is skilled in Docker and understands these Chrome drivers could create an image, I think it would be really popular. Being able to deploy a server instance of the undetected chromedriver would make things so much easier. Eventually I will have to start buying Raspberry Pis for cheap local instances. I would rather not do that, though.

Anticope12 commented 2 years ago

Hi @ousooners2834, following up on your issue here: I'm not sure if you have seen this image: https://hub.docker.com/r/ultrafunk/undetected-chromedriver I'm not sure whether it will solve your issue :)

Please let me know if it makes any difference :)
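
A note on the DevToolsActivePort error from the original post: inside a container the process usually runs as root, and Chrome generally refuses to start as root unless its sandbox is disabled, which ChromeDriver then reports as a crash. The sketch below shows one commonly suggested workaround, not a confirmed fix for this issue. It assumes the standard Chrome switches --no-sandbox and --disable-dev-shm-usage are acceptable in your environment. build_container_options and make_driver are hypothetical helper names, not part of undetected_chromedriver.

```python
# Chrome switches commonly suggested for running as root inside a container.
# Assumption: disabling the sandbox is acceptable for your use case.
CONTAINER_CHROME_FLAGS = [
    "--headless",               # no X server is available in the container
    "--no-sandbox",             # Chrome normally will not start as root without this
    "--disable-dev-shm-usage",  # Docker's default /dev/shm is small
]


def build_container_options(options):
    """Add the container flags to any Selenium-style options object.

    `options` only needs an `add_argument(str)` method, such as the
    object returned by uc.ChromeOptions().
    """
    for flag in CONTAINER_CHROME_FLAGS:
        options.add_argument(flag)
    return options


def make_driver():
    """Hypothetical helper: create the driver with the flags above."""
    # Imported here so the helpers above can be used without the package.
    import undetected_chromedriver as uc

    options = build_container_options(uc.ChromeOptions())
    return uc.Chrome(options=options)
```

In app.py, `driver = make_driver()` would stand in for the uc.ChromeOptions() and uc.Chrome(options=options) lines. Whether the target site still treats a headless, sandbox-free Chrome as a regular browser is a separate question and is not tested here.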