ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / Datadome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0
10.14k stars 1.17k forks

Can't use it on Colab #108

Closed UlkuTuncerKucuktas closed 3 years ago

UlkuTuncerKucuktas commented 3 years ago

  !pip install selenium
  !apt-get update # to update ubuntu to correctly run apt install
  !apt install chromium-chromedriver
  !cp /usr/lib/chromium-browser/chromedriver /usr/bin
  import sys
  sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
  !pip install undetected-chromedriver
  from selenium import webdriver
  from selenium.webdriver import Chrome
  import undetected_chromedriver as uc
  from selenium.webdriver.chrome.options import Options
  import pandas as pd
  from datetime import date
  from tqdm import tqdm

  options = Options()
  options.add_argument('--headless')
  options.add_argument('--no-sandbox')
  options.add_argument('--disable-dev-shm-usage')
  driver = uc.Chrome(options=options)

This code gives:

WebDriverException: Message: Service ./chromedriver unexpectedly exited. Status code was: -6
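A note on reading the status code: as the traceback later in this thread shows, Selenium reports the value returned by the driver process's poll() call. For processes started through Python's subprocess module on POSIX systems, a negative value -N means the process was terminated by signal N. The sketch below decodes such a value; the helper name describe_exit_code is hypothetical and not part of Selenium.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a subprocess return code (hypothetical helper, POSIX semantics)."""
    if code < 0:
        # Negative return codes mean "terminated by signal -code"
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = f"signal {-code}"
        return f"killed by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


print(describe_exit_code(-6))
print(describe_exit_code(1))
```

Under this reading, a status of -6 points to the driver being aborted by a signal rather than exiting with an ordinary error, whereas a status of 1 is a regular non-zero exit.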

ghost commented 3 years ago

While trying to fix it, I found another error. 315

ultrafunkamsterdam commented 3 years ago

Curious how you run Chrome from Colab...

pmb2 commented 3 years ago

Was a solution for this found?

FloatingMind12 commented 2 years ago

I know it's a bit late, but this is how to solve it.

First, paste the code below into a cell and run it (note that this is meant to be run on Google Colaboratory):

!apt-get update
!apt install chromium-chromedriver
!apt install -y xvfb

!pip install undetected-chromedriver
!pip install PyVirtualDisplay

Here we install all the packages needed to run the webdriver. Actually, neither Firefox nor Chrome works in Colab. It seems to have something to do with how memory is handled in Colab, since we get an error like Attempt to free invalid pointer 0xa3af288aa20 when we try to execute them. So instead we are going to use Chromium, which for some reason doesn't have this restriction.

Since the default chromedriver is built from Chrome, it throws the same error as well and, as you guessed, the one built from Chromium runs just fine. By default, undetected-chromedriver pulls the latest version of chromedriver from the official server. Instead, we are going to feed it the one that came preinstalled with chromium-chromedriver by replacing a line of code in the patcher.py file of this package.

To do that, run the code below

!zip -j /content/chromedriver_linux64.zip /usr/bin/chromedriver
#replace python3.7 with your own version of python in case it's not the same
patcher_src = "/usr/local/lib/python3.7/dist-packages/undetected_chromedriver/patcher.py"
with open(patcher_src, "r") as f:
    contents = f.read()
    contents = contents.replace("return urlretrieve(u)[0]",\
                     "return urlretrieve('file:///content/chromedriver_linux64.zip',"\
                     "filename='/tmp/chromedriver_linux64.zip')[0]")
with open(patcher_src, "w") as f:
    f.write(contents)

Then initialize the webdriver as you normally would:

import undetected_chromedriver.v2 as uc
from pyvirtualdisplay import Display
display = Display(visible=0, size=(800, 600))
display.start()

options = uc.ChromeOptions()
options.add_argument("--no-sandbox")
driver = uc.Chrome(options=options)

Make sure to add the --no-sandbox argument. Colab runs everything as root, so the webdriver would crash on launch without it.
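The root check behind that advice can be made explicit. This is a minimal sketch, assuming a POSIX system where os.geteuid() is available; the helper name chrome_args_for_user is hypothetical and not part of undetected-chromedriver.

```python
import os


def chrome_args_for_user(euid: int) -> list:
    """Return extra Chrome flags for the given effective user id (hypothetical helper)."""
    args = []
    if euid == 0:
        # Running as root (as Colab does): the comment above reports that
        # the webdriver crashes on launch unless the sandbox is disabled.
        args.append("--no-sandbox")
    return args


# os.geteuid() only exists on POSIX systems, so guard the call.
if hasattr(os, "geteuid"):
    print(chrome_args_for_user(os.geteuid()))
```

Each returned flag could then be passed to options.add_argument() in the snippet above.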

Creating a virtual display is not mandatory, but a headless browser has a higher chance of getting caught by anti-bot detection, so we might as well include it.

chromium-browser hasn't been updated for a while, but here are the links to the releases and source code: https://launchpad.net/ubuntu/bionic/+package/chromium-browser https://launchpad.net/ubuntu/bionic/+package/chromium-chromedriver

EDIT: If you already used undetected-chromedriver in your code before you ran the fix above, it won't work unless you restart the runtime (Ctrl + M .). Python caches imported packages, so any modifications made to them are ignored if they were already imported.
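Whether a restart is needed can be checked by looking at Python's module cache, sys.modules. A minimal sketch; the helper name needs_runtime_restart is hypothetical.

```python
import sys


def needs_runtime_restart(module_name: str = "undetected_chromedriver") -> bool:
    """True if the package, or one of its submodules, is already imported."""
    prefix = module_name + "."
    # Copy the keys so the check is unaffected by imports happening meanwhile
    return any(
        name == module_name or name.startswith(prefix)
        for name in list(sys.modules)
    )


if needs_runtime_restart():
    print("Already imported: restart the runtime before using the patched package.")
else:
    print("Not imported yet: the patched patcher.py will be picked up on import.")
```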

Now we can properly say that this issue is closed ;)

DiTo97 commented 2 years ago

Please, see the updated version of the code, as this is broken since the transition of Google Colab devices to Ubuntu 20.04

ThreadedLinx commented 1 year ago

driver = uc.Chrome(options=options)
driver.get(URL)

@DiTo97 Thank you so much for this. However, I keep getting an error pointing to this line that reads "Bad Zip File".

Do you know how to fix it?
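One way to narrow down a "Bad Zip File" error is to inspect the archive that the patched patcher.py points at before starting the driver. A minimal sketch, assuming the archive path from the fix above (/content/chromedriver_linux64.zip) and that it should contain a member named chromedriver, as zip -j would produce; the helper name check_driver_archive is hypothetical.

```python
import zipfile
from pathlib import Path


def check_driver_archive(path, member: str = "chromedriver") -> str:
    """Report the state of the local chromedriver archive (hypothetical helper)."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if not zipfile.is_zipfile(str(p)):
        return "not a zip file"
    with zipfile.ZipFile(str(p)) as zf:
        # zip -j stores the file without its directory, so the name is bare
        if member not in zf.namelist():
            return "no chromedriver inside"
    return "ok"


print(check_driver_archive("/content/chromedriver_linux64.zip"))
```

Anything other than "ok" suggests the zip step failed, for example because /usr/bin/chromedriver was not installed when it ran.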

maiiabocharova commented 1 year ago

@FloatingMind12 Can you please help? Your solution https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/108#issuecomment-1170269377 worked perfectly before, but it has now started causing this error: Message: Service /root/.local/share/undetected_chromedriver/d4f76b1529445788_chromedriver unexpectedly exited. Status code was: -6

Maybe you know how to fix it?

ali-arjmandi commented 1 year ago

same issue

maiiabocharova commented 1 year ago

same issue

I don't know if it helps, but for plain Selenium a solution was proposed that works.

There is also an example Colab notebook that is working.

I hope someone will suggest how to adapt it for undetected_chromedriver.

DiTo97 commented 1 year ago

Hi @ThreadedLinx, @maiiabocharova, @ali-arjmandi,

I have put together an updated version of the code following @maiiabocharova's suggestion.

import pathlib
import re
import subprocess
import typing

def is_in_jupyter_notebook() -> bool:
    """It checks whether a Jupyter notebook is being run"""
    try:
        get_ipython
        return True
    except NameError:
        return False

def is_on_gcolab() -> bool:
    """It checks whether a Jupyter notebook is being run on Google Colab"""
    if not is_in_jupyter_notebook():
        return False

    return "google.colab" in str(get_ipython())

def is_ubuntu_20_04() -> bool:
    import lsb_release
    metadata = lsb_release.get_os_release()

    distro  = metadata["ID"].lower()
    release = metadata["RELEASE"]

    return distro == "ubuntu" and release == "20.04"

def setup_ubuntu_20_04() -> None:
    """It sets up a Ubuntu 20.04 container with `chromium-browser`

    For more information, see 
    https://github.com/googlecolab/colabtools/issues/3347#issuecomment-1387453484
    """
    # It adds debian buster
    EOF_debian_buster = """\
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
"""
    !echo "$EOF_debian_buster" > /etc/apt/sources.list.d/debian.list

    # It adds keys
    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A

    !apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
    !apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
    !apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg

    # It adds the debian repo for chromium* packages only
    # Note the double-blank lines between entries
    EOF_chromium_pref = """\
Package: *
Pin: release a=eoan
Pin-Priority: 500

Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300

Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
"""
    !echo "$EOF_chromium_pref" > /etc/apt/preferences.d/chromium.pref

    # It installs the packages
    !apt-get update
    !apt-get install chromium chromium-driver
    !apt-get install -y xvfb

def setup_requirements() -> None:
    PIP_requirements = " ".join([
        "PyVirtualDisplay", # To run a virtual display
        "undetected-chromedriver",
    ])

    !python3 -m pip install --upgrade pip
    !python3 -m pip install --upgrade $PIP_requirements

def get_py_module_path(module: str) -> typing.Optional[pathlib.Path]:
    """It gets the absolute path of a Python module"""
    r = subprocess.run(
        ["pip", "show", module], 
        capture_output=True
    )

    try:
        r.check_returncode()
    except subprocess.CalledProcessError:
        return None

    stdout = r.stdout.decode()

    try:
        RE_abspath = "\nLocation: (?P<abspath>.*)\n"

        matches = re.search(RE_abspath, stdout)
        abspath = matches.group("abspath")
    except AttributeError:
        return None

    dist_packages = pathlib.Path(abspath).resolve()
    return dist_packages / module

def patch_undetected_chromedriver() -> None:
    """It forces `undetected_chromedriver` to run the Chromium webdriver

    For more information, see 
    https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/108#issuecomment-1170269377
    """
    chromedriver_filename = "chromedriver_linux64.zip"

    src_chromedriver_filepath = ROOT / chromedriver_filename
    dst_chromedriver_filepath = pathlib.Path("/tmp") / chromedriver_filename

    !zip -j "$src_chromedriver_filepath" /usr/bin/chromedriver

    PY_module = "undetected_chromedriver"
    module_path = get_py_module_path(PY_module)

    patcher_filepath = module_path / "patcher.py"

    with patcher_filepath.open("rt") as f:
        contents = f.read()

    src = f"'file://{src_chromedriver_filepath}'"
    dst = f"'{dst_chromedriver_filepath}'"

    # It is forced to use the local webdriver
    contents = contents.replace(
        f"return urlretrieve(u)[0]",
        f"return urlretrieve({src}, filename={dst})[0]"
    )

    with patcher_filepath.open("wt") as f:
        f.write(contents)

def setup_container() -> None:
    """It sets up the container which is being run"""
    if is_ubuntu_20_04():
        setup_ubuntu_20_04()

    setup_requirements()
    patch_undetected_chromedriver()

ROOT = pathlib.Path("/content")
anchor = ROOT / "anchor.txt"

assert is_on_gcolab(), "It seems you are not on Google Colab"

# It will set the Google Colab container up only
# after disconnections, not after restarts
if not anchor.exists():
    setup_container()
    anchor.touch()

After running the above cell, you may try it out on a Cloudflare-protected website:

import time

import pyvirtualdisplay
import undetected_chromedriver.v2 as uc  # Note import before selenium
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

with pyvirtualdisplay.Display(visible=0, size=(800, 600)) as _:
    URL = "https://nowsecure.nl"  # A Cloudflare-protected website

    options = uc.ChromeOptions()
    options.add_argument("--no-sandbox")

    driver = uc.Chrome(options=options)
    driver.get(URL)

    STR_message = "oh yeah, you passed!"

    try:
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, "//main/h1"))
        )

        message = element.text.strip().lower()
        assert STR_message == message
    finally:
        driver.quit()

Please, mention this issue comment, if you plan to paste it elsewhere!

enok commented 1 year ago

Hi @DiTo97 I got this error on Google Colab:


WebDriverException Traceback (most recent call last) in
     14     options.add_argument("--no-sandbox")
     15
---> 16     driver = uc.Chrome(options=options)
     17     driver.get(URL)
     18

4 frames
/usr/local/lib/python3.8/dist-packages/selenium/webdriver/common/service.py in assert_process_still_running(self)
    115         return_code = self.process.poll()
    116         if return_code:
--> 117             raise WebDriverException(f"Service {self.path} unexpectedly exited. Status code was: {return_code}")
    118
    119     def is_connectable(self) -> bool:

WebDriverException: Message: Service /root/.local/share/undetected_chromedriver/350167f31e3b62c9_chromedriver unexpectedly exited. Status code was: 1

DiTo97 commented 1 year ago

Hi @enok,

I have just run my code and it works. What I can suggest is: 1) disconnect and delete the Google Colab runtime (start over); 2) make sure it is running on Ubuntu 20.04 (you can use the provided function is_ubuntu_20_04, even though all Google Colab instances should run on that release by default nowadays); 3) paste the two code snippets above (the setup and the Cloudflare-protected website example) into two different cells, making sure to run the setup before importing the libraries (undetected_chromedriver, selenium, etc.).
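For step 2, the release can also be checked without the lsb_release module by reading /etc/os-release, which Ubuntu ships. A minimal sketch; the function names parse_os_release and is_ubuntu_release are hypothetical and not part of the setup code above.

```python
from pathlib import Path


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in the os-release format into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double quotes
        info[key] = value.strip().strip('"')
    return info


def is_ubuntu_release(release: str, path: str = "/etc/os-release") -> bool:
    """True if the file at `path` describes the given Ubuntu release."""
    p = Path(path)
    if not p.is_file():
        return False
    info = parse_os_release(p.read_text())
    return info.get("ID", "").lower() == "ubuntu" and info.get("VERSION_ID") == release


print(is_ubuntu_release("20.04"))
```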

Please, consider upvoting the previous comment, if this helps you solve your problems with Google Colab!

enok commented 1 year ago

Thanks for the reply @DiTo97. Can you share your working Colab project so I can use it as a template?

DiTo97 commented 1 year ago

Thanks for the reply @DiTo97. Can you share your working Colab project so I can use it as a template?

Unfortunately, I cannot upload the IPYNB file, as that file type is not supported (nor can I share the Google Colab link).

I will paste below the (JSON) contents of the IPYNB file, even though they are exactly the same I shared above. You just have to copy-paste them to a blank file, save the file with the .ipynb extension, and open it on Google Colab.

{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "[#108](https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/108)"
      ],
      "metadata": {
        "id": "soj_LuU2J3uu"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import pathlib\n",
        "import re\n",
        "import subprocess\n",
        "import typing\n",
        "\n",
        "\n",
        "def is_in_jupyter_notebook() -> bool:\n",
        "    \"\"\"It checks whether a Jupyter notebook is being run\"\"\"\n",
        "    try:\n",
        "        get_ipython\n",
        "        return True\n",
        "    except NameError:\n",
        "        return False\n",
        "\n",
        "\n",
        "def is_on_gcolab() -> bool:\n",
        "    \"\"\"It checks whether a Jupyter notebook is being run on Google Colab\"\"\"\n",
        "    if not is_in_jupyter_notebook():\n",
        "        return False\n",
        "\n",
        "    return \"google.colab\" in str(get_ipython())\n",
        "\n",
        "\n",
        "def is_ubuntu_20_04() -> bool:\n",
        "    import lsb_release\n",
        "    metadata = lsb_release.get_os_release()\n",
        "\n",
        "    distro  = metadata[\"ID\"].lower()\n",
        "    release = metadata[\"RELEASE\"]\n",
        "\n",
        "    return distro == \"ubuntu\" and release == \"20.04\"\n",
        "\n",
        "\n",
        "def setup_ubuntu_20_04() -> None:\n",
        "    \"\"\"It sets up a Ubuntu 20.04 container with `chromium-browser`\n",
        "\n",
        "    For more information, see \n",
        "    https://github.com/googlecolab/colabtools/issues/3347#issuecomment-1387453484\n",
        "    \"\"\"\n",
        "    # It adds debian buster\n",
        "    EOF_debian_buster = \"\"\"\\\n",
        "deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main\n",
        "deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main\n",
        "deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main\n",
        "\"\"\"\n",
        "    !echo \"$EOF_debian_buster\" > /etc/apt/sources.list.d/debian.list\n",
        "\n",
        "    # It adds keys\n",
        "    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517\n",
        "    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138\n",
        "    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A\n",
        "\n",
        "    !apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg\n",
        "    !apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg\n",
        "    !apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg\n",
        "\n",
        "    # It adds the debian repo for chromium* packages only\n",
        "    # Note the double-blank lines between entries\n",
        "    EOF_chromium_pref = \"\"\"\\\n",
        "Package: *\n",
        "Pin: release a=eoan\n",
        "Pin-Priority: 500\n",
        "\n",
        "\n",
        "Package: *\n",
        "Pin: origin \"deb.debian.org\"\n",
        "Pin-Priority: 300\n",
        "\n",
        "\n",
        "Package: chromium*\n",
        "Pin: origin \"deb.debian.org\"\n",
        "Pin-Priority: 700\n",
        "\"\"\"\n",
        "    !echo \"$EOF_chromium_pref\" > /etc/apt/preferences.d/chromium.pref\n",
        "\n",
        "    # It installs the packages\n",
        "    !apt-get update\n",
        "    !apt-get install chromium chromium-driver\n",
        "    !apt-get install -y xvfb\n",
        "\n",
        "\n",
        "def setup_requirements() -> None:\n",
        "    PIP_requirements = \" \".join([\n",
        "        \"PyVirtualDisplay\", # To run a virtual display\n",
        "        \"undetected-chromedriver\",\n",
        "    ])\n",
        "\n",
        "    !python3 -m pip install --upgrade pip\n",
        "    !python3 -m pip install --upgrade $PIP_requirements\n",
        "\n",
        "\n",
        "def get_py_module_path(module: str) -> typing.Optional[pathlib.Path]:\n",
        "    \"\"\"It gets the absolute path of a Python module\"\"\"\n",
        "    r = subprocess.run(\n",
        "        [\"pip\", \"show\", module], \n",
        "        capture_output=True\n",
        "    )\n",
        "\n",
        "    try:\n",
        "        r.check_returncode()\n",
        "    except subprocess.CalledProcessError:\n",
        "        return None\n",
        "\n",
        "    stdout = r.stdout.decode()\n",
        "\n",
        "    try:\n",
        "        RE_abspath = \"\\nLocation: (?P<abspath>.*)\\n\"\n",
        "\n",
        "        matches = re.search(RE_abspath, stdout)\n",
        "        abspath = matches.group(\"abspath\")\n",
        "    except AttributeError:\n",
        "        return None\n",
        "\n",
        "    dist_packages = pathlib.Path(abspath).resolve()\n",
        "    return dist_packages / module\n",
        "\n",
        "\n",
        "def patch_undetected_chromedriver() -> None:\n",
        "    \"\"\"It forces undetected_chromedriver to run the Chromium webdriver\n",
        "\n",
        "    For more information, see \n",
        "    https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/108#issuecomment-1170269377\n",
        "    \"\"\"\n",
        "    chromedriver_filename = \"chromedriver_linux64.zip\"\n",
        "\n",
        "    src_chromedriver_filepath = ROOT / chromedriver_filename\n",
        "    dst_chromedriver_filepath = pathlib.Path(\"/tmp\") / chromedriver_filename\n",
        "\n",
        "    !zip -j \"$src_chromedriver_filepath\" /usr/bin/chromedriver\n",
        "\n",
        "    PY_module = \"undetected_chromedriver\"\n",
        "    module_path = get_py_module_path(PY_module)\n",
        "\n",
        "    patcher_filepath = module_path / \"patcher.py\"\n",
        "\n",
        "    with patcher_filepath.open(\"rt\") as f:\n",
        "        contents = f.read()\n",
        "\n",
        "    src = f\"'file://{src_chromedriver_filepath}'\"\n",
        "    dst = f\"'{dst_chromedriver_filepath}'\"\n",
        "\n",
        "    # It is forced to use the local webdriver\n",
        "    contents = contents.replace(\n",
        "        f\"return urlretrieve(u)[0]\",\n",
        "        f\"return urlretrieve({src}, filename={dst})[0]\"\n",
        "    )\n",
        "\n",
        "    with patcher_filepath.open(\"wt\") as f:\n",
        "        f.write(contents)\n",
        "\n",
        "\n",
        "def setup_container() -> None:\n",
        "    \"\"\"It sets up the container which is being run\"\"\"\n",
        "    if is_ubuntu_20_04():\n",
        "        setup_ubuntu_20_04()\n",
        "\n",
        "    setup_requirements()\n",
        "    patch_undetected_chromedriver()\n",
        "\n",
        "\n",
        "ROOT = pathlib.Path(\"/content\")\n",
        "anchor = ROOT / \"anchor.txt\"\n",
        "\n",
        "\n",
        "assert is_on_gcolab(), \"It seems you are not on Google Colab\"\n",
        "\n",
        "# It will set the Google Colab container up only\n",
        "# after disconnections, not after restarts\n",
        "if not anchor.exists():\n",
        "    setup_container()\n",
        "    anchor.touch()"
      ],
      "metadata": {
        "id": "l6ORjDeTwZvx"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import time\n",
        "\n",
        "import pyvirtualdisplay\n",
        "import undetected_chromedriver.v2 as uc  # Note import before selenium\n",
        "from selenium.webdriver.common.by import By\n",
        "from selenium.webdriver.support import expected_conditions as EC\n",
        "from selenium.webdriver.support.ui import WebDriverWait\n",
        "\n",
        "\n",
        "with pyvirtualdisplay.Display(visible=0, size=(800, 600)) as _:\n",
        "    URL = \"https://nowsecure.nl\"  # A Cloudflare-protected website\n",
        "\n",
        "    options = uc.ChromeOptions()\n",
        "    options.add_argument(\"--no-sandbox\")\n",
        "\n",
        "    driver = uc.Chrome(options=options)\n",
        "    driver.get(URL)\n",
        "\n",
        "    STR_message = \"oh yeah, you passed!\"\n",
        "\n",
        "    try:\n",
        "        element = WebDriverWait(driver, 10).until(\n",
        "            EC.presence_of_element_located((By.XPATH, \"//main/h1\"))\n",
        "        )\n",
        "\n",
        "        message = element.text.strip().lower()\n",
        "        assert STR_message == message\n",
        "    finally:\n",
        "        driver.quit()"
      ],
      "metadata": {
        "id": "E_picQQnOSAi"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "SzdIGmJ-BQ-n"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}

maiiabocharova commented 1 year ago

@DiTo97 Can you help, please? I ran exactly the code you suggested, but it does not work. Could you take a look? I am sharing a Colab notebook with the code you suggested and the error it produces. You can edit it directly if you would be so kind as to help us!

maiiabocharova commented 1 year ago

Fixed by pinning undetected_chromedriver==3.2.1. For anyone interested: a fully working notebook, with reference and credits given to @DiTo97.
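The pin can be applied from a notebook cell before the package is imported. A minimal sketch that only builds the pip command and prints it; the helper name pip_pin_command is hypothetical, and no other packages are pinned here because this comment does not mention any.

```python
import sys


def pip_pin_command(package: str, version: str) -> list:
    """Build a pip command that installs an exact version (hypothetical helper)."""
    # Using sys.executable targets the interpreter that runs the notebook
    return [sys.executable, "-m", "pip", "install", f"{package}=={version}"]


cmd = pip_pin_command("undetected-chromedriver", "3.2.1")
print(" ".join(cmd))
# To actually install, pass the list to subprocess.check_call(cmd)
```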

Nothig00 commented 1 year ago

Fixed by pinning undetected_chromedriver==3.2.1. For anyone interested: a fully working notebook, with reference and credits given to @DiTo97.

How do I add an extension?

qraccess commented 1 year ago

Fixed by pinning undetected_chromedriver==3.2.1. For anyone interested: a fully working notebook, with reference and credits given to @DiTo97.

zip error: Nothing to do! (/content/chromedriver_linux64.zip)

drpurohitvishal commented 1 year ago

Fixed by pinning undetected_chromedriver==3.2.1. For anyone interested: a fully working notebook, with reference and credits given to @DiTo97.

zip error: Nothing to do! (/content/chromedriver_linux64.zip)

This is because the Colab Linux image has changed to Ubuntu 22.04, so the function that checks for 20.04 returns False. Change all occurrences of 20.04 to 22.04. I also had to downgrade the selenium library to 4.5.0 to get it working, because of a WebDriver.__init__() got an unexpected keyword argument 'executable_path' error.
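Rather than editing every 20.04 by hand, the release check could accept a set of releases. A minimal sketch, assuming (per the comment above) that the same setup steps also apply on 22.04; the names SUPPORTED_RELEASES and needs_chromium_setup are hypothetical, and the distro and release values are expected to come from the ID and RELEASE fields that the setup code already reads.

```python
# Releases for which the Debian-based Chromium setup is attempted.
# 22.04 is included on the strength of the report above, not independent testing.
SUPPORTED_RELEASES = {"20.04", "22.04"}


def needs_chromium_setup(distro: str, release: str) -> bool:
    """True if the container is an Ubuntu release the setup is meant for."""
    return distro.lower() == "ubuntu" and release in SUPPORTED_RELEASES


print(needs_chromium_setup("Ubuntu", "22.04"))
```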