Closed UlkuTuncerKucuktas closed 3 years ago
While trying to fix it, I found another error.
Curious how you run Chrome from Colab.
Was a solution for this found?
I know it's a bit late, but this is how to solve it.
First, paste the code below into a cell and run it (note that this is meant to be run on Google Colaboratory):
!apt-get update
!apt install chromium-chromedriver
!apt install -y xvfb
!pip install undetected-chromedriver
!pip install PyVirtualDisplay
Here we install all the packages needed to run the webdriver.
Actually, neither Firefox nor Chrome works in Colab. It has something to do with how memory is handled in Colab, since we get an error like Attempt to free invalid pointer 0xa3af288aa20 when we try to execute them.
So instead we are going to use Chromium, which for some reason doesn't have this restriction.
And since the default chromedriver is built from Chrome, it throws the same error as well. As you may have guessed, the one built from Chromium runs just fine.
By default, undetected-chromedriver pulls the latest version of chromedriver from the official server. Instead, we are going to feed it the one that came preinstalled with chromium-chromedriver, by replacing a line of code in the patcher.py file of this package.
To do that, run the code below:
!zip -j /content/chromedriver_linux64.zip /usr/bin/chromedriver

# Replace python3.7 with your own version of Python in case it's not the same
patcher_src = "/usr/local/lib/python3.7/dist-packages/undetected_chromedriver/patcher.py"

with open(patcher_src, "r") as f:
    contents = f.read()

# Point the patcher at the local archive instead of the official download
contents = contents.replace(
    "return urlretrieve(u)[0]",
    "return urlretrieve('file:///content/chromedriver_linux64.zip',"
    "filename='/tmp/chromedriver_linux64.zip')[0]"
)

with open(patcher_src, "w") as f:
    f.write(contents)
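The hard-coded python3.7 path above has to be edited by hand for every Python version. As a minimal sketch, assuming the package is already installed, the file can instead be located through the import system without importing the package. The helper name `find_module_file` is hypothetical and not part of undetected-chromedriver.

```python
import importlib.util
import pathlib
from typing import Optional


def find_module_file(package: str, filename: str) -> Optional[pathlib.Path]:
    """Locate `filename` inside an installed top-level package, without importing it."""
    spec = importlib.util.find_spec(package)

    # find_spec returns None when the package is not installed;
    # origin is None for namespace packages
    if spec is None or spec.origin is None:
        return None

    # For a regular package, origin points at its __init__.py
    candidate = pathlib.Path(spec.origin).parent / filename
    return candidate if candidate.is_file() else None


# None if the package is not installed in this environment
patcher_src = find_module_file("undetected_chromedriver", "patcher.py")
```

Because `find_spec` on a top-level name only looks the package up, the module is not loaded before the patch is written.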
Then initialize the webdriver just as you normally would:
import undetected_chromedriver.v2 as uc
from pyvirtualdisplay import Display
display = Display(visible=0, size=(800, 600))
display.start()
options = uc.ChromeOptions()
options.add_argument("--no-sandbox")
driver = uc.Chrome(options=options)
Make sure to add the --no-sandbox argument. Colab runs everything as root, so the webdriver would crash on launch without it.
Creating a virtual display is not mandatory, but a headless browser has a higher chance of being caught by anti-bot detection, so we might as well include it.
chromium-browser hasn't been updated for a while, but here are the links to the releases and source code:
https://launchpad.net/ubuntu/bionic/+package/chromium-browser
https://launchpad.net/ubuntu/bionic/+package/chromium-chromedriver
EDIT: If you already used undetected-chromedriver in your code before running the fix above, it won't work unless you restart the runtime (Ctrl + M .).
Python caches imported modules, so changes made to their source files are not picked up if the modules were already imported.
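To check whether a restart is needed, one option is to look the module up in `sys.modules`, which is where Python keeps every module imported so far in the current process. This is only a sketch; the helper name `already_imported` is made up for illustration.

```python
import sys


def already_imported(module_name: str) -> bool:
    """Return True if the module has been imported in this interpreter session."""
    return module_name in sys.modules


# If this prints True, the patched patcher.py will not be picked up
# until the runtime is restarted
print(already_imported("undetected_chromedriver"))
```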
Now we can properly say that this issue is closed ;)
Please, see the updated version of the code, as this is broken since the transition of Google Colab devices to Ubuntu 20.04
driver = uc.Chrome(options=options)
driver.get(URL)
@DiTo97 Thank you so much for this. However, I keep getting an error pointing to this line that reads "Bad Zip File".
Do you know how to fix it?
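One way to rule out a missing or malformed archive is to build it with Python's `zipfile` module instead of the `zip` command, and to verify it right away. This is a sketch only, not a confirmed fix for the error above; the function name `zip_single_file` is hypothetical, and the paths in the usage comment are the ones used earlier in this thread.

```python
import pathlib
import zipfile


def zip_single_file(src: pathlib.Path, dst: pathlib.Path) -> pathlib.Path:
    """Store `src` in a new archive `dst` under its bare file name (like `zip -j`)."""
    src = pathlib.Path(src)
    dst = pathlib.Path(dst)

    # Fail loudly instead of producing an empty or missing archive
    if not src.is_file():
        raise FileNotFoundError(f"driver binary not found: {src}")

    with zipfile.ZipFile(dst, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(src, arcname=src.name)

    # Sanity check: the result must be readable as a zip archive
    if not zipfile.is_zipfile(dst):
        raise zipfile.BadZipFile(f"not a valid zip archive: {dst}")

    return dst


# Usage on Colab (paths taken from the comments above):
# zip_single_file("/usr/bin/chromedriver", "/content/chromedriver_linux64.zip")
```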
FloatingMind12 Can you please help? Your solution https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/108#issuecomment-1170269377 worked perfectly before, but it has now started causing this error:
Message: Service /root/.local/share/undetected_chromedriver/d4f76b1529445788_chromedriver unexpectedly exited. Status code was: -6
Do you know how to fix it?
same issue
same issue
I don't know if it helps, but a working solution was proposed for Selenium,
along with an example Colab notebook that works.
I hope someone will also suggest how to adapt it for undetected_chromedriver.
Hi @ThreadedLinx, @maiiabocharova, @ali-arjmandi,
I have put together an updated version of the code following @maiiabocharova's suggestion.
import pathlib
import re
import subprocess
import typing


def is_in_jupyter_notebook() -> bool:
    """It checks whether a Jupyter notebook is being run"""
    try:
        get_ipython
        return True
    except NameError:
        return False


def is_on_gcolab() -> bool:
    """It checks whether a Jupyter notebook is being run on Google Colab"""
    if not is_in_jupyter_notebook():
        return False

    return "google.colab" in str(get_ipython())


def is_ubuntu_20_04() -> bool:
    import lsb_release
    metadata = lsb_release.get_os_release()

    distro = metadata["ID"].lower()
    release = metadata["RELEASE"]

    return distro == "ubuntu" and release == "20.04"


def setup_ubuntu_20_04() -> None:
    """It sets up a Ubuntu 20.04 container with `chromium-browser`

    For more information, see
    https://github.com/googlecolab/colabtools/issues/3347#issuecomment-1387453484
    """
    # It adds debian buster
    EOF_debian_buster = """\
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
"""
    !echo "$EOF_debian_buster" > /etc/apt/sources.list.d/debian.list

    # It adds keys
    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A

    !apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
    !apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
    !apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg

    # It adds the debian repo for chromium* packages only
    # Note the double-blank lines between entries
    EOF_chromium_pref = """\
Package: *
Pin: release a=eoan
Pin-Priority: 500


Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300


Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
"""
    !echo "$EOF_chromium_pref" > /etc/apt/preferences.d/chromium.pref

    # It installs the packages
    !apt-get update
    !apt-get install chromium chromium-driver
    !apt-get install -y xvfb


def setup_requirements() -> None:
    PIP_requirements = " ".join([
        "PyVirtualDisplay",  # To run a virtual display
        "undetected-chromedriver",
    ])

    !python3 -m pip install --upgrade pip
    !python3 -m pip install --upgrade $PIP_requirements


def get_py_module_path(module: str) -> typing.Optional[pathlib.Path]:
    """It gets the absolute path of a Python module"""
    r = subprocess.run(
        ["pip", "show", module],
        capture_output=True
    )

    try:
        r.check_returncode()
    except subprocess.CalledProcessError:
        return None

    stdout = r.stdout.decode()

    try:
        RE_abspath = "\nLocation: (?P<abspath>.*)\n"

        matches = re.search(RE_abspath, stdout)
        abspath = matches.group("abspath")
    except AttributeError:
        return None

    dist_packages = pathlib.Path(abspath).resolve()
    return dist_packages / module


def patch_undetected_chromedriver() -> None:
    """It forces `undetected_chromedriver` to run the Chromium webdriver

    For more information, see
    https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/108#issuecomment-1170269377
    """
    chromedriver_filename = "chromedriver_linux64.zip"

    src_chromedriver_filepath = ROOT / chromedriver_filename
    dst_chromedriver_filepath = pathlib.Path("/tmp") / chromedriver_filename

    !zip -j "$src_chromedriver_filepath" /usr/bin/chromedriver

    PY_module = "undetected_chromedriver"
    module_path = get_py_module_path(PY_module)

    patcher_filepath = module_path / "patcher.py"

    with patcher_filepath.open("rt") as f:
        contents = f.read()

    src = f"'file://{src_chromedriver_filepath}'"
    dst = f"'{dst_chromedriver_filepath}'"

    # It is forced to use the local webdriver
    contents = contents.replace(
        "return urlretrieve(u)[0]",
        f"return urlretrieve({src}, filename={dst})[0]"
    )

    with patcher_filepath.open("wt") as f:
        f.write(contents)


def setup_container() -> None:
    """It sets up the container which is being run"""
    if is_ubuntu_20_04():
        setup_ubuntu_20_04()

    setup_requirements()
    patch_undetected_chromedriver()


ROOT = pathlib.Path("/content")
anchor = ROOT / "anchor.txt"


assert is_on_gcolab(), "It seems you are not on Google Colab"

# It will set the Google Colab container up only
# after disconnections, not after restarts
if not anchor.exists():
    setup_container()
    anchor.touch()
After running the above cell, you may try it out on a Cloudflare-protected website:
import time
import pyvirtualdisplay
import undetected_chromedriver.v2 as uc # Note import before selenium
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


with pyvirtualdisplay.Display(visible=0, size=(800, 600)) as _:
    URL = "https://nowsecure.nl"  # A Cloudflare-protected website

    options = uc.ChromeOptions()
    options.add_argument("--no-sandbox")

    driver = uc.Chrome(options=options)
    driver.get(URL)

    STR_message = "oh yeah, you passed!"

    try:
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, "//main/h1"))
        )

        message = element.text.strip().lower()
        assert STR_message == message
    finally:
        driver.quit()
Please, mention this issue comment, if you plan to paste it elsewhere!
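A caveat on the `contents.replace(...)` call in `patch_undetected_chromedriver`: `str.replace` silently returns the text unchanged when the target line is absent, for example if the installed `patcher.py` differs from the one this patch was written against. A minimal sketch of a stricter variant follows; the helper `replace_exactly_once` is hypothetical and not part of the setup cell.

```python
def replace_exactly_once(contents: str, old: str, new: str) -> str:
    """Replace `old` with `new`, raising if `old` does not occur exactly once."""
    count = contents.count(old)

    if count != 1:
        raise ValueError(
            f"expected exactly one occurrence of {old!r}, found {count}"
        )

    return contents.replace(old, new)
```

Used in place of the plain `replace`, this turns a patch that did nothing into an immediate error. It also raises if the file was already patched, which the `anchor.txt` check in the setup cell is meant to prevent.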
Hi @DiTo97 I got this error on Google Colab:
WebDriverException Traceback (most recent call last)
4 frames
/usr/local/lib/python3.8/dist-packages/selenium/webdriver/common/service.py in assert_process_still_running(self)
    115         return_code = self.process.poll()
    116         if return_code:
--> 117             raise WebDriverException(f"Service {self.path} unexpectedly exited. Status code was: {return_code}")
    118
    119     def is_connectable(self) -> bool:
WebDriverException: Message: Service /root/.local/share/undetected_chromedriver/350167f31e3b62c9_chromedriver unexpectedly exited. Status code was: 1
Hi @enok,
I have just run my code and it works. What I can suggest is: 1) disconnect and delete the Google Colab runtime (start over); 2) make sure it is running on Ubuntu 20.04 (you can use the provided function is_ubuntu_20_04, even though all Google Colab instances should run on that release by default nowadays); 3) paste the two code snippets above (the setup and the Cloudflare-protected website example) into two different cells, making sure to run the setup before importing the libraries (undetected_chromedriver, selenium, etc.).
Please, consider upvoting the previous comment, if this helps you solve your problems with Google Colab!
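The release check in point 2 can also be done without the `lsb_release` module, by reading `/etc/os-release`, which holds `KEY=value` lines such as `ID` and `VERSION_ID`. The sketch below is an alternative to `is_ubuntu_20_04`, not the code from the setup cell; both function names and the `releases` parameter are hypothetical.

```python
from typing import Dict, Iterable


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse the KEY=value lines of an os-release file into a dict."""
    info: Dict[str, str] = {}

    for line in text.splitlines():
        line = line.strip()

        # Skip blank lines, comments, and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue

        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")

    return info


def is_ubuntu_release(info: Dict[str, str], releases: Iterable[str] = ("20.04",)) -> bool:
    """Check whether the parsed metadata describes one of the given Ubuntu releases."""
    return info.get("ID", "").lower() == "ubuntu" and info.get("VERSION_ID") in tuple(releases)
```

On a Linux machine the input would come from `pathlib.Path("/etc/os-release").read_text()`. Passing several releases lets the same check cover more than one Ubuntu version.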
Thanks for the reply @DiTo97. Can you share your working Colab project so I can use it as a template?
Unfortunately, I cannot upload the IPYNB file, as that file type is not supported (nor can I share the Google Colab link).
I will paste the (JSON) contents of the IPYNB file below, even though they are exactly the same as what I shared above. You just have to copy-paste them into a blank file, save the file with the .ipynb extension, and open it on Google Colab.
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"[#108](https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/108)"
],
"metadata": {
"id": "soj_LuU2J3uu"
}
},
{
"cell_type": "code",
"source": [
"import pathlib\n",
"import re\n",
"import subprocess\n",
"import typing\n",
"\n",
"\n",
"def is_in_jupyter_notebook() -> bool:\n",
" \"\"\"It checks whether a Jupyter notebook is being run\"\"\"\n",
" try:\n",
" get_ipython\n",
" return True\n",
" except NameError:\n",
" return False\n",
"\n",
"\n",
"def is_on_gcolab() -> bool:\n",
" \"\"\"It checks whether a Jupyter notebook is being run on Google Colab\"\"\"\n",
" if not is_in_jupyter_notebook():\n",
" return False\n",
"\n",
" return \"google.colab\" in str(get_ipython())\n",
"\n",
"\n",
"def is_ubuntu_20_04() -> bool:\n",
" import lsb_release\n",
" metadata = lsb_release.get_os_release()\n",
"\n",
" distro = metadata[\"ID\"].lower()\n",
" release = metadata[\"RELEASE\"]\n",
"\n",
" return distro == \"ubuntu\" and release == \"20.04\"\n",
"\n",
"\n",
"def setup_ubuntu_20_04() -> None:\n",
" \"\"\"It sets up a Ubuntu 20.04 container with `chromium-browser`\n",
"\n",
" For more information, see \n",
" https://github.com/googlecolab/colabtools/issues/3347#issuecomment-1387453484\n",
" \"\"\"\n",
" # It adds debian buster\n",
" EOF_debian_buster = \"\"\"\\\n",
"deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main\n",
"deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main\n",
"deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main\n",
"\"\"\"\n",
" !echo \"$EOF_debian_buster\" > /etc/apt/sources.list.d/debian.list\n",
"\n",
" # It adds keys\n",
" !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517\n",
" !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138\n",
" !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A\n",
"\n",
" !apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg\n",
" !apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg\n",
" !apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg\n",
"\n",
" # It adds the debian repo for chromium* packages only\n",
" # Note the double-blank lines between entries\n",
" EOF_chromium_pref = \"\"\"\\\n",
"Package: *\n",
"Pin: release a=eoan\n",
"Pin-Priority: 500\n",
"\n",
"\n",
"Package: *\n",
"Pin: origin \"deb.debian.org\"\n",
"Pin-Priority: 300\n",
"\n",
"\n",
"Package: chromium*\n",
"Pin: origin \"deb.debian.org\"\n",
"Pin-Priority: 700\n",
"\"\"\"\n",
" !echo \"$EOF_chromium_pref\" > /etc/apt/preferences.d/chromium.pref\n",
"\n",
" # It installs the packages\n",
" !apt-get update\n",
" !apt-get install chromium chromium-driver\n",
" !apt-get install -y xvfb\n",
"\n",
"\n",
"def setup_requirements() -> None:\n",
" PIP_requirements = \" \".join([\n",
" \"PyVirtualDisplay\", # To run a virtual display\n",
" \"undetected-chromedriver\",\n",
" ])\n",
"\n",
" !python3 -m pip install --upgrade pip\n",
" !python3 -m pip install --upgrade $PIP_requirements\n",
"\n",
"\n",
"def get_py_module_path(module: str) -> typing.Optional[pathlib.Path]:\n",
" \"\"\"It gets the absolute path of a Python module\"\"\"\n",
" r = subprocess.run(\n",
" [\"pip\", \"show\", module], \n",
" capture_output=True\n",
" )\n",
"\n",
" try:\n",
" r.check_returncode()\n",
" except subprocess.CalledProcessError:\n",
" return None\n",
"\n",
" stdout = r.stdout.decode()\n",
"\n",
" try:\n",
" RE_abspath = \"\\nLocation: (?P<abspath>.*)\\n\"\n",
"\n",
" matches = re.search(RE_abspath, stdout)\n",
" abspath = matches.group(\"abspath\")\n",
" except AttributeError:\n",
" return None\n",
"\n",
" dist_packages = pathlib.Path(abspath).resolve()\n",
" return dist_packages / module\n",
"\n",
"\n",
"def patch_undetected_chromedriver() -> None:\n",
" \"\"\"It forces undetected_chromedriver to run the Chromium webdriver\n",
"\n",
" For more information, see \n",
" https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/108#issuecomment-1170269377\n",
" \"\"\"\n",
" chromedriver_filename = \"chromedriver_linux64.zip\"\n",
"\n",
" src_chromedriver_filepath = ROOT / chromedriver_filename\n",
" dst_chromedriver_filepath = pathlib.Path(\"/tmp\") / chromedriver_filename\n",
"\n",
" !zip -j \"$src_chromedriver_filepath\" /usr/bin/chromedriver\n",
"\n",
" PY_module = \"undetected_chromedriver\"\n",
" module_path = get_py_module_path(PY_module)\n",
"\n",
" patcher_filepath = module_path / \"patcher.py\"\n",
"\n",
" with patcher_filepath.open(\"rt\") as f:\n",
" contents = f.read()\n",
"\n",
" src = f\"'file://{src_chromedriver_filepath}'\"\n",
" dst = f\"'{dst_chromedriver_filepath}'\"\n",
"\n",
" # It is forced to use the local webdriver\n",
" contents = contents.replace(\n",
" f\"return urlretrieve(u)[0]\",\n",
" f\"return urlretrieve({src}, filename={dst})[0]\"\n",
" )\n",
"\n",
" with patcher_filepath.open(\"wt\") as f:\n",
" f.write(contents)\n",
"\n",
"\n",
"def setup_container() -> None:\n",
" \"\"\"It sets up the container which is being run\"\"\"\n",
" if is_ubuntu_20_04():\n",
" setup_ubuntu_20_04()\n",
"\n",
" setup_requirements()\n",
" patch_undetected_chromedriver()\n",
"\n",
"\n",
"ROOT = pathlib.Path(\"/content\")\n",
"anchor = ROOT / \"anchor.txt\"\n",
"\n",
"\n",
"assert is_on_gcolab(), \"It seems you are not on Google Colab\"\n",
"\n",
"# It will set the Google Colab container up only\n",
"# after disconnections, not after restarts\n",
"if not anchor.exists():\n",
" setup_container()\n",
" anchor.touch()"
],
"metadata": {
"id": "l6ORjDeTwZvx"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import time\n",
"\n",
"import pyvirtualdisplay\n",
"import undetected_chromedriver.v2 as uc # Note import before selenium\n",
"from selenium.webdriver.common.by import By\n",
"from selenium.webdriver.support import expected_conditions as EC\n",
"from selenium.webdriver.support.ui import WebDriverWait\n",
"\n",
"\n",
"with pyvirtualdisplay.Display(visible=0, size=(800, 600)) as _:\n",
" URL = \"https://nowsecure.nl\" # A Cloudflare-protected website\n",
"\n",
" options = uc.ChromeOptions()\n",
" options.add_argument(\"--no-sandbox\")\n",
"\n",
" driver = uc.Chrome(options=options)\n",
" driver.get(URL)\n",
"\n",
" STR_message = \"oh yeah, you passed!\"\n",
"\n",
" try:\n",
" element = WebDriverWait(driver, 10).until(\n",
" EC.presence_of_element_located((By.XPATH, \"//main/h1\"))\n",
" )\n",
"\n",
" message = element.text.strip().lower()\n",
" assert STR_message == message\n",
" finally:\n",
" driver.quit()"
],
"metadata": {
"id": "E_picQQnOSAi"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "SzdIGmJ-BQ-n"
},
"execution_count": null,
"outputs": []
}
]
}
@DiTo97 Can you help please? I executed exactly the code you suggested, but it does not work. Can you please take a look? I share a colab notebook with the code you suggested and the error it produces. You can edit it directly if you will be so kind to help us, please!
Fixed by pinning undetected_chromedriver==3.2.1. For anyone interested: a fully working notebook, with reference and credit given to @DiTo97.
How to add an extension?
zip error: Nothing to do! (/content/chromedriver_linux64.zip)
This is because the Colab Linux image has changed to Ubuntu 22.04, so the function that checks for 20.04 now returns False. Change every 20.04 to 22.04. I also had to downgrade the selenium library to 4.5.0 to get it working, because of a WebDriver.__init__() got an unexpected keyword argument 'executable_path' error.
this code gives
WebDriverException: Message: Service ./chromedriver unexpectedly exited. Status code was: -6
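A note on reading these status codes: the traceback earlier in the thread shows that the value comes from `self.process.poll()`, and on POSIX Python's `subprocess` reports a negative value -N when the child process was terminated by signal N. The sketch below decodes such a value; the function name `describe_exit_status` is made up for illustration, and it says nothing about why the driver was killed.

```python
import signal


def describe_exit_status(return_code: int) -> str:
    """Turn a subprocess return code into a short human-readable description."""
    if return_code < 0:
        # Negative codes mean "terminated by signal -return_code" on POSIX
        try:
            name = signal.Signals(-return_code).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {-return_code} ({name})"

    if return_code == 0:
        return "exited normally"

    return f"exited with error status {return_code}"


print(describe_exit_status(-6))
print(describe_exit_status(1))
```

On Linux, signal 6 is SIGABRT, so a status of -6 means the driver process aborted, whereas a status of 1 means it exited on its own with an error.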