seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License

Slow opening Chrome (1 minute) on Ubuntu headless #2751

Closed fridary closed 2 months ago

fridary commented 2 months ago

OS: Ubuntu 20.04.6 LTS (no GUI)
Server: Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz, 180Gb RAM
python: 3.11.7
seleniumbase: 4.26.3

This simple script takes too long to open Chrome on headless Linux (no GUI). I need headless mode because the server has no GUI. Script:

from seleniumbase import SB
import time

start_time = time.time()
with SB(
    uc=True, # without this error "session not created: DevToolsActivePort file doesn't exist"
    # xvfb=True,
    headless=True) as sb:
    print("%.3fs inited" % (time.time() - start_time))
    start_time = time.time()
    sb.open("https://google.com/")
    print("%.3fs website opened" % (time.time() - start_time))

Result:

61.011s inited
0.998s website opened

If I do the same on Mac OS, it opens within seconds. If I open with pure selenium, it opens within seconds.

When I first installed the package, I got this log:

$ python test.py

Warning: uc_driver not found. Getting it now:

*** chromedriver to download = 124.0.6367.91 (Latest Stable)

Downloading chromedriver-linux64.zip from:
https://storage.googleapis.com/chrome-for-testing-public/124.0.6367.91/linux64/chromedriver-linux64.zip ...
Download Complete!

Extracting ['chromedriver'] from chromedriver-linux64.zip ...
Unzip Complete!

The file [uc_driver] was saved to:
/home/root/miniconda3/envs/eth/lib/python3.11/site-packages/seleniumbase/drivers/uc_driver

Making [uc_driver 124.0.6367.91] executable ...
[uc_driver 124.0.6367.91] is now ready for use!

*** chromedriver to download = 124.0.6367.91 (Latest Stable)

Downloading chromedriver-linux64.zip from:
https://storage.googleapis.com/chrome-for-testing-public/124.0.6367.91/linux64/chromedriver-linux64.zip ...
Download Complete!

Extracting ['chromedriver'] from chromedriver-linux64.zip ...
Unzip Complete!

The file [chromedriver] was saved to:
/home/root/miniconda3/envs/eth/lib/python3.11/site-packages/seleniumbase/drivers/chromedriver

Making [chromedriver 124.0.6367.91] executable ...
[chromedriver 124.0.6367.91] is now ready for use!

Update: how do I set the path to the Chromium driver? I cannot find it in the API. In Selenium it's service = Service(executable_path='/usr/lib/chromium-browser/chromedriver')

mdmintz commented 2 months ago

1. You can use xvfb=True so that you don't need to use headless mode on a headless Linux machine.

2. The first time you run a script, SeleniumBase needs to download chromedriver and/or uc_driver.

It appears your Linux machine has a slow Internet connection, which would explain the slow download.

fridary commented 2 months ago

@mdmintz If I use xvfb=True and remove headless=True:

from seleniumbase import SB
import time

start_time = time.time()
with SB(
    uc=True,  # without this: error "session not created: DevToolsActivePort file doesn't exist"
    xvfb=True,
) as sb:
    print("%.3fs inited" % (time.time() - start_time))

I get error:

selenium.common.exceptions.WebDriverException: Message: unknown error: cannot connect to chrome at 127.0.0.1:9222
from chrome not reachable
Stacktrace:
#0 0x55c9b5b50cb3 <unknown>
#1 0x55c9b583f2f7 <unknown>
...
#16 0x55c9b5b4fe04 <unknown>
#17 0x7fea6098a609 start_thread

If I only add xvfb=True, I get the same time: 60.976s inited. The Internet connection is fast (1 Gb/s). But connection speed does not matter here, because I am only initializing the Chrome webdriver object. My timing shows the website itself loaded in 0.9 seconds. And as I mentioned, with native Selenium everything (including creating the Chrome webdriver object) loads within seconds.

Maybe I should set the path to the Chromium driver manually? How do I do that? I cannot find it in the API. In Selenium it's service = Service(executable_path='/usr/lib/chromium-browser/chromedriver')

mdmintz commented 2 months ago

How long does the second test take? The first one will download any missing drivers first, making it take longer.

Also, see if anything changes with timing when not using UC Mode.
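One way to make that comparison is to time the startup under each configuration and record failures as data points instead of stopping at the first error. This is a minimal sketch: the helper names `time_sb_startup` and `compare_startups` and the `CONFIGS` table are hypothetical, and only `SB`, `uc`, and `headless` come from the thread.

```python
import time

# Hypothetical set of configurations to compare (names are illustrative).
CONFIGS = {
    "uc_headless": {"uc": True, "headless": True},
    "plain_headless": {"uc": False, "headless": True},
}


def time_sb_startup(**sb_kwargs):
    """Return the seconds SB() needs before the browser is ready to use."""
    # Imported lazily so the comparison helper works without seleniumbase.
    from seleniumbase import SB

    start = time.perf_counter()
    with SB(**sb_kwargs):
        # Measured on entry, so browser teardown time is not included.
        return time.perf_counter() - start


def compare_startups(configs, timer):
    """Run `timer` once per named config and collect {name: result}.

    A result is either the elapsed seconds or the exception class name.
    """
    results = {}
    for name, kwargs in configs.items():
        try:
            results[name] = timer(**kwargs)
        except Exception as exc:  # A failed launch is still a useful data point.
            results[name] = type(exc).__name__
    return results
```

Calling `compare_startups(CONFIGS, time_sb_startup)` twice in a row would also show whether the second run is faster than the first, which is the other question asked above.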

fridary commented 1 month ago

@mdmintz As I wrote, the second test took 60.976s. Are you saying it downloads the Chrome webdriver every time? If so, I would see download logs like the ones from the first run, but there was nothing. I also pointed out that if I remove UC Mode, I get an error:

(session not created: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/chromium-browser is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

Can I print extra debug logs somehow? How can I specify a custom Chrome webdriver path (like executable_path='/usr/lib/chromium-browser/chromedriver')? Because if I execute this, there is an error:

$ /usr/bin/chromium-browser any-text
[153024:153024:0506/055439.814261:ERROR:ozone_platform_x11.cc(243)] Missing X server or $DISPLAY
[153024:153024:0506/055439.814319:ERROR:env.cc(258)] The platform failed to initialize.  Exiting.

But if I change location, everything works (perhaps I have 2 different chrome webdrivers and only 1 works):

$ /usr/lib/chromium-browser/chromedriver any-text
Starting ChromeDriver 124.0.6367.118 (3af29d4696f1a61061b55222fcdf0c57bdc32475-refs/branch-heads/6367@{#1036}) on port 9515
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.

mdmintz commented 1 month ago

You can specify the Chromium binary location: https://github.com/seleniumbase/SeleniumBase/blob/419cedc8fba6bfce71add3ec915a92d67698c9a1/seleniumbase/plugins/sb_manager.py#L76

But you can't specify the driver location. SeleniumBase will automatically check the System PATH for it, and if not there, it will download the necessary one to the seleniumbase/drivers/ folder.
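Since the lookup goes through the System PATH, it helps to remember that PATH entries are directories, not file paths. The sketch below uses only the Python standard library to show which chromedriver a PATH lookup would find, and how to add the directory that contains a driver file. The helper names `split_path` and `with_driver_dir` are hypothetical, and the sketch does not verify which driver a given SeleniumBase version ends up using.

```python
import os
import shutil


def split_path(path_env):
    """Split a PATH-style string into its non-empty directory entries."""
    return [entry for entry in path_env.split(os.pathsep) if entry]


def with_driver_dir(path_env, driver_file):
    """Return a PATH string that includes the directory holding driver_file.

    PATH entries are directories, so the file name itself is dropped.
    """
    driver_dir = os.path.dirname(driver_file)
    entries = split_path(path_env)
    if driver_dir not in entries:
        entries.append(driver_dir)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    # Which chromedriver, if any, does a PATH lookup find right now?
    print("chromedriver found at:", shutil.which("chromedriver"))

    # Driver location taken from the thread; adjust for your machine.
    driver_file = "/usr/lib/chromium-browser/chromedriver"
    os.environ["PATH"] = with_driver_dir(os.environ.get("PATH", ""), driver_file)
    print("after adding its directory:", shutil.which("chromedriver"))
```

The same idea applies in a shell profile: the value appended to PATH would be the directory, not the path of the chromedriver file itself.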

I think the issue you have may be that it's using /usr/bin/chromium-browser instead of google-chrome or google-chrome-stable, which are better choices for Chrome on Linux for Selenium compatibility. Looks to be a configuration issue.

It'll be tricky to debug your Linux configuration, so you may have to play around with it yourself. Also, if you're using Docker with UC Mode, note that you won't be able to remain undetected. And if the configuration is correct, xvfb=True should be working.
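As a rough sketch of the suggestion above, the following looks for google-chrome or google-chrome-stable on PATH and passes the result to SB() through binary_location. The helper names `find_browser` and `open_with_browser` are hypothetical, and whether this resolves the slow startup on a particular machine is an assumption to test, not a guarantee.

```python
import shutil

# Browser executables suggested in the thread, in order of preference.
PREFERRED_BROWSERS = ("google-chrome", "google-chrome-stable")


def find_browser(names=PREFERRED_BROWSERS, which=shutil.which):
    """Return the full path of the first browser name found on PATH, else None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None


def open_with_browser(url, binary_path):
    """Open `url` with the given browser binary and return the page title."""
    # Imported lazily so find_browser() works without seleniumbase installed.
    from seleniumbase import SB

    with SB(uc=True, xvfb=True, binary_location=binary_path) as sb:
        sb.open(url)
        return sb.get_title()
```

If `find_browser()` returns None, neither executable is on PATH, which would be consistent with the /usr/bin/chromium-browser path shown in the error messages.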

fridary commented 1 month ago

@mdmintz can you please help with PATH? I did:

$ export PATH=$PATH:/usr/lib/chromium-browser/chromedriver
$ nano ~/.bashrc
   export PATH=$PATH:/usr/lib/chromium-browser/chromedriver
$ source ~/.bashrc
$ exec bash

and this:

import sys
sys.path.append('/usr/lib/chromium-browser/chromedriver')
with SB(..) ...

but I still get the old path /usr/bin/chromium-browser:

(session not created: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/chromium-browser is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

mdmintz commented 1 month ago

Chrome is the browser. chromedriver is the driver. You mixed the two up because chromium-browser/chromedriver would never exist.

fridary commented 1 month ago

@mdmintz man, I still don't get what to do. Can you help me please? Problem: the SB() object takes 60 seconds to load on Ubuntu with no GUI. On Mac OS it loads in 1 second. This code:

from seleniumbase import SB
import time
import sys
start_time = time.time()
with SB(
    uc=True, # without this error "session not created: DevToolsActivePort file doesn't exist"
    headless=True
    ) as sb:
    print("%.3fs inited" % (time.time() - start_time))
    start_time = time.time()
    sb.open("https://google.com/")
    print("%.3fs website opened" % (time.time() - start_time))

Result:

61.305s inited
1.592s website opened

If I remove the uc=True param and/or add the xvfb=True param, I get this error:

selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
  (session not created: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/chromium-browser is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x5622f698ecb3 <unknown>
#1 0x5622f667d4a7 <unknown>
...

Now I test with pure Selenium (on Ubuntu):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
import time

start_time = time.time()
service = Service()
options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(service=service, options=options)
print("%.3fs inited" % (time.time() - start_time))
start_time = time.time()
driver.get('https://www.google.com/')
print("%.3fs website opened" % (time.time() - start_time))
driver.quit()

The result:

0.717s inited
0.965s website opened

=> The problem is 100% in seleniumbase. Can you help me please? You said I can specify the driver location in the system PATH and that it can help. I wrote my post 5 days ago, and whatever I do, I get the same problem.

mdmintz commented 1 month ago

Looks like you're dealing with https://stackoverflow.com/q/50642308/7058266 There are Python solutions here: https://stackoverflow.com/a/56638103/7058266

With SeleniumBase, you can pass in additional Chromium options using chromium_arg: https://github.com/seleniumbase/SeleniumBase/blob/e693775f56d0ad2904112577217d934437715125/seleniumbase/plugins/sb_manager.py#L69

If no combination of changing headless, headless2, or xvfb switches does the trick for you, then you can try a chromium arg from that, eg: SB(chromium_arg="--disable-gpu", uc=True). (Note that chromium_arg takes a comma-separated list of options.)
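Because chromium_arg takes one comma-separated string, it can be convenient to keep the flags in a list and join them just before the call. This is a minimal sketch: the helper names `join_chromium_args` and `open_with_args` are hypothetical, and which flags actually help on a given machine is an assumption to verify by trial.

```python
def join_chromium_args(args):
    """Join individual Chromium flags into one comma-separated string."""
    cleaned = [arg.strip() for arg in args if arg and arg.strip()]
    return ",".join(cleaned)


def open_with_args(url, args):
    """Open `url` in UC Mode with extra Chromium flags; return the page title."""
    # Imported lazily so join_chromium_args() works without seleniumbase.
    from seleniumbase import SB

    # Pass None rather than an empty string when no flags are given.
    chromium_arg = join_chromium_args(args) or None
    with SB(uc=True, headless=True, chromium_arg=chromium_arg) as sb:
        sb.open(url)
        return sb.get_title()
```

For example, `open_with_args("https://google.com/", ["--disable-gpu"])` would correspond to the SB(chromium_arg="--disable-gpu", uc=True) call mentioned above, with headless=True added to match the earlier scripts in this thread.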

It's going to be much easier for you to debug than me, since Linux machines are all configured differently.

If you're not using UC Mode, there's a SeleniumBase Dockerfile with instructions here.

Also note that SeleniumBase runs a lot of Linux tests via GitHub Actions: https://github.com/seleniumbase/SeleniumBase/actions (It's probably configured differently from your Linux machine. You'll have to figure out the configuration you need.)