seleniumbase / SeleniumBase

📊 Blazing fast Python framework for web crawling, scraping, testing, and reporting. Supports pytest. Stealth abilities: UC Mode and CDP Mode.
https://seleniumbase.io
MIT License

Can't launch in Google Colab #2778

Closed megalevel closed 6 months ago

megalevel commented 6 months ago

Hello!

I am trying to execute the code in the Google Colab environment, but I get an error. My code:

!pip install -U seleniumbase

from seleniumbase import Driver

driver = Driver(uc=True, headless=True)

try:
    driver.open("https://www.investmint.ru/")
    driver.sleep(10)
    # Read the page source before quitting; the session is gone after quit().
    print('driver:\n', driver.page_source)
finally:
    driver.quit()

Error:

---------------------------------------------------------------------------
WebDriverException                        Traceback (most recent call last)
[<ipython-input-60-bd9a82e05721>](https://localhost:8080/#) in <cell line: 8>()
      6 # driver = Driver(uc=True, chromium_arg="--no-sandbox,--headless,--disable-gpu")  # Driver methods: https://github.com/seleniumbase/SeleniumBase/issues/2200
      7 #driver = Driver(uc=True, chromium_arg="--no-sandbox,--disable-dev-shm-usage,--disable-application-cache,--disable-setuid-sandbox,--disable-browser-side-navigation,--disable-save-password-bubble,--disable-single-click-autofill,--allow-file-access-from-files,--disable-prompt-on-repost,--dns-prefetch-disable,--disable-translate,--disable-renderer-backgrounding,--disable-backgrounding-occluded-windows,--disable-client-side-phishing-detection,--disable-oopr-debug-crash-dump,--disable-top-sites,--ash-no-nudges,--no-crash-upload,--deny-permission-prompts,--disable-popup-blocking,--homepage=chrome://new-tab-page/,--headless=new,--remote-debugging-host=127.0.0.1,--remote-debugging-port=9222,--user-data-dir=/tmp/tmpemr44m_f,--lang=es-ES,--no-default-browser-check,--no-first-run,--no-service-autorun,--password-store=basic,--log-level=0")
----> 8 driver = Driver(uc=True)
      9 
     10 try:

10 frames
[/usr/local/lib/python3.10/dist-packages/seleniumbase/plugins/driver_manager.py](https://localhost:8080/#) in Driver(browser, headless, headless2, headed, locale_code, protocol, servername, port, proxy, proxy_bypass_list, proxy_pac_url, multi_proxy, agent, cap_file, cap_string, recorder_ext, disable_js, disable_csp, enable_ws, disable_ws, enable_sync, use_auto_ext, undetectable, uc_cdp_events, uc_subprocess, log_cdp_events, no_sandbox, disable_gpu, incognito, guest_mode, dark_mode, devtools, remote_debug, enable_3d_apis, swiftshader, ad_block_on, host_resolver_rules, block_images, do_not_track, chromium_arg, firefox_arg, firefox_pref, user_data_dir, extension_zip, extension_dir, disable_features, binary_location, driver_version, page_load_strategy, use_wire, external_pdf, is_mobile, mobile, d_width, d_height, d_p_r, uc, undetected, uc_cdp, uc_sub, log_cdp, wire, pls)
    527     from seleniumbase.core import browser_launcher
    528 
--> 529     driver = browser_launcher.get_driver(
    530         browser_name=browser_name,
    531         headless=headless,

[/usr/local/lib/python3.10/dist-packages/seleniumbase/core/browser_launcher.py](https://localhost:8080/#) in get_driver(browser_name, headless, locale_code, use_grid, protocol, servername, port, proxy_string, proxy_bypass_list, proxy_pac_url, multi_proxy, user_agent, cap_file, cap_string, recorder_ext, disable_js, disable_csp, enable_ws, enable_sync, use_auto_ext, undetectable, uc_cdp_events, uc_subprocess, log_cdp_events, no_sandbox, disable_gpu, headless2, incognito, guest_mode, dark_mode, devtools, remote_debug, enable_3d_apis, swiftshader, ad_block_on, host_resolver_rules, block_images, do_not_track, chromium_arg, firefox_arg, firefox_pref, user_data_dir, extension_zip, extension_dir, disable_features, binary_location, driver_version, page_load_strategy, use_wire, external_pdf, test_id, mobile_emulator, device_width, device_height, device_pixel_ratio, browser)
   1654         )
   1655     else:
-> 1656         return get_local_driver(
   1657             browser_name,
   1658             headless,

[/usr/local/lib/python3.10/dist-packages/seleniumbase/core/browser_launcher.py](https://localhost:8080/#) in get_local_driver(browser_name, headless, locale_code, servername, proxy_string, proxy_auth, proxy_user, proxy_pass, proxy_bypass_list, proxy_pac_url, multi_proxy, user_agent, recorder_ext, disable_js, disable_csp, enable_ws, enable_sync, use_auto_ext, undetectable, uc_cdp_events, uc_subprocess, log_cdp_events, no_sandbox, disable_gpu, headless2, incognito, guest_mode, dark_mode, devtools, remote_debug, enable_3d_apis, swiftshader, ad_block_on, host_resolver_rules, block_images, do_not_track, chromium_arg, firefox_arg, firefox_pref, user_data_dir, extension_zip, extension_dir, disable_features, binary_location, driver_version, page_load_strategy, use_wire, external_pdf, mobile_emulator, device_width, device_height, device_pixel_ratio)
   3562                                     uc_path = LOCAL_UC_DRIVER
   3563                                     uc_path = os.path.realpath(uc_path)
-> 3564                                 driver = undetected.Chrome(
   3565                                     options=chrome_options,
   3566                                     user_data_dir=user_data_dir,

[/usr/local/lib/python3.10/dist-packages/seleniumbase/undetected/__init__.py](https://localhost:8080/#) in __init__(self, options, user_data_dir, driver_executable_path, browser_executable_path, port, enable_cdp_events, log_level, headless, patch_driver, version_main, patcher_force_close, suppress_welcome, use_subprocess, debug, **kw)
    310             if hasattr(service_, "creation_flags"):
    311                 setattr(service_, "creation_flags", creationflags)
--> 312             super().__init__(options=options, service=service_)
    313             self.reactor = None
    314             if enable_cdp_events:

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/chrome/webdriver.py](https://localhost:8080/#) in __init__(self, options, service, keep_alive)
     43         options = options if options else Options()
     44 
---> 45         super().__init__(
     46             browser_name=DesiredCapabilities.CHROME["browserName"],
     47             vendor_prefix="goog",

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/chromium/webdriver.py](https://localhost:8080/#) in __init__(self, browser_name, vendor_prefix, options, service, keep_alive)
     64 
     65         try:
---> 66             super().__init__(command_executor=executor, options=options)
     67         except Exception:
     68             self.quit()

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py](https://localhost:8080/#) in __init__(self, command_executor, keep_alive, file_detector, options)
    206         self._authenticator_id = None
    207         self.start_client()
--> 208         self.start_session(capabilities)
    209 
    210     def __repr__(self):

[/usr/local/lib/python3.10/dist-packages/seleniumbase/undetected/__init__.py](https://localhost:8080/#) in start_session(self, capabilities)
    468         if not capabilities:
    469             capabilities = self.options.to_capabilities()
--> 470         super().start_session(capabilities)
    471 
    472     def quit(self):

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py](https://localhost:8080/#) in start_session(self, capabilities)
    290 
    291         caps = _create_caps(capabilities)
--> 292         response = self.execute(Command.NEW_SESSION, caps)["value"]
    293         self.session_id = response.get("sessionId")
    294         self.caps = response.get("capabilities")

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py](https://localhost:8080/#) in execute(self, driver_command, params)
    345         response = self.command_executor.execute(driver_command, params)
    346         if response:
--> 347             self.error_handler.check_response(response)
    348             response["value"] = self._unwrap_value(response.get("value", None))
    349             return response

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py](https://localhost:8080/#) in check_response(self, response)
    227                 alert_text = value["alert"].get("text")
    228             raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 229         raise exception_class(message, screen, stacktrace)

WebDriverException: Message: unknown error: cannot connect to chrome at 127.0.0.1:9222
from chrome not reachable
Stacktrace:
#0 0x5d1b4c1b3dc3 <unknown>
#1 0x5d1b4bea2337 <unknown>
#2 0x5d1b4be8d599 <unknown>
#3 0x5d1b4bedb982 <unknown>
#4 0x5d1b4bed243d <unknown>
#5 0x5d1b4bf1b7f0 <unknown>
#6 0x5d1b4bf0f1f3 <unknown>
#7 0x5d1b4bee028a <unknown>
#8 0x5d1b4bee0c5e <unknown>
#9 0x5d1b4c1780eb <unknown>
#10 0x5d1b4c17c03b <unknown>
#11 0x5d1b4c164201 <unknown>
#12 0x5d1b4c17cba2 <unknown>
#13 0x5d1b4c1490bf <unknown>
#14 0x5d1b4c1a2f18 <unknown>
#15 0x5d1b4c1a30f0 <unknown>
#16 0x5d1b4c1b2f14 <unknown>
#17 0x783ad5f8cac3 <unknown>

Please help!

mdmintz commented 6 months ago

Duplicate of https://github.com/seleniumbase/SeleniumBase/issues/2030#issuecomment-1693345122


That's a popular issue from undetected-chromedriver, with 40 open tickets so far: https://github.com/search?q=repo%3Aultrafunkamsterdam%2Fundetected-chromedriver+%22from+chrome+not+reachable%22&type=issues

To get around that, use the SB Manager format (similar to the Driver Manager format), which has extra internal code for handling that case by using a virtual display in a headless environment. Here's a sample script (https://github.com/seleniumbase/SeleniumBase/blob/master/examples/raw_uc_mode.py):

from seleniumbase import SB

with SB(uc=True, test=True) as sb:
    url = "https://gitlab.com/users/sign_in"
    sb.driver.uc_open_with_reconnect(url, 3)
    if not sb.is_text_visible("Username", '[for="user_login"]'):
        sb.driver.uc_open_with_reconnect(url, 4)
    sb.assert_text("Username", '[for="user_login"]', timeout=3)
    sb.assert_element('label[for="user_login"]')
    sb.highlight('button:contains("Sign in")')
    sb.highlight('h1:contains("GitLab.com")')
    sb.post_message("SeleniumBase wasn't detected", duration=4)

You could also use SeleniumBase's BaseCase format with pytest --uc.

Here's an example script using that format: (https://github.com/seleniumbase/SeleniumBase/blob/master/examples/verify_undetected.py)

from seleniumbase import BaseCase
BaseCase.main(__name__, __file__, "--uc", "-s")

class UndetectedTest(BaseCase):
    def test_browser_is_undetected(self):
        url = "https://gitlab.com/users/sign_in"
        if not self.undetectable:
            self.get_new_driver(undetectable=True)
        self.driver.uc_open_with_reconnect(url, 3)
        if not self.is_text_visible("Username", '[for="user_login"]'):
            self.get_new_driver(undetectable=True)
            self.driver.uc_open_with_reconnect(url, 4)
        self.assert_text("Username", '[for="user_login"]', timeout=3)
        self.post_message("SeleniumBase wasn't detected", duration=4)
        self._print("\n Success! Website did not detect Selenium! ")

Use self.driver to access the raw driver. See https://github.com/seleniumbase/SeleniumBase/tree/master/examples for lots of examples. See https://github.com/seleniumbase/SeleniumBase/blob/master/help_docs/method_summary.md for a list of methods.

megalevel commented 6 months ago

Thank you for your reply, I really appreciate it. I tried to follow your recommendations, but I get the error again. I will be grateful for any help!

My code:

# Installing chromedriver
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

# Installing seleniumbase
!pip install seleniumbase

# Installing xvfb
!apt-get install -y xvfb x11-utils
!pip install pyvirtualdisplay PyOpenGL PyOpenGL-accelerate
!echo $DISPLAY

# Importing Seleniumbase
from seleniumbase import SB
from seleniumbase import Driver

# Importing virtual display
from sbvirtualdisplay import Display

display = Display(visible=0, size=(1440, 1880))
display.start()

with SB(uc=True, headless=True, xvfb=True) as sb:
    url = "https://gitlab.com/users/sign_in"
    sb.driver.uc_open_with_reconnect(url, 3)
    if not sb.is_text_visible("Username", '[for="user_login"]'):
        sb.driver.uc_open_with_reconnect(url, 4)
    sb.assert_text("Username", '[for="user_login"]', timeout=3)
    sb.assert_element('label[for="user_login"]')
    sb.highlight('button:contains("Sign in")')
    sb.highlight('h1:contains("GitLab.com")')
    sb.post_message("SeleniumBase wasn't detected", duration=4)

display.stop()

Error:

---------------------------------------------------------------------------
WebDriverException                        Traceback (most recent call last)
[<ipython-input-6-4b3059faf609>](https://localhost:8080/#) in <cell line: 1>()
----> 1 with SB(uc=True, headless=True, xvfb=True) as sb:
      2     url = "https://gitlab.com/users/sign_in"
      3     sb.driver.uc_open_with_reconnect(url, 3)
      4     if not sb.is_text_visible("Username", '[for="user_login"]'):
      5         sb.driver.uc_open_with_reconnect(url, 4)

13 frames
[/usr/lib/python3.10/contextlib.py](https://localhost:8080/#) in __enter__(self)
    133         del self.args, self.kwds, self.func
    134         try:
--> 135             return next(self.gen)
    136         except StopIteration:
    137             raise RuntimeError("generator didn't yield") from None

[/usr/local/lib/python3.10/dist-packages/seleniumbase/plugins/sb_manager.py](https://localhost:8080/#) in SB(test, rtf, raise_test_failure, browser, headless, headless2, locale_code, protocol, servername, port, proxy, proxy_bypass_list, proxy_pac_url, multi_proxy, agent, cap_file, cap_string, recorder_ext, disable_js, disable_csp, enable_ws, enable_sync, use_auto_ext, undetectable, uc_cdp_events, uc_subprocess, log_cdp_events, incognito, guest_mode, dark_mode, devtools, remote_debug, enable_3d_apis, swiftshader, ad_block_on, host_resolver_rules, block_images, do_not_track, chromium_arg, firefox_arg, firefox_pref, user_data_dir, extension_zip, extension_dir, disable_features, binary_location, driver_version, skip_js_waits, use_wire, external_pdf, is_mobile, mobile, device_metrics, xvfb, start_page, rec_print, rec_behave, record_sleep, data, var1, var2, var3, variables, account, environment, headed, maximize, disable_ws, disable_beforeunload, settings_file, uc, undetected, uc_cdp, uc_sub, log_cdp, wire, pls, sjw, save_screenshot, no_screenshot, page_load_strategy, timeout_multiplier, js_checking_on, slow, demo, demo_sleep, message_duration, highlights, interval, time_limit)
    933             proxy_helper.remove_proxy_zip_if_present()
    934     start_time = time.time()
--> 935     sb.setUp()
    936     test_passed = True  # This can change later
    937     teardown_exception = None

/usr/local/lib/python3.10/dist-packages/seleniumbase/fixtures/base_case.py in setUp(self, masterqa_mode)
  14740         else:
  14741             # Launch WebDriver for both pytest and pynose
> 14742             self.driver = self.get_new_driver(
  14743                 browser=self.browser,
  14744                 headless=self.headless,

[/usr/local/lib/python3.10/dist-packages/seleniumbase/fixtures/base_case.py](https://localhost:8080/#) in get_new_driver(self, browser, headless, locale_code, protocol, servername, port, proxy, proxy_bypass_list, proxy_pac_url, multi_proxy, agent, switch_to, cap_file, cap_string, recorder_ext, disable_js, disable_csp, enable_ws, enable_sync, use_auto_ext, undetectable, uc_cdp_events, uc_subprocess, log_cdp_events, no_sandbox, disable_gpu, headless2, incognito, guest_mode, dark_mode, devtools, remote_debug, enable_3d_apis, swiftshader, ad_block_on, host_resolver_rules, block_images, do_not_track, chromium_arg, firefox_arg, firefox_pref, user_data_dir, extension_zip, extension_dir, disable_features, binary_location, driver_version, page_load_strategy, use_wire, external_pdf, is_mobile, d_width, d_height, d_p_r)
   4021         from seleniumbase.core import browser_launcher
   4022 
-> 4023         new_driver = browser_launcher.get_driver(
   4024             browser_name=browser_name,
   4025             headless=headless,

[/usr/local/lib/python3.10/dist-packages/seleniumbase/core/browser_launcher.py](https://localhost:8080/#) in get_driver(browser_name, headless, locale_code, use_grid, protocol, servername, port, proxy_string, proxy_bypass_list, proxy_pac_url, multi_proxy, user_agent, cap_file, cap_string, recorder_ext, disable_js, disable_csp, enable_ws, enable_sync, use_auto_ext, undetectable, uc_cdp_events, uc_subprocess, log_cdp_events, no_sandbox, disable_gpu, headless2, incognito, guest_mode, dark_mode, devtools, remote_debug, enable_3d_apis, swiftshader, ad_block_on, host_resolver_rules, block_images, do_not_track, chromium_arg, firefox_arg, firefox_pref, user_data_dir, extension_zip, extension_dir, disable_features, binary_location, driver_version, page_load_strategy, use_wire, external_pdf, test_id, mobile_emulator, device_width, device_height, device_pixel_ratio, browser)
   1654         )
   1655     else:
-> 1656         return get_local_driver(
   1657             browser_name,
   1658             headless,

[/usr/local/lib/python3.10/dist-packages/seleniumbase/core/browser_launcher.py](https://localhost:8080/#) in get_local_driver(browser_name, headless, locale_code, servername, proxy_string, proxy_auth, proxy_user, proxy_pass, proxy_bypass_list, proxy_pac_url, multi_proxy, user_agent, recorder_ext, disable_js, disable_csp, enable_ws, enable_sync, use_auto_ext, undetectable, uc_cdp_events, uc_subprocess, log_cdp_events, no_sandbox, disable_gpu, headless2, incognito, guest_mode, dark_mode, devtools, remote_debug, enable_3d_apis, swiftshader, ad_block_on, host_resolver_rules, block_images, do_not_track, chromium_arg, firefox_arg, firefox_pref, user_data_dir, extension_zip, extension_dir, disable_features, binary_location, driver_version, page_load_strategy, use_wire, external_pdf, mobile_emulator, device_width, device_height, device_pixel_ratio)
   3562                                     uc_path = LOCAL_UC_DRIVER
   3563                                     uc_path = os.path.realpath(uc_path)
-> 3564                                 driver = undetected.Chrome(
   3565                                     options=chrome_options,
   3566                                     user_data_dir=user_data_dir,

[/usr/local/lib/python3.10/dist-packages/seleniumbase/undetected/__init__.py](https://localhost:8080/#) in __init__(self, options, user_data_dir, driver_executable_path, browser_executable_path, port, enable_cdp_events, log_level, headless, patch_driver, version_main, patcher_force_close, suppress_welcome, use_subprocess, debug, **kw)
    310             if hasattr(service_, "creation_flags"):
    311                 setattr(service_, "creation_flags", creationflags)
--> 312             super().__init__(options=options, service=service_)
    313             self.reactor = None
    314             if enable_cdp_events:

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/chrome/webdriver.py](https://localhost:8080/#) in __init__(self, options, service, keep_alive)
     43         options = options if options else Options()
     44 
---> 45         super().__init__(
     46             browser_name=DesiredCapabilities.CHROME["browserName"],
     47             vendor_prefix="goog",

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/chromium/webdriver.py](https://localhost:8080/#) in __init__(self, browser_name, vendor_prefix, options, service, keep_alive)
     64 
     65         try:
---> 66             super().__init__(command_executor=executor, options=options)
     67         except Exception:
     68             self.quit()

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py](https://localhost:8080/#) in __init__(self, command_executor, keep_alive, file_detector, options)
    206         self._authenticator_id = None
    207         self.start_client()
--> 208         self.start_session(capabilities)
    209 
    210     def __repr__(self):

[/usr/local/lib/python3.10/dist-packages/seleniumbase/undetected/__init__.py](https://localhost:8080/#) in start_session(self, capabilities)
    468         if not capabilities:
    469             capabilities = self.options.to_capabilities()
--> 470         super().start_session(capabilities)
    471 
    472     def quit(self):

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py](https://localhost:8080/#) in start_session(self, capabilities)
    290 
    291         caps = _create_caps(capabilities)
--> 292         response = self.execute(Command.NEW_SESSION, caps)["value"]
    293         self.session_id = response.get("sessionId")
    294         self.caps = response.get("capabilities")

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py](https://localhost:8080/#) in execute(self, driver_command, params)
    345         response = self.command_executor.execute(driver_command, params)
    346         if response:
--> 347             self.error_handler.check_response(response)
    348             response["value"] = self._unwrap_value(response.get("value", None))
    349             return response

[/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py](https://localhost:8080/#) in check_response(self, response)
    227                 alert_text = value["alert"].get("text")
    228             raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 229         raise exception_class(message, screen, stacktrace)

WebDriverException: Message: unknown error: cannot connect to chrome at 127.0.0.1:9222
from chrome not reachable
Stacktrace:
#0 0x5779a8c14eca <unknown>
#1 0x5779a88fe2a1 <unknown>
#2 0x5779a88e896f <unknown>
#3 0x5779a8938825 <unknown>
#4 0x5779a892f0b4 <unknown>
#5 0x5779a8979b19 <unknown>
#6 0x5779a896d253 <unknown>
#7 0x5779a893d1c7 <unknown>
#8 0x5779a893db3e <unknown>
#9 0x5779a8bdb30b <unknown>
#10 0x5779a8bdf3b7 <unknown>
#11 0x5779a8bc7e3e <unknown>
#12 0x5779a8bdfe82 <unknown>
#13 0x5779a8bac7df <unknown>
#14 0x5779a8c041b8 <unknown>
#15 0x5779a8c0438b <unknown>
#16 0x5779a8c13ffc <unknown>
#17 0x7b2476180ac3 <unknown>

mdmintz commented 6 months ago

Where is Chrome installed on that machine? It might not be in the default location (if it's installed at all). Use binary_location to set the location of Chrome (not to be confused with chromedriver, which is different).
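As a rough sketch of that check (not an official SeleniumBase recipe), the snippet below looks for a Chrome executable on PATH and passes it to SB() via binary_location. The candidate executable names and the helper names find_chrome and open_with_located_chrome are assumptions made for illustration; binary_location, uc, and xvfb are the SB() parameters shown in the traceback above.

```python
import shutil

# Candidate executable names are assumptions; adjust them for your machine.
CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_chrome(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def open_with_located_chrome(url):
    """Hypothetical helper: launch SB() with the located Chrome binary."""
    # Imported lazily so find_chrome() can be used without seleniumbase.
    from seleniumbase import SB

    chrome_path = find_chrome()
    if chrome_path is None:
        raise RuntimeError("Chrome was not found on PATH; install it first.")
    with SB(uc=True, xvfb=True, binary_location=chrome_path) as sb:
        sb.driver.uc_open_with_reconnect(url, 3)
        return sb.get_page_source()
```

If find_chrome() returns None, Chrome is probably not installed (or not on PATH), which would be consistent with the "chrome not reachable" error. Note that this looks for the browser itself, not chromedriver.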

If you're using the SB() format, then you don't need to include sbvirtualdisplay, as that is already used with SB() on Linux machines.

Also try launching with SB(uc=True, headless=False, xvfb=True), because you don't need headless mode if you're using xvfb=True for the virtual display. If that doesn't work, try different combinations of those options, but first make sure Chrome is installed. If Chrome is already installed, set its location with binary_location.
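Put together, a trimmed version of the script might look like the sketch below. It drops the manual sbvirtualdisplay setup and uses headless=False with xvfb=True as suggested. The names SB_OPTIONS, with_binary, and run_check are hypothetical, and whether this works on Colab still depends on Chrome being installed there.

```python
# Options suggested in this thread: UC Mode, a virtual display, no headless mode.
SB_OPTIONS = {"uc": True, "headless": False, "xvfb": True}


def with_binary(options, chrome_path):
    """Return a copy of the options with binary_location set (hypothetical helper)."""
    updated = dict(options)
    updated["binary_location"] = chrome_path
    return updated


def run_check(url="https://gitlab.com/users/sign_in", options=SB_OPTIONS):
    """Open the sign-in page in UC Mode and assert that the form is visible."""
    # Imported lazily so the option helpers work without seleniumbase installed.
    from seleniumbase import SB

    # SB() manages the virtual display itself on Linux, so no Display() is needed.
    with SB(**options) as sb:
        sb.driver.uc_open_with_reconnect(url, 3)
        if not sb.is_text_visible("Username", '[for="user_login"]'):
            sb.driver.uc_open_with_reconnect(url, 4)
        sb.assert_text("Username", '[for="user_login"]', timeout=3)
```

Call run_check() to try the default options, or run_check(options=with_binary(SB_OPTIONS, "/path/to/chrome")) once you know where Chrome lives; the path here is a placeholder.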

go-to-mjrecent-on-TG commented 3 weeks ago

@megalevel Sorry, did you manage to run seleniumbase on colab?