urlstechie / urlchecker-python

:snake: :link: Python module and client for checking URLs
https://urlchecker-python.readthedocs.io
MIT License

Web driver is not installed, but seems to be required #92

Open mabraham opened 5 hours ago

mabraham commented 5 hours ago

In a CI container where I have not installed a browser or web driver, I am trying to run urlchecker, but get an error message like

$ urlchecker check --files docs/html/index.html --save urlcheck.csv --exclude-patterns html-full,html-user,html-lib,.tar.gz,_sources --file-types "*.html" --serial .
WARNING:urlchecker.core.urlproc:Issue with driver, results will be improved if you have it! Please match your version from https://googlechromelabs.github.io/chrome-for-testing
           original path: .
              final path: /builds/gromacs/gromacs/build-docs
               subfolder: None
                  branch: main
                 cleanup: False
                  serial: True
              file types: ['*.html']
                   files: ['docs/html/index.html']
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: ['html-full', 'html-user', 'html-lib', '.tar.gz', '_sources']
  file patterns excluded: []
          no check certs: False
              force pass: False
             retry count: 2
                    save: urlcheck.csv
                 timeout: 5
Traceback (most recent call last):
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/urlproc.py", line 282, in check_urls
    if needs_driver_check and driver and driver.check(url):
                                         ^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/webdriver.py", line 83, in check
    self.get_browser()
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/webdriver.py", line 102, in get_browser
    self.browser = webdriver.chrome.webdriver.WebDriver(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
    super().__init__(
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/chromium/webdriver.py", line 66, in __init__
    super().__init__(command_executor=executor, options=options)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 212, in __init__
    self.start_session(capabilities)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 299, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 354, in execute
    self.error_handler.check_response(response)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
  (session not created: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /root/.cache/selenium/chrome/linux64/129.0.6668.89/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x55d814f9602a <unknown>
#1 0x55d814c7c5e0 <unknown>
#2 0x55d814cb4921 <unknown>
#3 0x55d814cb02c5 <unknown>
#4 0x55d814cfcdf6 <unknown>
#5 0x55d814cfc446 <unknown>
#6 0x55d814cf08c3 <unknown>
#7 0x55d814cbe6b3 <unknown>
#8 0x55d814cbf68e <unknown>
#9 0x55d814f60a2b <unknown>
#10 0x55d814f649b1 <unknown>
#11 0x55d814f4d225 <unknown>
#12 0x55d814f65532 <unknown>
#13 0x55d814f3238f <unknown>
#14 0x55d814f84f28 <unknown>
#15 0x55d814f850f3 <unknown>
#16 0x55d814f94e7c <unknown>
#17 0x7f8e82b09a94 <unknown>
#18 0x7f8e82b96c3c <unknown>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/root/.local/bin/urlchecker", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/client/__init__.py", line 208, in main
    main(args=args, extra=extra)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/client/check.py", line 90, in main
    check_results = checker.run(
                    ^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/check.py", line 228, in run
    results[file_name] = check_task(**kwargs)
                         ^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/check.py", line 263, in check_task
    checker.check_urls(
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/urlproc.py", line 287, in check_urls
    if driver and driver.check(url):
                  ^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/webdriver.py", line 83, in check
    self.get_browser()
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/urlchecker/core/webdriver.py", line 102, in get_browser
    self.browser = webdriver.chrome.webdriver.WebDriver(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
    super().__init__(
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/chromium/webdriver.py", line 66, in __init__
    super().__init__(command_executor=executor, options=options)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 212, in __init__
    self.start_session(capabilities)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 299, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py", line 354, in execute
    self.error_handler.check_response(response)
  File "/root/.local/share/pipx/venvs/urlchecker/lib/python3.12/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
  (session not created: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /root/.cache/selenium/chrome/linux64/129.0.6668.89/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x563e6845602a <unknown>
#1 0x563e6813c5e0 <unknown>
#2 0x563e68174921 <unknown>
#3 0x563e681702c5 <unknown>
#4 0x563e681bcdf6 <unknown>
#5 0x563e681bc446 <unknown>
#6 0x563e681b08c3 <unknown>
#7 0x563e6817e6b3 <unknown>
#8 0x563e6817f68e <unknown>
#9 0x563e68420a2b <unknown>
#10 0x563e684249b1 <unknown>
#11 0x563e6840d225 <unknown>
#12 0x563e68425532 <unknown>
#13 0x563e683f238f <unknown>
#14 0x563e68444f28 <unknown>
#15 0x563e684450f3 <unknown>
#16 0x563e68454e7c <unknown>
#17 0x7facaa0e7a94 <unknown>
#18 0x7facaa174c3c <unknown>

That makes it look like a driver is actually required.

If a driver is intended to be optional, then I think https://github.com/urlstechie/urlchecker-python/blob/master/urlchecker/core/urlproc.py#L161 should set driver = None so that https://github.com/urlstechie/urlchecker-python/blob/master/urlchecker/core/urlproc.py#L282 will not choke on an invalid driver.

SuperKogito commented 2 hours ago

Yes, you are totally right: the driver should be set to None to avoid the error. This is being done in https://github.com/urlstechie/urlchecker-python/blob/d0e7560e3bacf9d7e85bfec7171f3a7d17d9bcaa/urlchecker/core/urlproc.py#L152

When the exception is raised, the returned value should be None, unless line 156 changes the driver value, which would mean you do have a driver but it doesn't pass the sanity check. @vsoch might have a better explanation for this.
Also, can you provide a bit more information on your setup, please? A simple fix is to replace the return statement with two different ones; one under try and one under except.
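
The two-return idea could look roughly like the sketch below. This is not the code in urlproc.py: `FakeDriver`, `sanity_check`, and `make_driver` are hypothetical names used only for illustration. The point is that the success path and the failure path each return explicitly, so a driver that failed its check can never leak out.

```python
import logging

logger = logging.getLogger(__name__)


class FakeDriver:
    """Hypothetical stand-in for a web driver wrapper (not urlchecker's class)."""

    def __init__(self, healthy=True):
        self.healthy = healthy

    def sanity_check(self):
        # Simulates a browser that cannot be started.
        if not self.healthy:
            raise RuntimeError("browser failed to start")


def make_driver(factory):
    """Return a usable driver, or None if any setup step fails."""
    try:
        driver = factory()
        driver.sanity_check()
        return driver  # success path: reached only if every step passed
    except Exception as exc:
        logger.warning("Issue with driver, continuing without it: %s", exc)
        return None  # failure path: never hand back a half-built driver
```

With this shape, `make_driver(lambda: FakeDriver(healthy=False))` yields None, and callers can skip driver-based checks entirely.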

vsoch commented 2 hours ago

When it crashes like that, it's a mismatch between the Chrome you have and the driver.

mabraham commented 53 minutes ago

Thanks for the prompt replies!

This was running in a Docker container based on Ubuntu 24.04, customized for building and linting some static HTML pages. There is no browser or similar, except as might have been brought in by pipx install urlchecker. So I don't know what kind of driver urlchecker might have found :-(

> A simple fix is to replace the return statement with two different ones; one under try and one under except.

Yes, or to replace the driver by None if an exception was caught.
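
A minimal sketch of that variant, assuming a driver object that is created first and only fails when the browser is started. `StubDriver`, `start`, `get_driver`, and `url_passes` are hypothetical names, not urlchecker's API; only the `if driver and driver.check(url)` guard mirrors the line shown in the traceback.

```python
class StubDriver:
    """Hypothetical driver; `start` stands in for launching the browser."""

    def __init__(self, can_start):
        self.can_start = can_start

    def start(self):
        if not self.can_start:
            raise RuntimeError("Chrome failed to start")

    def check(self, url):
        return True


def get_driver(can_start):
    """Single return statement; the except branch resets the driver to None."""
    driver = None
    try:
        driver = StubDriver(can_start)  # the object now exists...
        driver.start()                  # ...but starting it may still fail
    except Exception:
        driver = None                   # discard the unusable object
    return driver


def url_passes(url, driver, plain_check):
    """Use the driver when available, otherwise fall back to a plain check."""
    if driver and driver.check(url):  # short-circuits when driver is None
        return True
    return plain_check(url)
```

Because the guard short-circuits on None, a container without a working browser would fall through to the plain check instead of raising.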