seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License
5.36k stars 979 forks

chrome <defunct> instantly when uc=True #1705

Closed Write closed 1 year ago

Write commented 1 year ago

When the uc parameter is not set, everything works fine. However, as soon as uc is set to True, chrome is instantly shown as <defunct>.

In another window I ran the command below to see whether it ever has any state other than <defunct>, but no, it's instant (or 1 second is not sensitive enough): while [ 1 ] ; do ps aux | grep [c]hrome ; done

python3 test.py
╰─➤  while [ 1 ] ; do ps aux | grep [c]hrome ; done
root     1427004  0.0  0.0      0     0 pts/4    Z+   16:49   0:00 [chrome] <defunct>
root     1427004  0.0  0.0      0     0 pts/4    Z+   16:49   0:00 [chrome] <defunct>
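
For anyone who wants to script this check instead of watching the loop, here is a minimal Python sketch that picks zombie entries out of ps aux lines. The helper name find_defunct is made up for illustration, the first sample line is copied from the output above, and the code assumes the usual 11-column ps aux layout.

```python
def find_defunct(ps_lines, name="chrome"):
    """Return (pid, stat) pairs for zombie processes whose command mentions `name`."""
    zombies = []
    for line in ps_lines:
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # skip blank or truncated lines
        pid, stat, command = fields[1], fields[7], fields[10]
        # A STAT value starting with "Z" marks a zombie (defunct) process.
        if stat.startswith("Z") and name in command:
            zombies.append((int(pid), stat))
    return zombies


sample = [
    "root     1427004  0.0  0.0      0     0 pts/4    Z+   16:49   0:00 [chrome] <defunct>",
    "root     1445816  0.0  0.0 33575996 3132 ?       Sl   16:58   0:00 /opt/google/chrome/chrome_crashpad_handler --monitor-self",
]
print(find_defunct(sample))
```

In a real run the lines could come from subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.splitlines().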

EDIT: While trying to strace python3 test.sh, I did get some output from ps aux, most likely because strace slows the process down:

root     1445803  0.0  0.1 33832496 57044 pts/4  Sl+  16:58   0:00 /opt/google/chrome/chrome --window-size=1440,1880 --no-sandbox --disable-dev-shm-usage --disable-browser-side-navigation --disable-save-password-bubble --disable-single-click-autofill --allow-file-access-from-files --disable-prompt-on-repost --dns-prefetch-disable --disable-translate --disable-renderer-backgrounding --disable-backgrounding-occluded-windows --remote-debugging-host=127.0.0.1 --remote-debugging-port=45609 --user-data-dir=/tmp/tmpu3e7miq8 --lang=en-US --no-default-browser-check --no-first-run --no-service-autorun --password-store=basic --log-level=0
root     1445816  0.0  0.0 33575996 3132 ?       Sl   16:58   0:00 /opt/google/chrome/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --database=/root/.config/google-chrome/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel= --annotation=lsb-release=Debian GNU/Linux 11 (bullseye) --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=109.0.5414.74 --initial-client-fd=5 --shared-client-connection
root     1445818  0.0  0.0 33567784 3020 ?       Sl   16:58   0:00 /opt/google/chrome/chrome_crashpad_handler --no-periodic-tasks --monitor-self-annotation=ptype=crashpad-handler --database=/root/.config/google-chrome/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel= --annotation=lsb-release=Debian GNU/Linux 11 (bullseye) --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=109.0.5414.74 --initial-client-fd=4 --shared-client-connection
root     1445823  0.0  0.1 254816 31668 pts/4    R+   16:58   0:00 /opt/google/chrome/chrome --type=zygote --no-zygote-sandbox --no-sandbox --log-level=0 --crashpad-handler-pid=1445816 --enable-crash-reporter=, --user-data-dir=/tmp/tmpu3e7miq8 --change-stack-guard-on-fork=enable
root     1445824  0.0  0.0 254816  7364 pts/4    R+   16:58   0:00 /opt/google/chrome/chrome --type=zygote --no-sandbox --log-level=0 --crashpad-handler-pid=1445816 --enable-crash-reporter=, --user-data-dir=/tmp/tmpu3e7miq8 --change-stack-guard-on-fork=enable
root     1445803  0.0  0.0      0     0 pts/4    Z+   16:58   0:00 [chrome] <defunct>
root     1445803  4.0  0.0      0     0 pts/4    Z+   16:58   0:00 [chrome] <defunct>
root     1445803  4.0  0.0      0     0 pts/4    Z    16:58   0:00 [chrome] <defunct>

Another time I got this, but weirdly the chrome binary was somewhere else:

root     1451435  0.0  0.0 254816  5520 pts/4    R+   17:00   0:00 /usr/bin/google-chrome --window-size=1440,1880 --no-sandbox --disable-dev-shm-usage --disable-browser-side-navigation --disable-save-password-bubble --disable-single-click-autofill --allow-file-access-from-files --disable-prompt-on-repost --dns-prefetch-disable --disable-translate --disable-renderer-backgrounding --disable-backgrounding-occluded-windows --remote-debugging-host=127.0.0.1 --remote-debugging-port=41741 --user-data-dir=/tmp/tmpgtpnastc --lang=en-US --no-default-browser-check --no-first-run --no-service-autorun --password-store=basic --log-level=0

Yet another try:

root     1454148  0.0  0.0 33567784 3040 ?       Sl   17:01   0:00 /opt/google/chrome/chrome_crashpad_handler --no-periodic-tasks --monitor-self-annotation=ptype=crashpad-handler --database=/root/.config/google-chrome/Crash Reports --url=https://clients2.google.com/cr/report --annotation=channel= --annotation=lsb-release=Debian GNU/Linux 11 (bullseye) --annotation=plat=Linux --annotation=prod=Chrome_Linux --annotation=ver=109.0.5414.74 --initial-client-fd=4 --shared-client-connection
root     1454133  0.0  0.0      0     0 pts/4    Z+   17:01   0:00 [chrome] <defunct>
root     1454133  4.0  0.0      0     0 pts/4    Z+   17:01   0:00 [chrome] <defunct>

Snippet used :

from seleniumbase import Driver
from seleniumbase import page_actions

driver = Driver(headless=True, uc=True)
driver.get("https://nowsecure.nl")
page_actions.wait_for_text(driver, "OH YEAH, you passed!", "h1")
print(driver.find_element("css selector", "body").text)
screenshot_name = "now_secure_image.png"
driver.save_screenshot(screenshot_name)
print("\nScreenshot saved to: %s" % screenshot_name)
driver.quit()

I just upgraded seleniumbase in case this was related to this issue: https://github.com/seleniumbase/SeleniumBase/issues/1702

As for the binaries, I'm not sure which one is used, so:

╰─➤  whereis google-chrome
google-chrome: /usr/bin/google-chrome /usr/share/man/man1/google-chrome.1.gz
╰─➤  google-chrome --version
Google Chrome 109.0.5414.74
╭─root@dockbuntu ~/test
╰─➤  whereis google-chrome-stable
google-chrome-stable: /usr/bin/google-chrome-stable /usr/share/man/man1/google-chrome-stable.1.gz
╰─➤  google-chrome-stable --version
Google Chrome 109.0.5414.74

As seen above, it sometimes uses the /opt/google/chrome/chrome binary; I have no idea where that comes from:

╰─➤  /opt/google/chrome/chrome --version
Google Chrome 109.0.5414.74 unknown

Since the script never times out, here is the output after I press CTRL+C:

╰─➤  python test.py
^CTraceback (most recent call last):
  File "/root/test/test.py", line 4, in <module>
    driver = Driver(headless=True, uc=True)
  File "/usr/local/lib/python3.9/dist-packages/seleniumbase/plugins/driver_manager.py", line 394, in Driver
    driver = browser_launcher.get_driver(
  File "/usr/local/lib/python3.9/dist-packages/seleniumbase/core/browser_launcher.py", line 1149, in get_driver
    return get_local_driver(
  File "/usr/local/lib/python3.9/dist-packages/seleniumbase/core/browser_launcher.py", line 2726, in get_local_driver
    driver = undetected.Chrome(
  File "/usr/local/lib/python3.9/dist-packages/seleniumbase/undetected/__init__.py", line 288, in __init__
    super().__init__(
  File "/usr/local/lib/python3.9/dist-packages/selenium/webdriver/chrome/webdriver.py", line 80, in __init__
    super().__init__(
  File "/usr/local/lib/python3.9/dist-packages/selenium/webdriver/chromium/webdriver.py", line 104, in __init__
    super().__init__(
  File "/usr/local/lib/python3.9/dist-packages/selenium/webdriver/remote/webdriver.py", line 286, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/local/lib/python3.9/dist-packages/seleniumbase/undetected/__init__.py", line 404, in start_session
    super(
  File "/usr/local/lib/python3.9/dist-packages/selenium/webdriver/remote/webdriver.py", line 378, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/local/lib/python3.9/dist-packages/selenium/webdriver/remote/webdriver.py", line 438, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python3.9/dist-packages/selenium/webdriver/remote/remote_connection.py", line 290, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python3.9/dist-packages/selenium/webdriver/remote/remote_connection.py", line 311, in _request
    response = self._conn.request(method, url, body=body, headers=headers)
  File "/usr/local/lib/python3.9/dist-packages/urllib3/request.py", line 78, in request
    return self.request_encode_body(
  File "/usr/local/lib/python3.9/dist-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/usr/local/lib/python3.9/dist-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.9/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib/python3.9/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.9/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)

I tried to capture some strace logs with strace python3 test.py. Partial output until it hangs: https://bin.socialspill.com/zujobafaca.lua It specifically hangs on recvfrom(4,.

EDIT: I managed to upload the full strace file (warning: giant file): https://bin.socialspill.com/loyul.bash

mdmintz commented 1 year ago

OK, I think I found two different issues happening in there.

One of them should be fixed in https://github.com/seleniumbase/SeleniumBase/releases/tag/v4.12.2 (See https://github.com/seleniumbase/SeleniumBase/issues/1706 for the ticket)

For the second one, just make a copy of your chrome binary and call it google-chrome, and put it anywhere on your Linux PATH, such as /usr/local/bin/google-chrome. (You said yours was at /opt/google/chrome/chrome). For that you might be able to do the following (or similar):

cp "/opt/google/chrome/chrome" "/usr/local/bin/google-chrome"

I think that's because undetected-chromedriver is really picky about that, but I can't be 100% sure since it's your environment.

So try both things: upgrade to 4.12.2, and make the google-chrome copy shown above,

...and that should hopefully take care of it. Let me know if that didn't do the trick.

Write commented 1 year ago

Thanks for the fast answer. I upgraded seleniumbase and did the copy, but it seemed to make things worse, so I purged google-chrome and reinstalled it with dpkg -i google-chrome-stable_current_amd64.deb. That installs to /opt/google, but it seems to create the correct symlinks and put the right binaries on my PATH.

Both chrome and google-chrome are inside /opt/google/chrome. I still tried copying both while keeping their respective names, but the issue is exactly the same. I'm kind of lost.

I'm not sure what the difference is between the chrome and google-chrome binaries, both of which end up in /opt/google/chrome/ after installing with dpkg.

Anyway, after I copied the files I had issues even without uc=True. Purging chrome and reinstalling it makes it work again without uc=True. I'm pretty sure my binaries are correctly added when installing the .deb, since they show up in /usr/bin:

╰─➤  whereis google-chrome
google-chrome: /usr/bin/google-chrome

╰─➤  ls -la /usr/bin/google-chrome
lrwxrwxrwx 1 root root 31 Jan 26 09:47 /usr/bin/google-chrome -> /etc/alternatives/google-chrome

And I can confirm it finds the chrome binary just fine, as it runs for a split second (like 0.001) before crashing:

root     2544746  0.0  0.1 33824264 53296 pts/1  S+   09:52   0:00 /opt/google/chrome/chrome --window-size=1440,1880 --no-sandbox --disable-dev-shm-usage --disable-browser-side-navigation --disable-save-password-bubble --disable-single-click-autofill --allow-file-access-from-files --disable-prompt-on-repost --dns-prefetch-disable --disable-translate --disable-renderer-backgrounding --disable-backgrounding-occluded-windows --remote-debugging-host=127.0.0.1 --remote-debugging-port=44927 --user-data-dir=/tmp/tmpsa3dnxhe --lang=en-US --no-default-browser-check --no-first-run --no-service-autorun --password-store=basic --log-level=0
root     2544746  0.0  0.0      0     0 pts/1    Z+   09:52   0:00 [chrome] <defunct>

When uc=False, it uses exactly the same binaries, but it doesn't crash and it saves the screenshot.

mdmintz commented 1 year ago

Let's see if it's a permissions thing. What's the output when you run the following command?

sbase get uc_driver

Here's what I see from my machine:

sbase get uc_driver

*** chromedriver version for download = 109.0.5414.74 (Latest)

Downloading chromedriver_mac64.zip from:
https://chromedriver.storage.googleapis.com/109.0.5414.74/chromedriver_mac64.zip ...
Download Complete!

Extracting ['chromedriver', 'LICENSE.chromedriver'] from chromedriver_mac64.zip ...
Unzip Complete!

The file [uc_driver] was saved to:
/Users/michael/github/SeleniumBase/seleniumbase/drivers/uc_driver

Making [uc_driver 109.0.5414.74] executable ...
[uc_driver 109.0.5414.74] is now ready for use!
The file [uc_driver] was saved to:
/Users/michael/github/SeleniumBase/seleniumbase/drivers/uc_driver

Making [uc_driver 109.0.5414.74] executable ...
[uc_driver 109.0.5414.74] is now ready for use!

That command downloads a chromedriver, but then renames it to uc_driver so that the undetected-chromedriver Patcher can make changes to uc_driver without impacting existing tests that want to use an unchanged chromedriver.

The two main ways undetected-chromedriver avoids detection are: 1. modifying chromedriver to change things that would cause detection, and 2. spinning up Chrome before attaching the driver to it (because some websites try to detect chromedriver by seeing what process was used to spin up Chrome).
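
As a rough illustration of the second technique, the sketch below builds the kind of command line that starts Chrome with a remote-debugging port so that a driver can attach to it afterwards. The flags are ones visible in the ps output earlier in this thread; the helper names (free_port, chrome_launch_args) and the example paths are hypothetical, and this is not the actual undetected-chromedriver code.

```python
import socket
import tempfile


def free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def chrome_launch_args(binary, host="127.0.0.1", port=None, user_data_dir=None):
    """Build the argv for a Chrome that a driver can attach to afterwards."""
    port = port or free_port(host)
    user_data_dir = user_data_dir or tempfile.mkdtemp()
    return [
        binary,
        "--remote-debugging-host=%s" % host,
        "--remote-debugging-port=%d" % port,
        "--user-data-dir=%s" % user_data_dir,
        "--no-first-run",
        "--no-default-browser-check",
    ]


# Hypothetical binary path and profile directory, for illustration only.
args = chrome_launch_args("/usr/bin/google-chrome", port=45609, user_data_dir="/tmp/uc-demo")
print(args)
# A real run would hand `args` to subprocess.Popen and then point the driver
# at "127.0.0.1:45609" through chromedriver's "debuggerAddress" Chrome option.
```

Because Chrome is started by the script rather than by chromedriver, the browser's parent process is not the driver, which is the point of the technique.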

The Chrome used by UC Mode is the one first found by:

import os
os.environ.get("PATH").split(os.pathsep)

that has one of these names: "google-chrome", "google-chrome-stable", "chrome", "google-chrome-beta", "google-chrome-dev", "chromium", "chromium-browser". If that Chrome is not the same as the one that SeleniumBase uses in regular mode, let me know, and that will certainly help in debugging this. The Chrome that SeleniumBase uses by default is set by internal chromedriver code (and I'm not sure of the order of that because the Chromium Team packages those drivers). I can change the order set by UC Mode here: https://github.com/seleniumbase/SeleniumBase/blob/bcdd9dacc21c349947512ddfe3e740df30855ff2/seleniumbase/undetected/__init__.py#L490 (so if I know the exact order that chromedriver uses, I can match it). If you let me know what the order is on your machine (by seeing which Chrome is launched in regular mode vs UC Mode) I can fix that if that is the issue.
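
A minimal sketch of that kind of PATH lookup, for anyone who wants to reproduce it by hand. It assumes PATH directories are tried in order and the names are checked in the order listed above within each directory; the real implementation may order things differently, and first_chrome_on_path is a hypothetical name.

```python
import os

# Names listed in the comment above, in the same order.
CHROME_NAMES = (
    "google-chrome", "google-chrome-stable", "chrome", "google-chrome-beta",
    "google-chrome-dev", "chromium", "chromium-browser",
)


def first_chrome_on_path(path=None, names=CHROME_NAMES):
    """Return the first executable in PATH order matching one of `names`, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # ignore empty PATH entries
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return os.path.normpath(candidate)
    return None


print(first_chrome_on_path())
```

Comparing this result with the binary seen in ps aux during a regular (non-UC) run shows whether the two modes pick the same Chrome.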

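So in summary, there are two tasks for you to help me debug this: (1) run sbase get uc_driver and share the output, and (2) tell me which Chrome is launched in regular mode and which one is launched in UC Mode.

(Note, you already provided half of the second answer: /opt/google/chrome/chrome. I'm not sure which chrome is being used for UC Mode.)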

mdmintz commented 1 year ago

I'm fairly certain it's due to using a different Chrome binary in UC Mode vs regular mode. If I can match search locations based on your answer to the comment I added directly above, I can make it work smoothly, rather than requiring users to enter the location of the Chrome binary in their scripts. (I want to simplify things, not make them more complicated.)

mdmintz commented 1 year ago

I figured out an even easier way for you to detect the Chrome version used during UC Mode. Run this Python script and let me know the output:

from seleniumbase import undetected
print(undetected.find_chrome_executable())

If that output is not the same as the regular Chrome path, I know how to fix it.

mdmintz commented 1 year ago

From that, I learned that different versions of Ubuntu have different default versions of Chrome installed. https://github.com/mdmintz/undetected-testing/actions/runs/4017521390/jobs/6901981138 Undetected Mode is working successfully with all of them:

[Image: Screenshot 2023-01-26 at 12 29 37 PM]

In your environment, there are probably multiple Chromes installed (and the one used for UC Mode likely has different permissions or a different version from the other one), which leads to your issue.

Write commented 1 year ago

╰─➤  sbase get uc_driver

*** chromedriver version for download = 109.0.5414.74 (Latest)

Downloading chromedriver_linux64.zip from:
https://chromedriver.storage.googleapis.com/109.0.5414.74/chromedriver_linux64.zip ...
Download Complete!

Extracting ['chromedriver', 'LICENSE.chromedriver'] from chromedriver_linux64.zip ...
Unzip Complete!

The file [chromedriver] was saved to:
/usr/local/lib/python3.9/dist-packages/seleniumbase/drivers/chromedriver

Making [chromedriver 109.0.5414.74] executable ...
[chromedriver 109.0.5414.74] is now ready for use!
The file [LICENSE.chromedriver] was saved to:
/usr/local/lib/python3.9/dist-packages/seleniumbase/drivers/LICENSE.chromedriver

Making [LICENSE.chromedriver 109.0.5414.74] executable ...
[LICENSE.chromedriver 109.0.5414.74] is now ready for use!

[uc_driver] will be created from [chromedriver] at runtime!

> I figured out an even easier way for you to detect the Chrome version used during UC Mode. Run this Python script and let me know the output:
>
> from seleniumbase import undetected
> print(undetected.find_chrome_executable())
>
> If that output is not the same as the regular Chrome path, I know how to fix it.

╰─➤  python version.py
/bin/google-chrome-stable

╰─➤  ls -la /bin/google-chrome-stable
lrwxrwxrwx 1 root root 32 Jan 23 21:41 /bin/google-chrome-stable -> /opt/google/chrome/google-chrome

> From that, I learned that different versions of Ubuntu have different default versions of Chrome installed. On your environment, there's probably multiple Chromes installed, (and the one used for UC Mode likely has different permissions / a different version from the other Chrome installed), which leads to your issue.

Yeah, possibly, but I did apt remove --purge chrome and couldn't find any chrome binaries after that, so when installing with dpkg I thought I didn't have any other version of chrome.

I use Debian 11, if that helps.
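
Since several of the paths in this thread are symlinks (for example /usr/bin/google-chrome -> /etc/alternatives/google-chrome), one way to check whether two of them end up at the same file is to resolve them first. This is only a sketch: same_binary is a hypothetical helper, and it cannot see through wrapper scripts that exec another binary.

```python
import os


def same_binary(path_a, path_b):
    """True when both paths resolve, through any symlinks, to the same file."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


# Paths taken from the outputs posted above; they may not exist elsewhere.
for path in ("/usr/bin/google-chrome", "/bin/google-chrome-stable"):
    print(path, "->", os.path.realpath(path))
print(same_binary("/usr/bin/google-chrome", "/bin/google-chrome-stable"))
```

If the two resolved paths differ, that would suggest more than one Chrome entry point is present on the system.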

mdmintz commented 1 year ago

I'm working on a way to let anyone specify the Chromium binary location (https://github.com/seleniumbase/SeleniumBase/issues/1709). Once that's complete, you'll be able to make sure that UC Mode on your server uses the same working Chrome binary as the one that runs in regular mode.

mdmintz commented 1 year ago

The option to set the binary location was released in 4.12.3 - https://github.com/seleniumbase/SeleniumBase/releases/tag/v4.12.3 (Details in https://github.com/seleniumbase/SeleniumBase/issues/1709)

Write commented 1 year ago

> The option to set the binary location was released in 4.12.3 - https://github.com/seleniumbase/SeleniumBase/releases/tag/v4.12.3 (Details in #1709)

Thank you very much! It works perfectly now!

from seleniumbase import Driver
from seleniumbase import page_actions

driver = Driver(headless2=True, uc=True, browser="chrome", binary_location="/usr/bin/google-chrome")
driver.get("https://nowsecure.nl")
page_actions.wait_for_text(driver, "OH YEAH, you passed!", "h1")
print(driver.find_element("css selector", "body").text)
screenshot_name = "now_secure_image.png"
driver.save_screenshot(screenshot_name)
print("\nScreenshot saved to: %s" % screenshot_name)
driver.quit()

But it seems the real difference is that I had been using headless=True instead of headless2=True: with headless2=True chrome no longer goes defunct, while headless=True still makes it instantly defunct. I'm sorry for not seeing that sooner.

Another question, outside the scope of this thread: do you know if it can be used in tandem with https://github.com/dgtlmoon/changedetection.io ? Changedetection just needs to be able to control selenium through a URL such as WEBDRIVER_URL=http://browser-chrome:4444/wd/hub. I don't know enough to tell whether that's feasible with seleniumbase.

Anyway, thanks a lot for your patience.

mdmintz commented 1 year ago

headless2 in SeleniumBase activates Chrome's new headless mode, which is more powerful, and fixes bugs of the standard headless mode. More info on that here:


The Chromium developers recently added a 2nd headless mode (in 2021). See https://bugs.chromium.org/p/chromium/issues/detail?id=706008#c36

They later renamed the option in 2023 for Chrome 109 -> https://github.com/chromium/chromium/commit/e9c516118e2e1923757ecb13e6d9fff36775d1f4

For Chrome 109 and above, the --headless=new flag will now allow you to get the full functionality of Chrome in the new headless mode, and you can even run extensions in it. (For Chrome versions 96 through 108, use --headless=chrome)

Standard Selenium usage: (Chrome 109 and above):

options.add_argument("--headless=new")

Standard Selenium usage: (Chrome 96 through Chrome 108):

options.add_argument("--headless=chrome")

If something works in regular Chrome, it should now work with the newer headless mode too.


Since SeleniumBase detects the browser version, it knows which of the newer headless modes to use when using headless2.
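
To make the version ranges above concrete, here is a small Python sketch that maps a Chrome version string to the matching flag. It is not SeleniumBase's actual code: both function names are hypothetical, and falling back to plain --headless below Chrome 96 is an assumption.

```python
import re


def chrome_major_version(version_output):
    """Extract the major version from text like 'Google Chrome 109.0.5414.74'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError("no Chrome version found in %r" % version_output)
    return int(match.group(1))


def new_headless_flag(major):
    """Pick the headless flag for a Chrome major version, per the ranges above."""
    if major >= 109:
        return "--headless=new"
    if major >= 96:
        return "--headless=chrome"
    return "--headless"  # assumption: older Chrome only has the old mode


print(new_headless_flag(chrome_major_version("Google Chrome 109.0.5414.74")))
```

The returned string can be passed to options.add_argument(...) as in the standard Selenium usage shown above.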

As for change-detection, I'm not familiar with the library you sent, but SeleniumBase does include its own visual regression testing tool: SeleniumBase/master/examples/visual_testing/ReadMe.md You can customize that tool to only alert you for specific changes, etc.