mozilla / geckodriver

WebDriver for Firefox
https://firefox-source-docs.mozilla.org/testing/geckodriver/
Mozilla Public License 2.0

Geckodriver crashes with SIGSYS on Raspberry Pi 4 Model B (aarch64) inside Docker #2135

Closed bogdan-copocean closed 4 months ago

bogdan-copocean commented 9 months ago

System

Testcase

I am running a Python Selenium script within a Docker container, based on a custom image that installs Firefox and GeckoDriver. The error that I'm getting while trying to instantiate the Firefox webdriver is the following:

Message: Unable to obtain driver for firefox using Selenium Manager.; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors/driver_location

The Dockerfile I'm using is as follows:

FROM python:3.10-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    SELENIUM_IN_DOCKER=true

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       wget \
       unzip \
       firefox-esr \
    # Cleanup to reduce image size
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN wget https://github.com/mozilla/geckodriver/releases/download/v0.33.0/geckodriver-v0.33.0-linux-aarch64.tar.gz \
    && tar -xzf geckodriver-v0.33.0-linux-aarch64.tar.gz \
    && mv geckodriver /usr/bin/geckodriver \
    && chmod +x /usr/bin/geckodriver \
    && rm geckodriver-v0.33.0-linux-aarch64.tar.gz

COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

WORKDIR /app

COPY . /app

RUN addgroup --gid 1000 worker \
    && adduser --disabled-password --gid 1000 --uid 1000 worker \
    && mkdir -p /app/output && chown -R worker:worker /app/output/ \
    && chsh -s /usr/sbin/nologin root

USER worker

CMD ["python", "-m", "src.api.worker"]

Stacktrace

While debugging, I found out that the application gets terminated with a SIGSYS signal, indicating a bad system call:

root@6f4bbab79386:/app# geckodriver --version
Bad system call (core dumped)

Trace-level log

Upon running strace, I encountered the following issue:

root@6f4bbab79386:/app# strace -n geckodriver --version
[  11] execve("/usr/bin/geckodriver", ["geckodriver", "--version"], 0xff8162c8 /* 17 vars */) = 0
[ 222] syscall_0xde(0, 0x2c0, 0x3, 0x22, 0xffffffff, 0) = 0
[ 222] +++ killed by SIGSYS (core dumped) +++
Bad system call

The unnamed syscall 0xde is 222 in decimal, which is mmap on aarch64, so the issue might be related to that call.
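As a rough illustration of how the number in the strace output maps to a name, here is a minimal Python sketch. The `AARCH64_SYSCALLS` table is a small hand-copied excerpt of the generic syscall table used on aarch64, and `decode_unknown_syscall` is a hypothetical helper, not part of strace or geckodriver.

```python
import re

# Small excerpt of the generic syscall table used by aarch64 (not exhaustive).
AARCH64_SYSCALLS = {
    214: "brk",
    215: "munmap",
    220: "clone",
    221: "execve",
    222: "mmap",
    226: "mprotect",
}


def decode_unknown_syscall(strace_line):
    """Return (number, name) for a strace 'syscall_0x..' entry, or None."""
    match = re.search(r"syscall_0x([0-9a-fA-F]+)\(", strace_line)
    if match is None:
        return None
    number = int(match.group(1), 16)
    return number, AARCH64_SYSCALLS.get(number, "unknown")


line = "[ 222] syscall_0xde(0, 0x2c0, 0x3, 0x22, 0xffffffff, 0) = 0"
print(decode_unknown_syscall(line))  # (222, 'mmap')
```

The arguments in the trace are also consistent with an mmap call: 0x3 is PROT_READ|PROT_WRITE, 0x22 is MAP_PRIVATE|MAP_ANONYMOUS, and 0xffffffff is a file descriptor of -1.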

Hacky solution

A temporary workaround is to run the container with the --security-opt seccomp=unconfined flag. This lets geckodriver start, but it disables Docker's seccomp filtering entirely, which is likely insecure, and it only fixes the geckodriver crash, not my Python script.

Please let me know how to properly fix this or if you need any additional information.

Thank you!

bogdan-copocean commented 9 months ago

I just found out that the selenium-manager binary bundled with the pip package is built for x86-64, even though I installed Selenium via pip inside my aarch64 container:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/common/selenium_manager.py", line 126, in run
    completed_proc = subprocess.run(args, capture_output=True)
  File "/usr/local/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/usr/local/lib/python3.10/site-packages/selenium/webdriver/common/linux/selenium-manager'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/common/driver_finder.py", line 38, in get_path
    path = SeleniumManager().driver_location(options) if path is None else path
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/common/selenium_manager.py", line 95, in driver_location
    output = self.run(args)
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/common/selenium_manager.py", line 132, in run
    raise WebDriverException(f"Unsuccessful command executed: {command}") from err
selenium.common.exceptions.WebDriverException: Message: Unsuccessful command executed: /usr/local/lib/python3.10/site-packages/selenium/webdriver/common/linux/selenium-manager --browser firefox --output json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/firefox/webdriver.py", line 59, in __init__
    self.service.path = DriverFinder.get_path(self.service, options)
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/common/driver_finder.py", line 41, in get_path
    raise NoSuchDriverException(msg) from err
selenium.common.exceptions.NoSuchDriverException: Message: Unable to obtain driver for firefox using Selenium Manager.; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors/driver_location
root@62990962642e:/app# file /usr/local/lib/python3.10/site-packages/selenium/webdriver/common/linux/selenium-manager
/usr/local/lib/python3.10/site-packages/selenium/webdriver/common/linux/selenium-manager: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
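The same check that `file` performs above can be sketched in Python by reading the `e_machine` field of the ELF header. This is only an illustration: `elf_machine` and `elf_machine_of_file` are hypothetical helper names, and the table covers just a few architectures.

```python
import struct

# A few e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86-64", 183: "aarch64"}


def elf_machine(header):
    """Return the architecture name encoded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def elf_machine_of_file(path):
    """Read just enough of the file at `path` to identify its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Calling `elf_machine_of_file` on the selenium-manager path from the traceback should report `x86-64` if the `file` output above is accurate, which would explain the Exec format error on an aarch64 host.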

whimboo commented 9 months ago

So does that mean that with the correct Selenium-Manager it works and only fails because the wrong one was installed above?

bogdan-copocean commented 9 months ago

Hey Henrik, I just found out that selenium-manager unfortunately doesn't support aarch64 on Linux yet, so I hit a wall trying to run my script the straightforward way (I will try workarounds).

Issue Link Docs link

I raised this issue here only because I couldn't run geckodriver in the container directly, and I guess the selenium-manager is a separate problem. What do you think?

This is a personal project, and I'm just experimenting here (I haven't invested much time in the stack), so sorry if I didn't make sense or omitted some information :)

Later edit: I made it work with the suggestion from the Selenium documentation (using the Service class and pointing it to geckodriver), but I still need to run the container with the --security-opt seccomp=unconfined flag for geckodriver to work, as explained in the description above.
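For readers hitting the same Selenium Manager error, the Service-based workaround mentioned here might look roughly like the sketch below. It assumes Selenium 4 and the /usr/bin/geckodriver path from the Dockerfile; `find_geckodriver` and `make_firefox_driver` are hypothetical helper names, and the headless flag is an assumption about the container having no display.

```python
import os


def find_geckodriver(candidates=("/usr/bin/geckodriver", "/usr/local/bin/geckodriver")):
    """Return the first candidate path that exists and is executable, else None."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def make_firefox_driver(geckodriver_path):
    """Start Firefox with an explicit geckodriver path, bypassing Selenium Manager."""
    # Imported lazily so the path helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    options = Options()
    options.add_argument("-headless")  # assumption: no display in the container
    service = Service(executable_path=geckodriver_path)
    return webdriver.Firefox(service=service, options=options)


# Example usage (requires Selenium, Firefox and a working geckodriver):
#   path = find_geckodriver()
#   driver = make_firefox_driver(path)
#   driver.get("https://example.org")
#   driver.quit()
```

Passing the path explicitly only avoids the selenium-manager lookup; it does not change how geckodriver itself behaves under the container's seccomp profile.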

whimboo commented 9 months ago

Yes, if you download the proper version of geckodriver for your system and it works by explicitly telling Selenium the path to it, then we can close this issue. You should then file an issue for SeleniumManager to get aarch64 support added on Linux if none exists yet.

whimboo commented 9 months ago

Oh and maybe you can run geckodriver with the RUST_BACKTRACE=1 environment variable set and let it crash? I would be interested where in the Rust stack the crash actually happens. It might be an issue with some Rust crate here as well.

bogdan-copocean commented 9 months ago

root@cf8398669a3d:/app#
root@cf8398669a3d:/app# env
PYTHONUNBUFFERED=1
SELENIUM_IN_DOCKER=true
PYTHONDONTWRITEBYTECODE=1
PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
RUST_BACKTRACE=1
root@cf8398669a3d:/app# geckodriver --version
Bad system call (core dumped)
root@cf8398669a3d:/app# RUST_BACKTRACE=1 geckodriver --version
Bad system call (core dumped)

Not much luck, unfortunately. I guess it's because the process is killed at the kernel level, so Rust never gets the chance to capture the stack.
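One way to confirm from the calling side that a process died from a signal rather than exiting on its own is to inspect the return code of the child process. The following is a minimal sketch using only the Python standard library; `describe_exit` and `run_and_describe` are hypothetical helper names.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Explain a subprocess return code; negative values mean death by signal."""
    if returncode >= 0:
        return f"exited with status {returncode}"
    sig = signal.Signals(-returncode)
    return f"killed by {sig.name}"


def run_and_describe(argv):
    """Run a command and report how it terminated."""
    result = subprocess.run(argv, capture_output=True)
    return describe_exit(result.returncode)


# Example usage inside the affected container:
#   print(run_and_describe(["geckodriver", "--version"]))
```

If the guess above is right, the geckodriver call in the affected container would be reported as killed by SIGSYS, which fits the missing backtrace: a fatal signal delivered by the kernel leaves the process no opportunity to print one.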

bogdan-copocean commented 9 months ago

Regarding the lack of aarch64 support in selenium-manager, I've managed to implement a workaround (from their docs) that temporarily addresses the issue until official support is available.

As for running geckodriver within my Docker container, the situation is a bit trickier. To get it working, I still had to rely on the seccomp=unconfined security option. I'm fully aware this approach has security implications; it works for me because I'm just experimenting, but it might be a risk for others.

I'm sharing these findings in hopes that it might shed light on areas of improvement or at least serve as data points for others encountering similar issues :)

whimboo commented 8 months ago

@jgraham could you please have a look at this issue? Is there something that we should / can do?

whimboo commented 5 months ago

It might be good to know where in geckodriver this problem is actually caused. Therefore a Rust stacktrace would still be good to have. I wonder if a debug build of geckodriver might give the details for the stack, or maybe you could attach a debugger? Probably one of the crates that we depend on makes the failing system call.

jgraham commented 5 months ago

FWIW I suspect running docker with --cap-add IPC_LOCK would be enough if it's indeed mmap that's causing a problem.
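To compare the default configuration, this suggestion, and the seccomp=unconfined workaround from the description side by side, the three `docker run` invocations could be generated with a small helper. This is only a sketch: `docker_run_argv` is a hypothetical function and `selenium-worker` is a placeholder image name; the two Docker flags are the ones mentioned in this thread.

```python
# Extra `docker run` flags for each configuration discussed in this issue.
MODES = {
    "default": [],
    "cap-ipc-lock": ["--cap-add", "IPC_LOCK"],
    "seccomp-unconfined": ["--security-opt", "seccomp=unconfined"],
}


def docker_run_argv(image, mode):
    """Build the argv for running `geckodriver --version` in a container."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return ["docker", "run", "--rm", *MODES[mode], image, "geckodriver", "--version"]


# "selenium-worker" is a placeholder image name, not taken from the issue.
for mode in MODES:
    print(" ".join(docker_run_argv("selenium-worker", mode)))
```

Running each printed command on the Raspberry Pi would show whether the narrower capability is enough or whether the seccomp profile itself has to be relaxed.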

whimboo commented 5 months ago

@bogdan-copocean would you mind testing the proposal from @jgraham? Does that work for you?

whimboo commented 4 months ago

Closing as incomplete for now. If there is something on our end to do I'm happy to reopen.