siemens / kas

Setup tool for bitbake based projects
MIT License

kas-container: RuntimeError: can't start new thread #100

Closed. attina closed this issue 10 months ago

attina commented 11 months ago

When I try to run kas-container to build a core-image-minimal image, the following error shows up. kas 3.0.2 works fine with the same configuration file.

2023-08-20 16:13:52 - INFO     - kas 4.0 started
2023-08-20 16:13:52 - INFO     - /repo$ git rev-parse --show-toplevel
2023-08-20 16:13:52 - ERROR    - can't start new thread
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/kas/kas.py", line 185, in main
    kas(sys.argv[1:])
  File "/usr/local/lib/python3.11/dist-packages/kas/kas.py", line 174, in kas
    plugin.run(args)
  File "/usr/local/lib/python3.11/dist-packages/kas/plugins/build.py", line 87, in run
    ctx.config = Config(ctx, args.config, args.target, args.task)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/kas/config.py", line 49, in __init__
    top_repo_path = Repo.get_root_path(
                    ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/kas/repos.py", line 224, in get_root_path
    (ret, output) = run_cmd(['git', 'rev-parse', '--show-toplevel'],
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/kas/libkas.py", line 170, in run_cmd
    (ret, output) = loop.run_until_complete(
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/kas/libkas.py", line 130, in run_cmd_async
    process = await asyncio.create_subprocess_exec(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/subprocess.py", line 218, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 1694, in subprocess_exec
    transport = await self._make_subprocess_transport(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/unix_events.py", line 212, in _make_subprocess_transport
    watcher.add_child_handler(transp.get_pid(),
  File "/usr/lib/python3.11/asyncio/unix_events.py", line 1388, in add_child_handler
    thread.start()
  File "/usr/lib/python3.11/threading.py", line 957, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

jan-kiszka commented 11 months ago

Never seen this before. Can you describe in more detail what your setup looks like? I assume you are not using kas-container but rather invoking kas directly, correct? What is your host OS, and which version? Does the issue only trigger with a specific configuration file, or do all files you tried trigger it?

attina commented 10 months ago

Actually, I was running kas-container. I guess the problem is inside the 4.0 container. Here are the steps I used to run kas-container:

  1. Download kas-container from the latest master branch;
  2. Make it executable (chmod);
  3. Run kas-container with the following command line:
    # ./kas-container.4.0 -l debug build meta-pico/pc805_poky.yml
    + docker run -v /home/attina/workspace/pico/pc805/meta-pico:/repo:ro -v /home/attina/workspace/pico/pc805:/work:rw -e KAS_WORK_DIR=/work -v /home/attina/workspace/pico/pc805/build:/build:rw --workdir=/repo -e KAS_BUILD_DIR=/build -e USER_ID=1000 -e GROUP_ID=1000 --rm --init -t -i -e TERM=xterm-256color -e SHELL=/bin/bash --log-driver=none --user=root ghcr.io/siemens/kas/kas:4.0 -l debug build /repo/pc805_poky.yml
    2023-08-25 11:16:28 - INFO     - kas 4.0 started
    2023-08-25 11:16:28 - DEBUG    - Using selector: EpollSelector
    2023-08-25 11:16:28 - INFO     - /repo$ git rev-parse --show-toplevel
    2023-08-25 11:16:28 - ERROR    - can't start new thread

jan-kiszka commented 10 months ago

There must be more to it, especially as we run this very command as part of CI (https://github.com/siemens/kas/blob/master/.github/workflows/next.yml#L89).

Is your pc805_poky.yml somehow special? Can you share it? What does the docker host look like?

attina commented 10 months ago

Here is the content of the pc805_poky.yml file.

header:
  version: 8

distro: poky

machine: pc805

target:
  - core-image-minimal

repos:
  meta-pico:
    path: /work/meta-pico

  meta-riscv:
    url: https://github.com/riscv/meta-riscv
    refspec: 3d775dede1f1895ad33ade396438baa8054cd410 # kirkstone

  poky:
    url: https://git.yoctoproject.org/git/poky
    refspec: d6b8790370500b99ca11f0d8a05c39b661ab2ba6 # kirkstone
    layers:
      meta:
      meta-poky:
      meta-yocto-bsp:

  meta-openembedded:
    url: https://git.openembedded.org/meta-openembedded
    refspec: 594c9cf6d3205c8e40ff772383fd9ab7dd3ed2cc # kirkstone
    layers:
      meta-oe:
      meta-networking:
      meta-python:

bblayers_conf_header:
  standard: |
    POKY_BBLAYERS_CONF_VERSION = "2"
    BBPATH = "${TOPDIR}"
    BBFILES ?= ""
local_conf_header:
  standard: |
    CONF_VERSION = "1"
    PACKAGE_CLASSES = "package_rpm"
    SDKMACHINE = "x86_64"
    # Use 'haveged' instead 'rng-tools' due to 'SIGSEGV' error during start 'rngd'
    PACKAGE_EXCLUDE:append = " rng-tools"
    IMAGE_INSTALL:append = " haveged"
    IMAGE_FEATURES += " \
        ssh-server-dropbear \
        debug-tweaks \
        package-management \
    "
  diskmon: |
    BB_DISKMON_DIRS = "\
        STOPTASKS,${TMPDIR},1G,100K \
        STOPTASKS,${DL_DIR},1G,100K \
        STOPTASKS,${SSTATE_DIR},1G,100K \
        STOPTASKS,/tmp,100M,100K \
        HALT,${TMPDIR},100M,1K \
        HALT,${DL_DIR},100M,1K \
        HALT,${SSTATE_DIR},100M,1K \
        HALT,/tmp,10M,1K"

And the docker host machine information is:

NAME="Fedora Linux"
VERSION="38 (Workstation Edition)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora Linux 38 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f38/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14
VARIANT="Workstation Edition"
VARIANT_ID=workstation

jan-kiszka commented 10 months ago

Must be something in the host's docker or podman configuration (I tried both here successfully):

$ kas-container build pc805_poky.yml
2023-08-26 19:19:50 - INFO     - kas 4.0 started
2023-08-26 19:19:50 - INFO     - /repo$ git rev-parse --show-toplevel
2023-08-26 19:19:50 - INFO     - /repo$ hg root
2023-08-26 19:19:50 - INFO     - /repo$ git rev-parse --show-toplevel
2023-08-26 19:19:50 - INFO     - /repo$ hg root
2023-08-26 19:19:51 - WARNING  - Using deprecated refspec for repository "meta-riscv". You should migrate to commit/branch.
2023-08-26 19:19:51 - WARNING  - Using deprecated refspec for repository "poky". You should migrate to commit/branch.
2023-08-26 19:19:51 - WARNING  - Using deprecated refspec for repository "meta-openembedded". You should migrate to commit/branch.
2023-08-26 19:19:51 - INFO     - /work$ git clone -q https://github.com/riscv/meta-riscv /work/meta-riscv
2023-08-26 19:19:51 - INFO     - /work$ git clone -q https://git.yoctoproject.org/git/poky /work/poky
2023-08-26 19:19:51 - INFO     - /work$ git clone -q https://git.openembedded.org/meta-openembedded /work/meta-openembedded
2023-08-26 19:20:02 - INFO     - Repository meta-riscv cloned
2023-08-26 19:20:02 - INFO     - /work/meta-riscv$ git remote set-url origin https://github.com/riscv/meta-riscv
2023-08-26 19:20:02 - INFO     - /work/meta-riscv$ git cat-file -t 3d775dede1f1895ad33ade396438baa8054cd410
2023-08-26 19:20:02 - INFO     - Repository meta-riscv already contains 3d775dede1f1895ad33ade396438baa8054cd410 as commit
...
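
One way to inspect the host side is to collect the Docker daemon version and its active security options. This is a minimal sketch, not part of kas: the helper name `docker_host_info` is hypothetical, and it only assumes the standard `docker version` and `docker info` commands with their `--format` option.

```shell
# Sketch: collect host-side container engine details to attach to a report.
# docker_host_info is a hypothetical helper name, not something kas provides.
docker_host_info() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker: not installed"
        return 0
    fi
    # Server (daemon) version; a failure here means the daemon is unreachable.
    docker version --format 'server: {{.Server.Version}}' 2>/dev/null \
        || echo "docker: daemon not reachable"
    # Active security options, which include the seccomp setting in use.
    docker info --format 'security: {{.SecurityOptions}}' 2>/dev/null || true
    return 0
}

docker_host_info
```

Comparing this output between a working and a failing host may help narrow down whether the engine version or its security configuration differs.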
attina commented 10 months ago

Specifying KAS_CONTAINER_ENGINE=podman on the command line works fine with the same YAML configuration file. The issue must be related to the host's docker.
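
For reference, the engine switch can be sketched as below. KAS_CONTAINER_ENGINE and the configuration path are taken from this thread; the `./kas-container` location is an assumption, and the script falls back to printing the command when the script or podman is not available.

```shell
# Sketch: select the container engine for a single kas-container run.
# The ./kas-container path is an assumption; adjust it to your download.
engine="podman"
config="meta-pico/pc805_poky.yml"

if [ -x ./kas-container ] && command -v "$engine" >/dev/null 2>&1; then
    KAS_CONTAINER_ENGINE="$engine" ./kas-container build "$config"
else
    # Dry run: show the command instead of executing it.
    echo "KAS_CONTAINER_ENGINE=$engine ./kas-container build $config"
fi
```

Setting the variable inline keeps the change limited to that one invocation, so later runs still use the default engine.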

attina commented 10 months ago

The final conclusion of this issue: the default seccomp profile of Docker 20.10.9 is not adjusted to support the clone() syscall wrapper of glibc 2.34, which was adopted in recent Ubuntu and Fedora updates. You can find the detailed information here. The workaround I tested is to add --security-opt seccomp=unconfined to the docker run command.
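
A minimal sketch of applying that workaround without editing the generated docker run line by hand. It assumes your kas-container version offers a `--runtime-args` option for passing extra arguments to the container runtime (check `./kas-container --help`); the `./kas-container` path is also an assumption.

```shell
# Sketch: pass the seccomp option through to "docker run".
# --runtime-args is assumed to exist in your kas-container version.
runtime_args="--security-opt seccomp=unconfined"
config="meta-pico/pc805_poky.yml"

if [ -x ./kas-container ]; then
    ./kas-container --runtime-args "$runtime_args" build "$config"
else
    # Dry run: show the command that would be executed.
    echo "./kas-container --runtime-args \"$runtime_args\" build $config"
fi
```

Note that seccomp=unconfined turns off seccomp filtering for that container entirely, so it is broader than strictly needed for this problem.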