product-os / jellyfish

The Jellyfish Project
https://jel.ly.fish/
GNU Affero General Public License v3.0

Avoid flooding the logs when the jellyfish-tests container starts #9133

Closed jellyfish-bot closed 2 years ago

jellyfish-bot commented 2 years ago

[ramirogm] The jellyfish-tests container is part of the JF docker-compose configuration. It is used during the CI pipeline to run tests against a running JF instance, the SUT (system under test). When deploying to prod, the container is started, but the tests don't run because of the specific ENV vars configured there.

However, the container runs a systemd init, which generates a lot of confusing messages in the device log and adds unnecessary load to the system while it is booting the other containers.

This systemd init comes from the base container image. Tracking it down:

1) In docker-compose:

services:
  jellyfish-tests:
    build:
      context: .
      dockerfile: Dockerfile.template
    # (TBC) https://www.flowdock.com/app/rulemotion/r-beginners/threads/uPPfzU-DGRehSDk-TkS-vZs7Q58
    # https://github.com/balena-io-modules/open-balena-base/blob/master/Dockerfile#L105
    command: /usr/bin/entry.sh

2) In Dockerfile.template:

# https://github.com/product-os/jellyfish-base-images
# https://registry.hub.docker.com/r/resinci/jellyfish-test
FROM resinci/jellyfish-test:v4.0.10 AS base

ARG CI=1
ARG SUT=1

[...]
# --- tests runtime
FROM base

ENV CI 1

WORKDIR /usr/src/jellyfish

RUN systemctl enable confd

3) In .github/workflows/balena.yaml:


          balena env add SUT 1 \
            --service '${{ env.JELLYFISH_TESTS_SERVICE }}' \
            --device '${{ steps.register-test-device.outputs.balena_device_uuid }}'

Once the jellyfish-tests container is running:

          # (TBC) https://www.flowdock.com/app/rulemotion/i-cli/threads/fkUfPzT-Tez4Ev3oSy07GRhfzao
          with_backoff ssh-uuid -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -t \
            --service ${{ env.JELLYFISH_TESTS_SERVICE }} \
            ${{ steps.register-test-device.outputs.balena_device_uuid }}.balena \
            'scripts/ci/run-tests.sh ${{ env.JELLYFISH_TESTS }}'

where

JELLYFISH_TESTS: |
 wait-for-api \
 integration-server \
 e2e e2e-ui \
 export-database \
 import-database \
 e2e-server-previous-dump \
 benchmark
JELLYFISH_TESTS_SERVICE: jellyfish-tests
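JELLYFISH_TESTS is a YAML literal block, so its newlines and trailing backslashes are pasted as-is into the remote command, and it is the shell on the device that turns each backslash-newline pair into a line continuation. A small sketch of that expansion, assuming the remote side parses the command with bash; `print_args` is a hypothetical stand-in for run-tests.sh that just prints its arguments:

```shell
#!/usr/bin/env bash
# Assumption: the command sent over SSH is parsed by bash on the device,
# where a backslash immediately followed by a newline is a line continuation.

# Literal value of the YAML block (single quotes keep backslashes and newlines).
JELLYFISH_TESTS='wait-for-api \
integration-server \
e2e e2e-ui \
export-database \
import-database \
e2e-server-previous-dump \
benchmark'

# Hypothetical stand-in for scripts/ci/run-tests.sh: one argument per line.
print_args() { printf '%s\n' "$@"; }

# eval re-parses the string as shell input, so every "\<newline>" is dropped
# and each task name becomes a separate argument.
eval "print_args $JELLYFISH_TESTS"
```

Each task name arrives as its own argument, which is what lets `task "$@"` in run-tests.sh receive the whole list.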

4) In jellyfish/scripts/ci/run-tests.sh:

# don't accidentally run on instances with production data
if [[ $SUT -eq 1 ]]; then
    task "$@"
else
    echo "declare SUT=1; task $*"
fi

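So the tests only run where $SUT -eq 1, i.e. on the CI test device that the workflow configures with SUT=1; the production fleet should not have SUT set to 1.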
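The guard's behaviour can be checked in isolation. A minimal sketch that copies the condition from run-tests.sh into a function and replaces the real `task` runner with a hypothetical stub that only echoes what would run:

```shell
#!/usr/bin/env bash
# Hypothetical stub for the real `task` runner: report instead of running.
task() { echo "running: $*"; }

# Same condition as scripts/ci/run-tests.sh. Inside [[ ]], -eq compares
# arithmetically, so an unset or empty SUT evaluates as 0 and nothing runs.
sut_guard() {
  if [[ $SUT -eq 1 ]]; then
    task "$@"
  else
    echo "declare SUT=1; task $*"
  fi
}

SUT=1 sut_guard e2e benchmark        # prints: running: e2e benchmark
unset SUT; sut_guard e2e benchmark   # prints: declare SUT=1; task e2e benchmark
```

Without SUT=1 the script only prints the command a human would have to run deliberately, which is the safety net the comment in run-tests.sh describes.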

Note that this command runs through an SSH session opened by the running GitHub workflow job.
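The workflow wraps that SSH call in `with_backoff`, presumably so the step survives while the device or container is still coming up; its definition is not part of this issue. A minimal sketch of a retry helper with exponential backoff of that kind, where the function name and the ATTEMPTS/DELAY knobs are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the workflow's with_backoff helper.
# ATTEMPTS (default 5) and DELAY (initial wait in seconds, default 1) are
# assumed knobs, not settings taken from the real workflow.
with_backoff_sketch() {
  local attempts="${ATTEMPTS:-5}" delay="${DELAY:-1}" n=1
  until "$@"; do
    if (( n >= attempts )); then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))   # exponential backoff: 1s, 2s, 4s, ...
    n=$(( n + 1 ))
  done
  return 0
}

# Usage:
#   with_backoff_sketch ssh-uuid <same arguments as in the workflow step>
```

Doubling the delay keeps early retries quick while avoiding hammering a device that is still booting.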

Going back to the container: it still runs its own command, which is defined in docker-compose as:

    command: /usr/bin/entry.sh

By the way, this is the same command defined in the base Docker image (resinci/jellyfish-test:v4.0.10), and it is not overridden in Dockerfile.template.

The base image is defined in https://github.com/product-os/jellyfish-base-images, which inherits from https://github.com/balena-io-modules/open-balena-base, the repo that contains the script at https://github.com/balena-io-modules/open-balena-base/blob/master/src/entry.sh:

#!/bin/bash
set -m

GREEN='\033[0;32m'
echo -e "${GREEN}Systemd init system enabled."

# systemd causes a POLLHUP for console FD to occur
# on startup once all other process have stopped.
# We need this sleep to ensure this doesn't occur, else
# logging to the console will not work.
sleep infinity &
for var in $(compgen -e); do
    printf '%q=%q\n' "$var" "${!var}"
done > /etc/docker.env
exec /sbin/init
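Besides starting systemd, entry.sh dumps every exported variable into /etc/docker.env using `printf '%q=%q\n'`, i.e. in shell-quoted form, so the file can be evaluated back later. A small sketch of that round trip, writing to a temporary file instead of /etc/docker.env; `DEMO_VAR` is a made-up variable for illustration:

```shell
#!/usr/bin/env bash
# Made-up variable containing a space and shell metacharacters.
export DEMO_VAR='hello world; $HOME'

# Same loop as entry.sh, but writing to a temp file instead of /etc/docker.env.
env_file="$(mktemp)"
trap 'rm -f "$env_file"' EXIT
for var in $(compgen -e); do
  printf '%q=%q\n' "$var" "${!var}"
done > "$env_file"

# %q escapes the value, so evaluating the line restores the original string:
# the space, ';' and '$' are not interpreted by the shell.
line="$(grep '^DEMO_VAR=' "$env_file")"
echo "$line"
(unset DEMO_VAR; eval "$line"; printf 'restored: %s\n' "$DEMO_VAR")
```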

Looking at the logs when a new release of JF is deployed, we see:

Aug 18 12:15:40 f3276444e379[1089]: Systemd init system enabled.
[....]
Aug 18 12:15:40 f3276444e379[1089]: systemd 247.3-7 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)

And we confirm it's jellyfish-tests:

root@38e2772:/mnt/data/jf-logs# balena ps | grep f3276444e379
f3276444e379   946026f75848   "/usr/bin/entry.sh"   About an hour ago   Up About an hour   jellyfish-tests_5292968_2263859_b04052b1187eba9e02aa1ffe2f8d9668
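The lookup above starts from the container ID that prefixes each log line. A small sketch that extracts it, assuming the lines follow the "<Mon> <day> <HH:MM:SS> <id>[<pid>]: <message>" shape shown above; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Extract the container ID from a device log line.
# Assumption: "<Mon> <day> <HH:MM:SS> <id>[<pid>]: <message>".
container_id_from_log() {
  local line="$1" _mon _day _time rest
  # read splits on runs of whitespace, so a padded day ("Aug  8") also works.
  read -r _mon _day _time rest <<< "$line"
  printf '%s\n' "${rest%%\[*}"   # keep everything before the first "["
}

container_id_from_log 'Aug 18 12:15:40 f3276444e379[1089]: Systemd init system enabled.'
# prints: f3276444e379
```

On the device, the result can then be matched the same way as above, e.g. `balena ps | grep "$(container_id_from_log "$line")"`.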

Change

Running /sbin/init may be useful for other balena service images, but it is not needed in this image, which only has to stay idle: in CI, until an SSH connection runs the tests; in prod, indefinitely, since nothing else happens.

1) An alternative is to replace Dockerfile.template with a new file that doesn't use this base image but something more specific to JF. The base image is at https://github.com/balena-io-modules/open-balena-base/blob/master/Dockerfile#L105

2) Perhaps easier is to override the jellyfish-tests command in docker-compose as follows:

services:
  jellyfish-tests:
    build:
      context: .
      dockerfile: Dockerfile.template
    # (TBC) https://www.flowdock.com/app/rulemotion/r-beginners/threads/uPPfzU-DGRehSDk-TkS-vZs7Q58
    # https://github.com/balena-io-modules/open-balena-base/blob/master/Dockerfile#L105
    command: /bin/sleep infinity
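To confirm that an override like this took effect, one can check which program runs as PID 1 inside the container. A Linux-only sketch, assuming it is run inside the jellyfish-tests container (for example over the same SSH session the workflow uses); the function name is made up:

```shell
#!/usr/bin/env bash
# Print the name of the process running as PID 1 in this PID namespace (Linux only).
pid1_name() { cat /proc/1/comm; }

# With entry.sh, PID 1 ends up being systemd (typically reported as "systemd"
# or "init"); with the sleep override it should be "sleep".
case "$(pid1_name)" in
  systemd|init) echo "PID 1 is an init system: the systemd boot still happens" ;;
  sleep)        echo "PID 1 is sleep: no systemd init in this container" ;;
  *)            echo "PID 1 is $(pid1_name)" ;;
esac
```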

Regarding the first option, note that open-balena-base is an actively maintained image that is also used as the base of the API and UI, not only of jellyfish-tests:

jellyfish/apps/server/Dockerfile and jellyfish/apps/ui/Dockerfile:

FROM balena/open-balena-base:v13.4.0 as base

3) A cleaner option seems to be to copy Dockerfile.template to Dockerfile.jellyfish-tests and replace the command there, instead of putting it in the docker-compose file.
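A minimal sketch of that third option as a script. It assumes the base image starts entry.sh via CMD rather than ENTRYPOINT, and that the tests runtime stage is the last stage of the file; the docker-compose service would then point `dockerfile:` at the new file and drop its `command: /usr/bin/entry.sh` line, which would otherwise override the new CMD. The helper name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive Dockerfile.jellyfish-tests from Dockerfile.template
# and append a CMD that only keeps the container alive.
make_tests_dockerfile() {
  local dir="${1:-.}"
  cp "$dir/Dockerfile.template" "$dir/Dockerfile.jellyfish-tests"
  # Only the last CMD in a Dockerfile takes effect, so appending one to the
  # final stage replaces the /usr/bin/entry.sh command inherited from the base image.
  printf '\nCMD ["sleep", "infinity"]\n' >> "$dir/Dockerfile.jellyfish-tests"
}

# Usage, from the repository root:
#   make_tests_dockerfile .
```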

ramirogm commented 2 years ago

Closed by #9127