Open markafarrell opened 7 months ago
I'm attempting to reproduce this. Step 4 of https://github.com/markafarrell/mitogen-repro-issue-1061 doesn't leave a running container. Instead it immediately exits.
alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker run -dt --name target-server \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
--privileged \
--rm \
geerlingguy/docker-debian12-ansible:latest;
964532f2b017d53a6292b476e5e463e5157f8520db7e0a6ca6e4d3d3176885ee
alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker --version
Docker version 24.0.5, build 24.0.5-0ubuntu1~22.04.1
alex@ubuntu2004:~/mitogen-repro-issue-1061$ uname -a
Linux ubuntu2004 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
I'm guessing that your use of aarch64 is probably the issue.
There is an arm64 version of that image so it should work.
Do you get anything from:
docker logs target-server
alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker rm target-server
target-server
alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker run -dt --name target-server -v /sys/fs/cgroup:/sys/fs/cgroup:ro --privileged geerlingguy/docker-debian12-ansible:latest;
dea854a953ce1386fcf0ca7b5a28065b5749c982dab711e98fb7210f5968ba39
alex@ubuntu2004:~/mitogen-repro-issue-1061$ docker logs target-server
systemd 252.22-1~deb12u1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization docker.
Detected architecture arm64.
Welcome to Debian GNU/Linux 12 (bookworm)!
Failed to create /init.scope control group: Read-only file system
Failed to allocate manager object: Read-only file system
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
Can you try adding --cgroupns=host and changing the mount to be rw?
That did it, and I see the _os.mkdir(file, 0o700) error. Which leads to the next questions:
1. Why don't the unit and integration tests see this? Which extra ingredient(s) matter - Debian 12? systemd? Something Jeff Geerling added?
So I think this will happen regardless of OS, systemd, etc. The issue is that at https://github.com/mitogen-hq/mitogen/blob/master/ansible_mitogen/runner.py#L361 we are essentially doing:
mkdir {{ ansible_remote_tmp }}/ansible_mitogen_runner_{{ random stuff }}/
If ansible_remote_tmp doesn't exist, this fails.
The existence of ansible_remote_tmp is only checked once, just after we connect to the target, so if it is removed after the connection happens then we see this failure.
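The failure mode described above can be sketched locally in plain Python. This is only an illustration, not Mitogen's code: the `base` and `remote_tmp` paths are hypothetical stand-ins for the target's ansible_remote_tmp, and the use of `tempfile.mkdtemp` is an assumption based on the `_os.mkdir(file, 0o700)` line in the error.

```python
import os
import shutil
import tempfile

# Hypothetical stand-in for ansible_remote_tmp on the target.
base = tempfile.mkdtemp(prefix="demo_base_")
remote_tmp = os.path.join(base, "ansible_remote_tmp")
os.mkdir(remote_tmp)

# The directory is checked once, right after "connecting".
checked_ok = os.path.isdir(remote_tmp)

# Later, a task removes the directory mid-play.
shutil.rmtree(remote_tmp)

# Creating a per-task directory underneath it now fails, because
# mkdtemp only creates the leaf directory, never missing parents.
try:
    tempfile.mkdtemp(prefix="ansible_mitogen_runner_", dir=remote_tmp)
    failed = False
except FileNotFoundError as exc:
    failed = True
    print("mkdtemp failed:", exc)

# Clean up the demo directory.
shutil.rmtree(base)
```

The earlier check passing says nothing about whether the parent still exists at the moment the per-task directory is created.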
2. Can we reproduce it with the existing Mitogen CI images and/or the localhost test?
It should be very easy to reproduce for both localhost and any other image by using a playbook similar to what I have in my reproduction repo. If you can point me to where the test should live, I can quickly create one.
There are unit tests that mention is_good_temp() in https://github.com/mitogen-hq/mitogen/blob/bb9c51b3e9cc39fceddd55578bb89680fa4e1acc/tests/ansible/tests/target_test.py#L31.
Integration tests should probably be added amongst https://github.com/mitogen-hq/mitogen/blob/bb9c51b3e9cc39fceddd55578bb89680fa4e1acc/tests/ansible/integration/runner/all.yml.
For running tests I'm relying on the Azure CI, and (force) pushing changes. We can squash any interim/WIP commits afterwards.
- Why don't the unit and integration tests see this? Which extra ingredient(s) matter - Debian 12? systemd? Something Jeff Geerling added?
A factor I previously missed: the repro playbook in https://github.com/markafarrell/mitogen-repro-issue-1061/blob/262591aecadb3ae255c904de17617519f8389673/playbook.yml is explicitly deleting $ANSIBLE_REMOTE_TMP; it's not systemd or similar doing it behind the scenes. There's much less mystery here than I thought, if any.
If the Ansible temp directory is removed mid-play, Mitogen does not recreate it and the play fails.
Using the normal Ansible strategy, the temp directory is recreated and the play succeeds.
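One way to get the "recreate it" behaviour is to ensure the parent exists every time a per-task directory is made. The sketch below is an assumption about a possible approach, not the fix adopted by Mitogen or the logic Ansible itself uses; `make_runner_dir` is a hypothetical helper name.

```python
import os
import shutil
import tempfile


def make_runner_dir(remote_tmp):
    """Create a per-task directory, recreating remote_tmp if it is gone.

    Hypothetical helper for illustration only.
    """
    # exist_ok=True makes this a no-op when the directory is still there.
    os.makedirs(remote_tmp, mode=0o700, exist_ok=True)
    return tempfile.mkdtemp(prefix="ansible_mitogen_runner_", dir=remote_tmp)


# Usage: the parent is removed between two calls, and the second still works.
demo_base = tempfile.mkdtemp(prefix="demo_base_")
demo_remote_tmp = os.path.join(demo_base, "ansible_remote_tmp")

first = make_runner_dir(demo_remote_tmp)
shutil.rmtree(demo_remote_tmp)  # simulate removal mid-play
second = make_runner_dir(demo_remote_tmp)
second_exists = os.path.isdir(second)

# Clean up the demo directory.
shutil.rmtree(demo_base)
```

Whether silently recreating the directory is the right behaviour (as opposed to failing with a clearer error) is a design decision for the maintainers.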
Ansible version: 2.14.15
Host OS: Ubuntu (WSL2)
Target OS: Debian12 (docker)
Host Python: Python 3.10.12
Target Python: Python 3.11.2
See https://github.com/markafarrell/mitogen-repro-issue-1061 for reproduction instructions