heathdbrown / research

Things to look at or that interest me.

AWX in Docker #23

Closed heathdbrown closed 1 year ago

heathdbrown commented 2 years ago

Goals

The idea was to see if we could get a local version of AWX installed on Docker (originally Podman) inside of WSL.

The other side was to determine what has changed since last testing the docker setup in 2020 on a server docker installation.

Discovery

References

heathdbrown commented 2 years ago

https://github.com/heathdbrown/research/blob/main/awx-in-docker/README.md

heathdbrown commented 2 years ago

Issue: docker-compose: docker-auth awx/projects docker-compose-sources

What is interesting is that the line is not in the 'devel' branch, just the latest release. https://github.com/ansible/awx/blob/devel/Makefile#L471

[2/2] STEP 40/40: CMD ["/bin/bash"]
--> Using cache 468c814e38c239c241b964dd87da4391317a03baf9c2559e0671441ce3e280db
[2/2] COMMIT quay.io/awx/awx_devel:main
--> 468c814e38c
[Warning] one or more build args were not consumed: [BUILDKIT_INLINE_CACHE]
Successfully tagged quay.io/awx/awx_devel:main
468c814e38c239c241b964dd87da4391317a03baf9c2559e0671441ce3e280db
make: *** No rule to make target 'docker-auth', needed by 'docker-compose'.  Stop.

Removed the references to docker-auth and now it works.
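A minimal sketch of that removal, assuming docker-auth only appears as a prerequisite on target lines. The Makefile written below is a stand-in that mirrors the line quoted in the issue heading, not the real AWX Makefile, and `workdir` is a hypothetical scratch directory.

```shell
# Stand-in Makefile containing only the target line quoted above.
workdir="$(mktemp -d)"
printf 'docker-compose: docker-auth awx/projects docker-compose-sources\n\techo up\n' > "$workdir/Makefile"

# Drop docker-auth wherever it appears as a whitespace-delimited word,
# keeping the whitespace (or end of line) that followed it.
sed -i -E 's/[[:space:]]docker-auth([[:space:]]|$)/\1/g' "$workdir/Makefile"

# Show the edited target line.
head -n 1 "$workdir/Makefile"
```

Running this against a copy first makes it easy to diff the result before touching the real file.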

heathdbrown commented 2 years ago

Issue: docker-compose-sources ansible-playbook error referencing docker info

TASK [sources : Get OS info for sdb] ****************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "docker info | grep 'Operating System'\n", "delta": "0:00:00.140578", "end": "2021-11-05 15:43:31.045094", "msg": "non-zero return code", "rc": 1, "start": "2021-11-05 15:43:30.904516", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

PLAY RECAP ******************************************************************************************************************************************************
localhost                  : ok=9    changed=5    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

make: *** [Makefile:447: docker-compose-sources] Error 2

Running manually:

$ docker info | grep 'Operating'
ERRO[0000] error joining network namespace for container 8825c82136045ec546a26e4471e5b8d9fabe0017f0337ef031a81859dcbed1f1: error retrieving network namespace at /tmp/podman-run-1000/netns/cni-7b11cad4-1566-8c74-5a17-c288d3a10430: unknown FS magic on "/tmp/podman-run-1000/netns/cni-7b11cad4-1566-8c74-5a17-c288d3a10430": ef53

Seems to be related to running podman rootless on WSL2.

https://github.com/containers/podman/issues/6800 https://www.redhat.com/sysadmin/sudo-rootless-podman

Trying export XDG_RUNTIME_DIR="/tmp/run-$(id -u)/$(cat /proc/sys/kernel/random/boot_id)"

Now when trying to run the command I do not get the /tmp reference but I see the following error:

$podman ps
ERRO[0000] stat /tmp/run-1000/e055baa4-5f67-4e16-90d4-2415caf5cfed: no such file or directory
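The stat error suggests the directory named by XDG_RUNTIME_DIR was never created. A minimal sketch, assuming the path only needs to exist with owner-only permissions; whether podman then behaves on WSL2 is not verified here, and `runtime_dir` is a name introduced for illustration.

```shell
# Build the same per-boot path used in the export above.
runtime_dir="/tmp/run-$(id -u)/$(cat /proc/sys/kernel/random/boot_id)"

# Create it with owner-only permissions before exporting.
mkdir -p "$runtime_dir"
chmod 700 "$runtime_dir"
export XDG_RUNTIME_DIR="$runtime_dir"

# Confirm the directory now exists and show its mode.
stat -c '%a %n' "$XDG_RUNTIME_DIR"
```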

This issue, https://github.com/containers/podman/issues/8539, references using a symlink and a new build; I need to check the version.

$ podman info --log-level=debug
ERRO[0000] stat /tmp/run-1000/e055baa4-5f67-4e16-90d4-2415caf5cfed: no such file or directory

To fix this, I removed /tmp/run-1000 and /tmp/pod-run-1000, and removed the attempted export of XDG_RUNTIME_DIR and the refersh_reboot function. I had docker aliased to podman; after a reboot everything works, and I can run podman info again.

docker info | grep 'Operating' still does not return a result.

Trying this on non-Ubuntu WSL system....

$ sudo docker info | grep 'Operating System'
<!--output omitted-->
Operating System: RHEV
$ docker --version
Docker version 1.13.1, build 7d71120/1.13.1

Because I am trying to use podman instead of docker in WSL, I am seeing the following information, which I think would come from docker info | grep 'Operating System' if I were using docker.

$ docker info | grep -i distribution
    distribution: ubuntu

Attempted to use docker info | grep -i ' distribution:' | awk '{print $2}'

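Getting an error when attempting to sed that into the playbook as a replacement, I think due to the '|' in the shell module. The log below shows the whole pipeline wrapped in double quotes, so /bin/sh looked for a single command with that name (rc 127), and $2 was expanded to an empty string before awk ran.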

TASK [sources : Get OS info for sdb] ****************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "\"docker info | grep '    distribution:' | awk '{print $2}'\"\n", "delta": "0:00:00.117658", "end": "2021-11-06 21:22:54.111028", "msg": "non-zero return code", "rc": 127, "start": "2021-11-06 21:22:53.993370", "stderr": "/bin/sh: 1: docker info | grep '    distribution:' | awk '{print }': not found", "stderr_lines": ["/bin/sh: 1: docker info | grep '    distribution:' | awk '{print }': not found"], "stdout": "", "stdout_lines": []}
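A minimal sketch of extracting the distribution name without depending on an exact run of leading spaces. The `sample_info` text is a hypothetical stand-in for what the podman-backed `docker info` prints; real output has many more fields. When a pipeline like this is placed in a playbook, it would need to stay unquoted as a whole and keep `$2` intact.

```shell
# Hypothetical sample of the relevant part of the info output.
sample_info='host:
  arch: amd64
  distribution:
    distribution: ubuntu
    version: "20.04"'

# Match only the "distribution: <name>" line that carries a value (two fields),
# skipping the parent "distribution:" key, which has one field.
distro="$(printf '%s\n' "$sample_info" | awk '$1 == "distribution:" && NF == 2 {print $2; exit}')"
echo "$distro"
```

Against a live system, `printf '%s\n' "$sample_info"` would be replaced by `docker info`.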

benhocker commented 2 years ago

@heathdbrown - Have you tried https://packages.ubuntu.com/impish/podman-docker to wrap the podman in a docker compatible way?

heathdbrown commented 2 years ago

@benhocker thanks I will give it a shot.

heathdbrown commented 2 years ago

Alright, I moved away from podman on WSL to docker, because too many of the things I was trying were tied to docker, and modifying them to use podman did not work most of the time.

I used several references on Docker on WSL to get docker working in WSL.

heathdbrown commented 2 years ago

I had to switch to Python 3.9 using pyenv and clone AWX directly from GitHub.

$ make docker-compose
pyenv: python3.9: command not found

The `python3.9' command exists in these Python versions:
  3.9.9

Note: See 'pyenv help global' for tips on allowing both
      python2 and python3 to be found.
pyenv: python3.9: command not found

The `python3.9' command exists in these Python versions:
  3.9.9

Note: See 'pyenv help global' for tips on allowing both
      python2 and python3 to be found.

Used the following, from https://github.com/pyenv/pyenv#readme, to set the pyenv environment up again:

# the sed invocation inserts the lines at the start of the file
# after any initial comment lines
sed -Ei -e '/^([^#]|$)/ {a \
export PYENV_ROOT="$HOME/.pyenv"
a \
export PATH="$PYENV_ROOT/bin:$PATH"
a \
' -e ':a' -e '$!{n;ba};}' ~/.profile
echo 'eval "$(pyenv init --path)"' >>~/.profile

echo 'eval "$(pyenv init -)"' >> ~/.bashrc
pyenv install --list
pyenv install 3.9.9
pyenv shell  3.9.9

heathdbrown commented 2 years ago

Followed a guide on installing docker-compose after encountering an error that the command did not exist.

Had to modify the command to work with a proxy and to download the newest version.

https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-compose-on-ubuntu-20-04

sudo -E curl -L "https://github.com/docker/compose/releases/download/v2.2.3/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" -o /usr/local/bin/docker-compose
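A small sketch of how that download URL is assembled, so the pieces can be checked before running curl. The variable names are introduced here for illustration; the version is the one used above. The commented-out lines show the download plus the usual follow-up of marking the binary executable, which is an assumption about what the guide expects rather than something recorded in this thread.

```shell
# Version taken from the command above.
compose_version="v2.2.3"

# Lower-case kernel name and machine architecture, as used in the asset name.
os="$(uname -s | tr '[:upper:]' '[:lower:]')"
arch="$(uname -m)"

url="https://github.com/docker/compose/releases/download/${compose_version}/docker-compose-${os}-${arch}"
echo "$url"

# Download and make executable (sudo -E keeps proxy variables in the environment):
# sudo -E curl -L "$url" -o /usr/local/bin/docker-compose
# sudo chmod +x /usr/local/bin/docker-compose
```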

heathdbrown commented 2 years ago

make docker-compose fails with "Timed out proxy starting the userland proxy" error

Very close to being up:

make docker-compose
#output omitted#
tools_postgres_1    | 2022-02-02 03:13:46.266 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
tools_postgres_1    | 2022-02-02 03:13:46.266 UTC [1] LOG:  listening on IPv6 address "::", port 5432
tools_postgres_1    | 2022-02-02 03:13:46.279 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
tools_postgres_1    | 2022-02-02 03:13:46.322 UTC [78] LOG:  database system was shut down at 2022-02-02 03:13:46 UTC
tools_postgres_1    | 2022-02-02 03:13:46.337 UTC [1] LOG:  database system is ready to accept connections
Error response from daemon: driver failed programming external connectivity on endpoint tools_awx_1 (569cd6a504e6d9cb5b370c2fbdfdb89b8cc141732f96977b0971ff56dd50a44e): Timed out proxy starting the userland proxy
make: *** [Makefile:459: docker-compose] Error 1

Dockerd log file

WARN[2022-02-01T21:16:05.907794800-06:00] Failed to allocate and map port 7954-7954: Timed out proxy starting the userland proxy
ERRO[2022-02-01T21:16:14.492446200-06:00] 8f8029f9ec360fd327ae59eb1e211dfb915b8ff98a6464eb8dcc7e5cc1f310cd cleanup: failed to delete container from containerd: no such container
ERRO[2022-02-01T21:16:14.492693000-06:00] Handler for POST /v1.41/containers/8f8029f9ec360fd327ae59eb1e211dfb915b8ff98a6464eb8dcc7e5cc1f310cd/start returned error: driver failed programming external connectivity on endpoint tools_awx_1 (569cd6a504e6d9cb5b370c2fbdfdb89b8cc141732f96977b0971ff56dd50a44e): Timed out proxy starting the userland proxy

The dockerd error led me here:

That led to ulimit -n 8192. Some research showed that passing the -a option to ulimit lists the current limits.

With that, it became clear they were raising the open files limit to 8192.

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 51105
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 51105
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The previous GitHub issue also said to check file-max, and I have roughly:

$ cat /proc/sys/fs/file-max
1307524

Weirdly when attempting to run the command, it fails.

$ ulimit -n 8192
-bash: ulimit: open files: cannot modify limit: Operation not permitted

Adding sudo causes a different failure

sudo ulimit -n 8192
sudo: ulimit: command not found

Found https://stackoverflow.com/questions/17483723/command-not-found-when-using-sudo-ulimit

sudo sh -c "ulimit -n 8192 && exec su $LOGNAME"

This appears to have opened the shell as expected and then switched back to my prompt, but the change did not take effect.

$ sudo sh -c "ulimit -n 8192 && exec su $LOGNAME"

Command 'pyenv' not found, did you mean:

  command 'pyvenv' from deb python3-venv (3.8.2-0ubuntu2)
  command 'p7env' from deb libnss3-tools (2:3.49.1-1ubuntu1.6)

Try: sudo apt install <deb name>

# verify settings
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 51105
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 51105
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Attempted again with just the sudo sh, but nothing changed.

sudo sh -c "ulimit -n 8192"

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 51105
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 51105
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

When in doubt root it out, changed to root and issued the command, and it worked.

$ sudo su -
# ulimit -n 8192
~# ulimit -a | grep 'open files'
open files                      (-n) 8192

Sadly, after going back to a normal user shell, we revert to 1024.

$ ulimit -a | grep 'open files'
open files                      (-n) 1024

This led me to: https://unix.stackexchange.com/questions/81843/sudo-ulimit-command-not-found.

They explain that because ulimit is a shell built-in, 'sudo ulimit' makes no sense: a limit changed that way applies only to the process sudo starts, not to the calling shell.
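A minimal sketch of that behaviour, using a subshell in place of sudo. It lowers the soft limit (which needs no privileges) to an arbitrary value of 64 inside the child and shows that the parent shell is unaffected; the variable names are illustrative.

```shell
# `type` reports how the shell resolves a name; ulimit is a builtin,
# so there is no external binary for sudo to execute.
type ulimit

before="$(ulimit -Sn)"

# Change the soft open-files limit inside a subshell only, then read it back there.
child="$( ulimit -Sn 64; ulimit -Sn )"

# Read the limit again in the parent shell.
after="$(ulimit -Sn)"

echo "before=$before child=$child after=$after"
```

The same reasoning applies to `sudo sh -c "ulimit -n 8192"`: the new limit ends with the shell that sudo started.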

If you want to change the limits globally, you need to modify a configuration file, /etc/security/limits.conf. We also get a new command to check our hard limits.

$ ulimit -Hn
4096
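A short sketch comparing the soft and hard limits, which may explain the earlier "Operation not permitted": an unprivileged shell can raise its soft limit only up to the hard limit, and 8192 is above a hard limit of 4096. The `target` value is the one tried above; the messages are illustrative.

```shell
# Soft limit is what applies now; hard limit is the ceiling for unprivileged changes.
soft="$(ulimit -Sn)"
hard="$(ulimit -Hn)"
echo "soft=$soft hard=$hard"

# Check whether the requested value fits under the hard limit.
target=8192
if [ "$hard" = "unlimited" ] || [ "$target" -le "$hard" ]; then
  echo "ulimit -n $target should be permitted in this shell"
else
  echo "ulimit -n $target exceeds the hard limit ($hard); the hard limit must be raised first"
fi
```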

Ok, after updating the configuration file, /etc/security/limits.conf, adding a 'hard' and 'soft' limit AND modifying the /etc/pam.d/common-session* files with the needed changes, I am stuck at the same 'hard' limit.

 sudo vim /etc/security/limits.conf
# 
$ cat /etc/security/limits.conf | grep nofile
#        - nofile - max number of open file descriptors
<user>            hard    nofile          8192
<user>            soft    nofile          8192
# 
sudo vim /etc/pam.d/common-session
session required        pam_limits.so

# 
sudo vim /etc/pam.d/common-session-noninteractive
session required        pam_limits.so
# 
$ ulimit -Hn
4096

# logout and back in
$ ulimit -Hn
4096

heathdbrown commented 2 years ago

Other log messages during make docker-compose with awx:

tools_awx_1         | WARN[0001] "/" is not a shared mount, this could cause issues or missing mounts with rootless containers
tools_receptor_hop  | WARNING 2022/02/02 18:27:42 Backend connection failed (will retry): dial tcp 172.18.0.4:2222: connect: connection refused
tools_receptor_2    | time="2022-02-02T18:28:19Z" level=warning msg="\"/\" is not a shared mount, this could cause issues or missing mount

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. NetBox is governed by a small group of core maintainers which means not all opened issues may receive direct feedback. Do not attempt to circumvent this process by "bumping" the issue; doing so will result in its immediate closure and you may be barred from participating in any future discussions. Please see our contributing guide.

github-actions[bot] commented 1 year ago

This issue has been automatically closed due to lack of activity. In an effort to reduce noise, please do not comment any further. Note that the core maintainers may elect to reopen this issue at a later date if deemed necessary.