hercules-ci / arion

Run docker-compose with help from Nix/NixOS
Apache License 2.0

Arion tries to start a service before podman is up #238

Open pedorich-n opened 4 months ago

pedorich-n commented 4 months ago

Hi!

I've just encountered an error where, after a reboot, arion on NixOS tried to start the containers before podman was ready:

2024-04-21T22:24:15+0900 arion-home-automation-start[2418281]: docker compose file: /nix/store/4d2i95x5rhb55dyncssr4xfz87sl8lyh-docker-compose.yaml
2024-04-21T22:24:15+0900 arion-home-automation-start[2418291]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2024-04-21T22:24:15+0900 arion-home-automation-start[2418284]: arion: readCreateProcess: docker "images" "--filter" "dangling=false" "--format" "{{.Repository}}:{{.Tag}}" (exit 1): failed

It looks like it auto-restarted eventually.

But this makes me think: would it make sense to add something like After = [ "podman.service" ] (or docker, depending on the backend), or ConditionPathExists=/var/run/docker.sock, to the systemd service definition so that it starts correctly on the first try?
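
As a rough sketch of what that could look like in a NixOS configuration: the unit name `arion-home-automation` is an assumption inferred from the `arion-home-automation-start` log prefix, and the override itself is untested. One caveat with the second option: when a `ConditionPathExists=` check fails, systemd skips the unit instead of waiting for the path to appear, so it would not retry by itself.

```nix
{ ... }:
{
  # Assumed unit name, inferred from the "arion-home-automation-start" log prefix.
  systemd.services.arion-home-automation = {
    # Option 1: order the project after podman (or docker.service for the Docker backend).
    after = [ "podman.service" ];

    # Option 2: only start if the socket path exists.
    # A failed condition skips the unit; it does not wait for the path.
    # unitConfig.ConditionPathExists = "/var/run/docker.sock";
  };
}
```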

Edit: more info: NixOS stable 23.05, nixpkgs commit bc194f70731cc5d2b046a6c1b3b15f170f05999c

$ podman --version
podman version 4.7.2

$ arion version
docker-compose version 1.29.2, build unknown
docker-py version: <module 'docker.version' from '/nix/store/7420rvz9fw7cjqkjf5i62zarv8s4p21c-python3.11-docker-6.1.3/lib/python3.11/site-packages/docker/version.py'>
CPython version: 3.11.8
OpenSSL version: OpenSSL 3.0.13 30 Jan 2024

roberth commented 4 months ago

It does have

https://github.com/hercules-ci/arion/blob/1886d25075aaf24c8bc687b3d2a87ae1f5d154ec/nixos-module.nix#L38

So that suggests that the podman socket is not registered with systemd.

Maybe it needs to run after tmpfiles too?

https://github.com/NixOS/nixpkgs/blob/ff03bc83894ca42d93f80ec6ea82b9e4eaff02b9/nixos/modules/virtualisation/podman/default.nix#L244

Ideally systemd would know that the docker socket is an alias for the podman socket. I think this could be achieved with multiple ListenStreams in podman: one for each location. That makes the tmpfiles solution seem like a hack.
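
A minimal sketch of the multiple-ListenStream idea, assuming the packaged podman.socket unit already listens on its own path and that the NixOS `listenStreams` option adds to it instead of replacing it. This is untested, and the tmpfiles entry for the same location would presumably have to go away for it to work.

```nix
{ ... }:
{
  # Untested sketch: have systemd listen on the Docker-compatible path as well,
  # so that the socket at this location is known to systemd.
  # Assumes nothing else (e.g. a tmpfiles symlink) creates /run/docker.sock.
  systemd.sockets.podman.listenStreams = [ "/run/docker.sock" ];
}
```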

pedorich-n commented 4 months ago

Ah, I didn't realize after = [ "sockets.target" ]; is there to ensure it starts after the socket is present. This makes sense now.

Thanks for the quick fix! Like you said, race conditions are hard to test, but with your change applied, I haven't seen the issue come up again after a couple of reboots.