moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

[Proposal] Add file descriptor store to daemon and fd mapping args to CLI commands #48302

Open MayCXC opened 1 month ago

MayCXC commented 1 month ago

Description

An old, and now new again, technique for scaling and updating daemons that listen on sockets of all kinds is to have the daemon's executor bind and listen on the sockets and pass their file descriptors to the daemon. The daemon can then be stopped and restarted while its sockets remain bound and listening. This is what tools like inetd, launchd, systemd socket activation, and s6-fdholderd do. Podman already supports this for containers through systemd: https://github.com/containers/podman/blob/main/docs/tutorials/socket_activation.md#socket-activation-of-containers

Any container runtime daemon could just as easily support this functionality on its own, and the sockets themselves fit comfortably into an image configuration. Here is an example of instructions that could declare such file descriptors in a Dockerfile:

SOCKET 3/tcp
SOCKET 8/unix
ENTRYPOINT ...

This documents that the container expects to receive file descriptors 3 and 8 from the host (similar to how the EXPOSE instruction documents tcp/udp ports), and that they should be sockets listening on the tcp and unix networks, respectively. Here is a corresponding service-level element in a compose.yml:

services:
  www:
    sockets:
      - 0.0.0.0:8080:3/tcp
      - /run/www.sock:8/unix

Here the daemon is instructed to open, bind, and listen on these sockets on the host, and to pass them to the www service container as fds 3 and 8.

A program that supports socket activation, like traefik, can be run seamlessly this way: even while its container restarts or updates, it still appears to be listening on both sockets. A savvy daemon could scale it to zero instances, wait for either socket to receive a connection, and then activate it again. Compose projects gain an added benefit: some depends_on and healthcheck elements become unnecessary, because the host can listen on every socket before the services that use them start. Services can then connect to these listeners as early as they want, via host.docker.internal for network sockets or a bind mount for named sockets, and simply wait for their connections to unblock.

I believe that declarative socket file descriptors offer the same advantages as bind mounts and bridge networks for containers that listen on sockets. They could also be configured via the CLI, like so: docker run -s 0.0.0.0:80:3/tcp -s /run/www.sock:8/unix traefik ...
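A sketch of how the proposed -s value could be parsed, in Go. The ADDRESS:FD/NETWORK layout follows the examples above; the socketSpec type, the accepted network names, and the rule that fds below 3 are reserved for stdio are assumptions, not part of any existing Docker CLI.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// socketSpec is a hypothetical parsed form of "ADDRESS:FD/NETWORK".
type socketSpec struct {
	Address string // host address or socket path to bind and listen on
	FD      int    // descriptor number inside the container
	Network string // "tcp", "udp", or "unix" (assumed set)
}

func parseSocketSpec(s string) (socketSpec, error) {
	slash := strings.LastIndex(s, "/")
	if slash < 0 {
		return socketSpec{}, fmt.Errorf("%q: missing /network suffix", s)
	}
	rest, network := s[:slash], s[slash+1:]
	switch network {
	case "tcp", "udp", "unix":
	default:
		return socketSpec{}, fmt.Errorf("%q: unsupported network %q", s, network)
	}
	// The fd follows the last colon, so host:port pairs and paths stay intact.
	colon := strings.LastIndex(rest, ":")
	if colon < 0 {
		return socketSpec{}, fmt.Errorf("%q: missing :fd", s)
	}
	fd, err := strconv.Atoi(rest[colon+1:])
	if err != nil || fd < 3 {
		return socketSpec{}, fmt.Errorf("%q: fd must be an integer >= 3", s)
	}
	return socketSpec{Address: rest[:colon], FD: fd, Network: network}, nil
}

func main() {
	for _, arg := range []string{"0.0.0.0:80:3/tcp", "/run/www.sock:8/unix"} {
		spec, err := parseSocketSpec(arg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", spec)
	}
}
```

Splitting on the last / and then the last : keeps host:port pairs, IPv6 literals such as [::]:80, and Unix socket paths intact.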

In other cases, CLI users may want to pass extra fds to a container without binding them on the host. This case can be documented with a similar instruction:

FD 4
FD 5
ENTRYPOINT ...

These fds could be configured via the CLI as well: docker run -f 4 -f 5 ... would receive fds 4 and 5 from the parent process without binding them, and docker run -F ... would receive all of the declared fds this way. It could also be convenient to map fds on the command line: docker run -f 6:4 -f 9:5 ... would pass fd 6 from the host to fd 4 in the container, and likewise fd 9 to fd 5.
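The -f mapping syntax could be parsed along these lines. This Go sketch treats a bare N as N:N and a pair as HOST:CONTAINER, matching the examples above; the map direction (container fd to host fd), the duplicate check, and reserving container fds 0 to 2 for stdio are illustrative choices rather than settled design.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFDFlags turns hypothetical -f values into a container-fd -> host-fd
// map: "4" passes host fd 4 through as fd 4, "6:4" passes host fd 6 as fd 4.
func parseFDFlags(values []string) (map[int]int, error) {
	m := make(map[int]int, len(values))
	for _, v := range values {
		hostStr, ctrStr, found := strings.Cut(v, ":")
		if !found {
			ctrStr = hostStr // bare "N" means N:N
		}
		host, err := strconv.Atoi(hostStr)
		if err != nil || host < 0 {
			return nil, fmt.Errorf("-f %s: bad host fd", v)
		}
		ctr, err := strconv.Atoi(ctrStr)
		if err != nil || ctr < 3 {
			return nil, fmt.Errorf("-f %s: container fd must be >= 3", v)
		}
		if prev, dup := m[ctr]; dup {
			return nil, fmt.Errorf("-f %s: container fd %d already mapped from host fd %d", v, ctr, prev)
		}
		m[ctr] = host
	}
	return m, nil
}

func main() {
	m, err := parseFDFlags([]string{"6:4", "9:5"})
	if err != nil {
		panic(err)
	}
	fmt.Println(m) // prints map[4:6 5:9] (container fd -> host fd)
}
```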

This would let any Docker host enjoy the seamless restarts and reduced initialization complexity of socket activation without relying on a particular init system. I think it follows the spirit of https://github.com/moby/moby/issues/2658, but offers seamless restarts for containers, not just the daemon. I'd love to know what others think of this feature.

MayCXC commented 2 weeks ago

For programs that support socket activation, this feature would also provide the same benefits of https://github.com/moby/moby/issues/7536

MayCXC commented 2 weeks ago

The implementation of this feature will require/satisfy https://github.com/moby/moby/issues/43935 as well

MayCXC commented 1 week ago

The runc --preserve-fds option is also relevant to this: https://github.com/opencontainers/runc/blob/main/docs/terminals.md#other-file-descriptors

MayCXC commented 5 days ago

It would also be useful if compose could specify a network other than the host's on which to create a listener:

networks:
  wwwnet:

services:
  www:
    networks:
      wwwnet:
        sockets:
          - 0.0.0.0:80:4/tcp

MayCXC commented 2 days ago

Compose and the CLI could also use a syntax like 0.0.0.0:80:SOCKET_FD/tcp to indicate that the listener can be provided as any available fd, with the environment variable SOCKET_FD set to its number.