rabbitmq / rabbitmq-server-release

RabbitMQ packaging and release engineering bits that do not belong to the Concourse pipelines.

Make startup scripts and service files more friendly to modern process supervision #111

Open space88man opened 5 years ago

space88man commented 5 years ago

The rabbitmq-server launch script runs multiple downstream scripts to start epmd and beam as long-running processes. This goes against modern process supervision, which wants epmd in the foreground, with beam and epmd as two separate services.

epmd -daemon enables epmd to escape process supervision suites that do not capture /usr/sbin/rabbitmq-server in a cgroup.

Observations:

  1. systemd: the cleaner, recommended way is to have epmd as one service and beam as another, where beam could declare Requires= or After= epmd.

     systemd barely manages to tame epmd -daemon, and only because of cgroups.

  2. Tried the /usr/sbin/rabbitmq-server script in a Docker container running s6 as the process supervisor. epmd escapes the supervisor by double forking and running as -daemon.
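The escape described in the observations can be reproduced without Erlang. The following is a minimal sketch, assuming a Linux host with /proc mounted: `sleep` stands in for epmd, and the helper names `launch_detached` and `parent_of` are made up for this example. A short-lived launcher backgrounds a child and exits, which is roughly what a daemonizing program does, and the child ends up parented by PID 1 (or the nearest subreaper) rather than by the shell that started it.

```shell
#!/bin/sh
# Sketch only: `sleep` is a stand-in for epmd; nothing here is RabbitMQ code.

# The subshell plays the role of a short-lived launcher: it starts the
# child in the background, prints the child's PID, and exits.
launch_detached() {
    ( sleep 30 >/dev/null 2>&1 & echo "$!" )
}

# Read a process's parent PID from /proc (Linux-specific).
parent_of() {
    sed -n 's/^PPid:[[:space:]]*//p' "/proc/$1/status"
}

child_pid=$(launch_detached)

# The launcher has exited, so the kernel has already reparented the child.
child_ppid=$(parent_of "$child_pid")
echo "child $child_pid is parented by PID $child_ppid, not by this shell ($$)"

# Clean up the orphan ourselves; a supervisor watching only its direct
# child would not have known to do this.
kill "$child_pid"
```

A supervisor that only tracks its direct child sees the launcher exit and loses the orphan, which matches what s6 sees with epmd -daemon.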

Suggestions:

  1. Split epmd off into a separate service file and don't use -daemon
  2. Have a more direct command line that runs /usr/lib64/erlang/...beam.smp. The daemon script seems to go through enormous contortions to launch beam.smp: lots of runuser, checking for UID/GID, etc. Will something like ExecStart=/usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 64 ... just work? However, I am not sure where all the -MHas, -MBas, etc. parameters come from.

michaelklishin commented 5 years ago

FYI, most of the script is migrated to Erlang on the rabbitmq-server-script-replacement branch.

lukebakken commented 5 years ago

Considerations to take into account:

space88man commented 5 years ago

"Not all of our supported operating systems use systemd" - other process supervisors in containers can lose sight of epmd. Can it be started without the -daemon argument? (Systemd actually manages to capture it due to cgroups - at least on EL7 it manages to clean up epmd/beam.smp and the erl children inet_gethost processes).

I remember that in the EL6 days /etc/init.d/rabbitmq-server stop didn't always clean up everything correctly.

michaelklishin commented 5 years ago

@space88man let's keep this issue a little bit more focused. Given that RabbitMQ supports a variety of platforms that do not use systemd and most of the scripts are moving to Erlang, what are some of the specific changes that you would like to see in the RPM package?

michaelklishin commented 5 years ago

Moved to the packaging repo as it currently seems to fit best here.

space88man commented 5 years ago

rabbitmq-epmd.service:

[Unit]
Description=Erlang Port Mapper Daemon
After=syslog.target network.target

[Service]
User=rabbitmq
Group=rabbitmq
WorkingDirectory=/var/lib/rabbitmq
ExecStart=/usr/lib64/erlang/erts-10.5.3/bin/epmd

[Install]
WantedBy=rabbitmq.target

rabbitmq-server.service:

[Unit]
Description=RabbitMQ broker
After=rabbitmq-epmd.service
Requires=rabbitmq-epmd.service

[Service]
Type=notify
User=rabbitmq
Group=rabbitmq
UMask=0027
NotifyAccess=all
TimeoutStartSec=3600
LimitNOFILE=32768
Restart=on-failure
RestartSec=10
WorkingDirectory=/var/lib/rabbitmq
ExecStart=/usr/sbin/rabbitmq-server
ExecStop=/usr/sbin/rabbitmqctl shutdown
SuccessExitStatus=69

[Install]
WantedBy=rabbitmq.target

rabbitmq.target:

[Unit]
Description=RabbitMQ Broker Target

lukebakken commented 5 years ago

@space88man If you have a specific problem, or issue you have seen due to how RabbitMQ currently starts, that would be useful information for us.

michaelklishin commented 5 years ago

@space88man sorry but we would not consider a change unless we understand it. Why should we adopt those unit files? What are the risks?

space88man commented 5 years ago

TL;DR: to work nicely with process supervisors (supervisord/s6 etc) don't launch epmd with -daemon.

@michaelklishin @lukebakken - This issue is intended to address process supervisors like supervisord, s6 which will be unable to manage epmd, given the way it is currently launched.

I'd like to clarify that the current rabbitmq:

Process supervisor problem with /usr/sbin/rabbitmq-server

@lukebakken Specific problem: epmd is not correctly managed. The key issue with supervisord/s6 etc. is that "Programs meant to be run under supervisor should not daemonize themselves. Instead, they should run in the foreground. They should not detach from the terminal from which they are started." (Taken from the supervisord docs.)

Supervisor: https://github.com/just-containers/s6-overlay. A service in s6 is just an executable supervised by a monitoring process s6-supervise. Every service has its own long-running monitor that lies between it and PID 1, so the service is not intended to be a direct child of PID 1. The monitor never dies but the service main and child processes are expected to die when the service is down.

Configure a service rabbitmq in s6 and give the launch script as /usr/sbin/rabbitmq-server. (This example is in a container to remove all the noise from other OS processes.)

UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 01:32 pts/0    00:00:00 s6-svscan -t0 /var/run/s6/services
root          27       1  0 01:32 pts/0    00:00:00 s6-supervise s6-fdholderd
root        2457       1  0 01:53 pts/0    00:00:00 s6-supervise rabbitmq
rabbitmq    2458    2457  0 01:53 ?        00:00:00 /bin/sh /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq    2666       1  0 01:53 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd -daemon
rabbitmq    2773    2458 34 01:53 ?        00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlm
rabbitmq    3093    2773  1 01:53 ?        00:00:00 erl_child_setup 1048576
rabbitmq    3147    3093  0 01:53 ?        00:00:00 inet_gethost 4
rabbitmq    3148    3147  0 01:53 ?        00:00:00 inet_gethost 4

Observations:

Problem: epmd survives as it was reparented to PID 1 by -daemon.

Simplest solution: Is there a way to launch epmd without the -daemon option from /usr/sbin/rabbitmq-server or /usr/lib/rabbitmq/bin/rabbitmq-server?

Two service solution: Configure a separate service for epmd (without -daemon). The process tree looks like this:

UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 01:32 pts/0    00:00:00 s6-svscan -t0 /var/run/s6/services
root          27       1  0 01:32 pts/0    00:00:00 s6-supervise s6-fdholderd
root        2457       1  0 01:53 pts/0    00:00:00 s6-supervise rabbitmq
root        3538       1  0 02:23 pts/0    00:00:00 s6-supervise epmd
rabbitmq    3558    3538  0 02:23 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
rabbitmq    3561    2457  1 02:23 ?        00:00:00 /bin/sh /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq    3876    3561 67 02:23 ?        00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlm
rabbitmq    4196    3876  2 02:23 ?        00:00:00 erl_child_setup 1048576
rabbitmq    4250    4196  0 02:23 ?        00:00:00 inet_gethost 4
rabbitmq    4251    4250  0 02:23 ?        00:00:00 inet_gethost 4

Notice epmd is contained under s6-supervise 3538. Both services can be stopped cleanly.

# try to stop services cleanly. In s6 lingo, the commands are:
# s6-svc -d /run/s6/services/rabbitmq
# s6-svc -d /run/s6/services/epmd
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 01:32 pts/0    00:00:00 s6-svscan -t0 /var/run/s6/services
root          27       1  0 01:32 pts/0    00:00:00 s6-supervise s6-fdholderd
root        2457       1  0 01:53 pts/0    00:00:00 s6-supervise rabbitmq
root        3538       1  0 02:23 pts/0    00:00:00 s6-supervise epmd
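For reference, the two s6 services in this experiment could be laid out roughly as follows. This is a sketch under stated assumptions, not the exact setup used above: an s6 service directory is a directory holding an executable `run` file, and `s6-setuidgid` is the s6 utility for dropping privileges, but the helper name `write_s6_service` and the staging directory are invented for illustration.

```shell
#!/bin/sh
# Sketch only: `write_s6_service` and the staging directory are hypothetical.

# Create a service directory whose `run` script execs the given command.
write_s6_service() {
    svc_dir=$1
    shift
    mkdir -p "$svc_dir"
    # `exec` keeps the daemon as the direct child of s6-supervise.
    printf '#!/bin/sh\nexec %s\n' "$*" > "$svc_dir/run"
    chmod +x "$svc_dir/run"
}

# Staging area; copy the result into your image's s6 services directory.
root=$(mktemp -d)

# epmd without -daemon stays in the foreground, so s6 can track it.
write_s6_service "$root/epmd" \
    s6-setuidgid rabbitmq /usr/lib64/erlang/erts-10.5.3/bin/epmd

# The broker keeps using the packaged launch script.
write_s6_service "$root/rabbitmq" /usr/sbin/rabbitmq-server

cat "$root/epmd/run"
```

The important detail is the absence of -daemon in the epmd `run` script: the process never detaches, so stopping the service also stops epmd.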

Of course, for this to work properly, s6 (and any other process supervisor or service manager) would have to support service dependencies and declare that the rabbitmq service depends on the epmd service.
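Where the supervisor has no native dependency declaration, one workaround is to have the dependent service wait until a probe command succeeds before starting. A minimal sketch follows; `wait_until` is a hypothetical helper, not part of s6 or RabbitMQ, and the commented usage assumes that `epmd -names` exits non-zero while no epmd is listening.

```shell
#!/bin/sh
# Sketch only: `wait_until` is a hypothetical helper.

# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Retries COMMAND until it succeeds or the attempts are used up.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        # Success of the probe means the dependency is up.
        "$@" >/dev/null 2>&1 && return 0
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Assumed usage inside the rabbitmq service's run script:
#   wait_until 30 1 epmd -names || exit 1
#   exec /usr/sbin/rabbitmq-server
```

If the probe never succeeds, the run script exits and the supervisor restarts it, so the wait is bounded rather than blocking forever.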

In my previous post, I used systemd as it was easiest to demonstrate the dependency relationship that rabbitmq-server.service(beam.smp) depends on rabbitmq-epmd.service(epmd) and must be started after it. @michaelklishin there is no risk: this is merely an explicit declaration that the epmd process must be started first.

space88man commented 5 years ago

@michaelklishin For RPM/systemd based systems, let me try to show the intention of the two service proposal.

  1. epmd is a standalone service;

    # systemctl start rabbitmq-epmd
    [root@525856dd7915 system]# ps -ef
    UID          PID    PPID  C STIME TTY          TIME CMD
    root           1       0  0 05:59 ?        00:00:00 /sbin/init
    root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
    dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
    rabbitmq    7089       1  0 06:29 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
  2. rabbitmq is a separate service, but since epmd (its dependency) is already started, it is possible to run rabbitmq:

    [root@525856dd7915 system]# systemctl start rabbitmq-server; ps -ef
    UID          PID    PPID  C STIME TTY          TIME CMD
    root           1       0  0 05:59 ?        00:00:00 /sbin/init
    root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
    dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
    rabbitmq    7089       1  0 06:29 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
    rabbitmq    7093       1 24 06:31 ?        00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 51
    rabbitmq    7727    7093  0 06:31 ?        00:00:00 erl_child_setup 32768
    rabbitmq    7780    7727  0 06:31 ?        00:00:00 inet_gethost 4
    rabbitmq    7781    7780  0 06:31 ?        00:00:00 inet_gethost 4
  3. Suppose the user for some reason forgot to start epmd:

    UID          PID    PPID  C STIME TTY          TIME CMD
    root           1       0  0 05:59 ?        00:00:00 /sbin/init
    root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
    dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

    When the user tries to start rabbitmq-server, it will work(!) as rabbitmq-epmd is declared as an explicit dependency.

    [root@525856dd7915 system]# systemctl start rabbitmq-server; ps -ef
    UID          PID    PPID  C STIME TTY          TIME CMD
    root           1       0  0 05:59 ?        00:00:00 /sbin/init
    root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
    dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
    root         719       0  0 05:59 pts/1    00:00:00 /bin/bash
    rabbitmq    8026       1 12 06:34 ?        00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 51
    rabbitmq    8234       1  0 06:34 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
    rabbitmq    8660    8026  0 06:34 ?        00:00:00 erl_child_setup 32768
    rabbitmq    8713    8660  0 06:34 ?        00:00:00 inet_gethost 4
    rabbitmq    8714    8713  0 06:34 ?        00:00:00 inet_gethost 4
    root        8723     719  0 06:34 pts/1    00:00:00 ps -ef

    Actually, this would work anyway as Erlang has the autostart epmd capability — I am just being explicit here.

  4. Cleaning up example:

    
    # Continued from 3...
    # since epmd is started as a dependency, when rabbitmq is stopped epmd is cleaned up as well
    [root@525856dd7915 system]# systemctl stop rabbitmq-server
    [root@525856dd7915 system]# ps -ef
    UID          PID    PPID  C STIME TTY          TIME CMD
    root           1       0  0 05:59 ?        00:00:00 /sbin/init
    root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
    dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
    root         719       0  0 05:59 pts/1    00:00:00 /bin/bash
    root        9094     719  0 06:37 pts/1    00:00:00 ps -ef

  5. Cleaning up separate services. Suppose epmd and rabbitmq are started separately as in 2. rabbitmq can then be stopped gracefully without impacting epmd.

    # initial state epmd and rabbitmq start separately
    [root@525856dd7915 system]# systemctl start rabbitmq-epmd
    [root@525856dd7915 system]# systemctl start rabbitmq-server
    [root@525856dd7915 system]# ps -ef
    UID          PID    PPID  C STIME TTY          TIME CMD
    root           1       0  0 05:59 ?        00:00:00 /sbin/init
    root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
    dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
    root         719       0  0 05:59 pts/1    00:00:00 /bin/bash
    rabbitmq   10962       1  0 06:46 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
    rabbitmq   10965       1 44 06:46 ?        00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 51
    rabbitmq   11599   10965  0 06:46 ?        00:00:00 erl_child_setup 32768
    rabbitmq   11652   11599  0 06:46 ?        00:00:00 inet_gethost 4
    rabbitmq   11653   11652  0 06:46 ?        00:00:00 inet_gethost 4
    root       11659     719  0 06:46 pts/1    00:00:00 ps -ef

    # stop rabbitmq-server; here epmd is unaffected
    [root@525856dd7915 system]# systemctl stop rabbitmq-server; ps -ef
    UID          PID    PPID  C STIME TTY          TIME CMD
    root           1       0  0 05:59 ?        00:00:00 /sbin/init
    root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
    dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
    root         719       0  0 05:59 pts/1    00:00:00 /bin/bash
    rabbitmq   10962       1  0 06:46 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
    root       11827     719  0 06:47 pts/1    00:00:00 ps -ef


This makes me think of a separate question: do people ever run multiple instances of rabbitmq with a single instance of epmd? Then this suggestion might work well in that case.

# hypothetical unit file rabbitmq-server@.service which templates
# an instance of rabbitmq
systemctl start rabbitmq-epmd
systemctl start rabbitmq-server@instance1
systemctl start rabbitmq-server@instance2
# etc etc all sharing the single epmd process
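Several nodes on one host would each need at least a distinct node name and listener port, whatever mechanism starts them. A rough sketch of how per-instance settings might be derived: RABBITMQ_NODENAME and RABBITMQ_NODE_PORT are real RabbitMQ environment variables, while the helper name `instance_env`, the naming scheme, and the port offsets are assumptions made up for this example.

```shell
#!/bin/sh
# Sketch only: the naming scheme and port offsets are illustrative.

# Print the environment a hypothetical instance N would be started with.
instance_env() {
    index=$1
    printf 'RABBITMQ_NODENAME=rabbit%s RABBITMQ_NODE_PORT=%s\n' \
        "$index" "$((5672 + index))"
}

for i in 1 2; do
    instance_env "$i"
done
```

All such nodes would register with the same epmd, which is what makes a standalone epmd service a natural fit for this layout.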

lukebakken commented 5 years ago

Thanks for the explanations.

"do people ever run multiple instances of rabbitmq with a single instance of epmd?"

Only in development environments.

@michaelklishin @dumbbell this seems like a 4.0 feature, should we choose to undertake it.

dumbbell commented 4 years ago

From a systemd point of view, @space88man is right: epmd(1) should be managed separately because it requires privileges, can run from a user account, and opens TCP ports which are different from and unrelated to RabbitMQ.

RabbitMQ's mission was never to manage epmd(1). We relied on the way Erlang works for a long time: the first Erlang node to start with or enable distribution implicitly starts epmd(1) if it's missing. Therefore that instance of epmd(1) inherits the user and environment of that Erlang node. If we take a host running both RabbitMQ and Ejabberd as an example, depending on the first service to start, epmd(1) will run under different conditions.

Anyway, as said above, RabbitMQ shouldn't do anything with epmd(1) management IMHO, this is out of scope. However, our Erlang RPM package can probably do something if that's the package in question.

For instance, the Erlang Debian package installs the following epmd.service file:

[Unit]
Description=Erlang Port Mapper Daemon
After=network.target
Requires=epmd.socket

[Service]
ExecStart=/usr/bin/epmd -systemd
Type=simple
StandardOutput=journal
StandardError=journal
User=epmd
Group=epmd

[Install]
Also=epmd.socket
WantedBy=multi-user.target

Would it help to do the same in our Erlang RPM package?