Open space88man opened 5 years ago
FYI, most of the script is migrated to Erlang on the rabbitmq-server-script-replacement branch.
Considerations to take into account:
systemd
beam.smp) and epmd - they aren't separate packages.
epmd if it's not already started. If you pass -name or -sname to your erl executable, it will start epmd for you.
"Not all of our supported operating systems use systemd" - other process supervisors in containers can lose sight of epmd. Can it be started without the -daemon argument? (systemd actually manages to capture it due to cgroups - at least on EL7 it manages to clean up epmd/beam.smp and the erl children inet_gethost processes.)
I remember that in the EL6 days /etc/init.d/rabbitmq-server stop didn't always clean up everything correctly.
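The cgroup remark can be checked on a live host by comparing the cgroup entries of beam.smp and epmd. The following is only a sketch under stated assumptions: `cgroup_of`, `same_cgroup`, and the `PROC_ROOT` override are hypothetical names introduced here for illustration, and the comparison relies on the Linux `/proc/<pid>/cgroup` file.

```shell
#!/bin/sh
# Print the cgroup path(s) recorded for a PID.
# Each line of /proc/<pid>/cgroup has the form "hierarchy-ID:controllers:path",
# so the path is everything from the third colon-separated field onward.
# PROC_ROOT is a hypothetical override (handy for testing); it defaults to /proc.
cgroup_of() {
    cut -d: -f3- "${PROC_ROOT:-/proc}/$1/cgroup"
}

# Succeed only if both PIDs report identical cgroup paths.
same_cgroup() {
    [ "$(cgroup_of "$1")" = "$(cgroup_of "$2")" ]
}

# Possible use on a systemd host (assumes pidof is available):
#   same_cgroup "$(pidof beam.smp)" "$(pidof epmd)" && echo "epmd shares the service cgroup"
```

If the two PIDs share a cgroup, systemd can stop both together even though epmd has detached from its parent. A supervisor that only tracks its direct child has no such handle.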
@space88man let's keep this issue a little bit more focused. Given that RabbitMQ supports a variety of platforms that do not use systemd and most of the scripts are moving to Erlang, what are some of the specific changes that you would like to see in the RPM package?
Moved to the packaging repo as it currently seems to fit best here.
rabbitmq-epmd.service:
[Unit]
Description=Erlang Port Mapper Daemon
After=syslog.target network.target
[Service]
User=rabbitmq
Group=rabbitmq
WorkingDirectory=/var/lib/rabbitmq
ExecStart=/usr/lib64/erlang/erts-10.5.3/bin/epmd
[Install]
WantedBy=rabbitmq.target
rabbitmq-server.service:
[Unit]
Description=RabbitMQ broker
After=rabbitmq-epmd.service
Requires=rabbitmq-epmd.service
[Service]
Type=notify
User=rabbitmq
Group=rabbitmq
UMask=0027
NotifyAccess=all
TimeoutStartSec=3600
LimitNOFILE=32768
Restart=on-failure
RestartSec=10
WorkingDirectory=/var/lib/rabbitmq
ExecStart=/usr/sbin/rabbitmq-server
ExecStop=/usr/sbin/rabbitmqctl shutdown
SuccessExitStatus=69
[Install]
WantedBy=rabbitmq.target
rabbitmq.target:
[Unit]
Description=RabbitMQ Broker Target
@space88man If you have a specific problem, or issue you have seen due to how RabbitMQ currently starts, that would be useful information for us.
@space88man sorry but we would not consider a change unless we understand it. Why should we adopt those unit files? What are the risks?
TL;DR: to work nicely with process supervisors (supervisord/s6 etc) don't launch epmd with -daemon.
@michaelklishin @lukebakken - This issue is intended to address process supervisors like supervisord, s6 which will be unable to manage epmd, given the way it is currently launched.
I'd like to clarify that the current rabbitmq:
-daemon), works with other process supervisors
Process supervisor problem with /usr/sbin/rabbitmq-server
@lukebakken Specific problem: epmd is not correctly managed.
The key issue with supervisord/s6 etc is that "Programs meant to be run under supervisor should not daemonize themselves. Instead, they should run in the foreground. They should not detach from the terminal from which they are started." (Taken from the supervisord docs.)
Supervisor: https://github.com/just-containers/s6-overlay. A service in s6 is just an executable supervised by a monitoring process s6-supervise. Every service has its own long-running monitor that lies between it and PID 1, so the service is not intended to be a direct child of PID 1. The monitor never dies but the service main and child processes are expected to die when the service is down.
Configure a service rabbitmq in s6 and give the launch script as /usr/sbin/rabbitmq-server. (This example is in a container to remove all the noise from other OS processes.)
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 01:32 pts/0 00:00:00 s6-svscan -t0 /var/run/s6/services
root 27 1 0 01:32 pts/0 00:00:00 s6-supervise s6-fdholderd
root 2457 1 0 01:53 pts/0 00:00:00 s6-supervise rabbitmq
rabbitmq 2458 2457 0 01:53 ? 00:00:00 /bin/sh /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq 2666 1 0 01:53 ? 00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd -daemon
rabbitmq 2773 2458 34 01:53 ? 00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlm
rabbitmq 3093 2773 1 01:53 ? 00:00:00 erl_child_setup 1048576
rabbitmq 3147 3093 0 01:53 ? 00:00:00 inet_gethost 4
rabbitmq 3148 3147 0 01:53 ? 00:00:00 inet_gethost 4
Observations:
Every process runs under s6-supervise 2457 except epmd, which escapes and goes to PID 1.
Stop the service: s6-svc -d /run/s6/services/rabbitmq
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 01:32 pts/0 00:00:00 s6-svscan -t0 /var/run/s6/services
root 27 1 0 01:32 pts/0 00:00:00 s6-supervise s6-fdholderd
root 2457 1 0 01:53 pts/0 00:00:00 s6-supervise rabbitmq
rabbitmq 2666 1 0 01:53 ? 00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd -daemon
Problem: epmd survives as it was reparented to PID 1 by -daemon.
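The reparenting effect can be reproduced in miniature without epmd. In this sketch `sleep` stands in for the daemon and a throwaway `sh -c` stands in for the launcher that exits right after backgrounding it. `parent_of` is a hypothetical helper introduced here, and the stand-ins are assumptions for illustration, not how epmd itself detaches.

```shell
#!/bin/sh
# Report the parent PID of a process, preferring /proc on Linux and
# falling back to ps elsewhere.
parent_of() {
    if [ -r "/proc/$1/status" ]; then
        awk '/^PPid:/ { print $2 }' "/proc/$1/status"
    else
        ps -o ppid= -p "$1" | tr -d ' '
    fi
}

# Launch a short-lived parent that backgrounds a grandchild and exits at once.
# Output is redirected so the command substitution is not held open.
orphan_pid=$(sh -c 'sleep 30 >/dev/null 2>&1 & echo $!')

# The intermediate shell is gone, so the kernel has reparented the sleeper
# to PID 1 (or to the nearest subreaper, if one is configured).
orphan_ppid=$(parent_of "$orphan_pid")

echo "this shell: $$, orphan: $orphan_pid, orphan's parent: $orphan_ppid"

# Clean up the stand-in daemon.
kill "$orphan_pid"
```

The orphan's parent is no longer the shell that launched it, which is the position s6-supervise is in after rabbitmq-server starts epmd with -daemon.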
Simplest solution
Is there a way to launch epmd without the -daemon option from /usr/sbin/rabbitmq-server or /usr/lib/rabbitmq/bin/rabbitmq-server?
Two service solution
Configure a separate service for epmd (without -daemon). The process tree looks like this:
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 01:32 pts/0 00:00:00 s6-svscan -t0 /var/run/s6/services
root 27 1 0 01:32 pts/0 00:00:00 s6-supervise s6-fdholderd
root 2457 1 0 01:53 pts/0 00:00:00 s6-supervise rabbitmq
root 3538 1 0 02:23 pts/0 00:00:00 s6-supervise epmd
rabbitmq 3558 3538 0 02:23 ? 00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
rabbitmq 3561 2457 1 02:23 ? 00:00:00 /bin/sh /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq 3876 3561 67 02:23 ? 00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlm
rabbitmq 4196 3876 2 02:23 ? 00:00:00 erl_child_setup 1048576
rabbitmq 4250 4196 0 02:23 ? 00:00:00 inet_gethost 4
rabbitmq 4251 4250 0 02:23 ? 00:00:00 inet_gethost 4
Notice epmd is contained under s6-supervise 3538. Both services can be stopped cleanly.
# try to stop services cleanly. In s6 lingo, the commands are:
# s6-svc -d /run/s6/services/rabbitmq
# s6-svc -d /run/s6/services/epmd
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 01:32 pts/0 00:00:00 s6-svscan -t0 /var/run/s6/services
root 27 1 0 01:32 pts/0 00:00:00 s6-supervise s6-fdholderd
root 2457 1 0 01:53 pts/0 00:00:00 s6-supervise rabbitmq
root 3538 1 0 02:23 pts/0 00:00:00 s6-supervise epmd
Of course, for this to work properly, s6 (or any other process supervisor or service manager) would have to support service dependencies and declare that the rabbitmq service depends on the epmd service.
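Where the supervisor has no way to declare such a dependency, one workaround is for the rabbitmq run script to poll for epmd before starting the broker. This is a minimal sketch, not part of any RabbitMQ package: `wait_for_epmd` and the `EPMD_PROBE` variable are hypothetical names, and it assumes that `epmd -names` exits non-zero when no local epmd is reachable.

```shell
#!/bin/sh
# Poll until the probe command succeeds or the attempts run out.
# EPMD_PROBE is a hypothetical override; by default the probe is
# `epmd -names`, assumed to fail while no local epmd is running.
wait_for_epmd() {
    attempts=${1:-30}
    while [ "$attempts" -gt 0 ]; do
        if ${EPMD_PROBE:-epmd -names} >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# In a run script for the rabbitmq service this might be used as:
#   wait_for_epmd 30 || exit 1
#   exec /usr/sbin/rabbitmq-server
```

The `exec` in the usage comment keeps the broker as the supervised process itself rather than a child of a lingering shell.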
In my previous post, I used systemd because it was the easiest way to demonstrate the dependency relationship: rabbitmq-server.service (beam.smp) depends on rabbitmq-epmd.service (epmd) and must be started after it. @michaelklishin there is no risk: this is merely an explicit declaration that the epmd process must be started first.
@michaelklishin For RPM/systemd based systems, let me try to show the intention of the two service proposal.
1. epmd is a standalone service:
# systemctl start rabbitmq-epmd
[root@525856dd7915 system]# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:59 ? 00:00:00 /sbin/init
root 16 1 0 05:59 ? 00:00:00 /usr/lib/systemd/systemd-journald
dbus 23 1 0 05:59 ? 00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
rabbitmq 7089 1 0 06:29 ? 00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
2. rabbitmq is a separate service, but since epmd (its dependency) is already started, rabbitmq can be run:
[root@525856dd7915 system]# systemctl start rabbitmq-server; ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:59 ? 00:00:00 /sbin/init
root 16 1 0 05:59 ? 00:00:00 /usr/lib/systemd/systemd-journald
dbus 23 1 0 05:59 ? 00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
rabbitmq 7089 1 0 06:29 ? 00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
rabbitmq 7093 1 24 06:31 ? 00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 51
rabbitmq 7727 7093 0 06:31 ? 00:00:00 erl_child_setup 32768
rabbitmq 7780 7727 0 06:31 ? 00:00:00 inet_gethost 4
rabbitmq 7781 7780 0 06:31 ? 00:00:00 inet_gethost 4
3. Suppose the user for some reason forgot to start epmd:
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:59 ? 00:00:00 /sbin/init
root 16 1 0 05:59 ? 00:00:00 /usr/lib/systemd/systemd-journald
dbus 23 1 0 05:59 ? 00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
When the user tries to start rabbitmq-server, it will work(!) because rabbitmq-epmd is declared as an explicit dependency.
[root@525856dd7915 system]# systemctl start rabbitmq-server; ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:59 ? 00:00:00 /sbin/init
root 16 1 0 05:59 ? 00:00:00 /usr/lib/systemd/systemd-journald
dbus 23 1 0 05:59 ? 00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 719 0 0 05:59 pts/1 00:00:00 /bin/bash
rabbitmq 8026 1 12 06:34 ? 00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 51
rabbitmq 8234 1 0 06:34 ? 00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
rabbitmq 8660 8026 0 06:34 ? 00:00:00 erl_child_setup 32768
rabbitmq 8713 8660 0 06:34 ? 00:00:00 inet_gethost 4
rabbitmq 8714 8713 0 06:34 ? 00:00:00 inet_gethost 4
root 8723 719 0 06:34 pts/1 00:00:00 ps -ef
Actually, this would work anyway as Erlang has the autostart epmd capability — I am just being explicit here.
4. Cleaning up example:
# Continued from 3...
# since epmd is started as a dependency, when rabbitmq is stopped epmd is cleaned up as well
[root@525856dd7915 system]# systemctl stop rabbitmq-server
[root@525856dd7915 system]# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:59 ? 00:00:00 /sbin/init
root 16 1 0 05:59 ? 00:00:00 /usr/lib/systemd/systemd-journald
dbus 23 1 0 05:59 ? 00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 719 0 0 05:59 pts/1 00:00:00 /bin/bash
root 9094 719 0 06:37 pts/1 00:00:00 ps -ef
5. Cleaning up separate services. Suppose epmd and rabbitmq are started separately as in 2. rabbitmq can be stopped gracefully without impacting epmd.
[root@525856dd7915 system]# systemctl start rabbitmq-epmd
[root@525856dd7915 system]# systemctl start rabbitmq-server
[root@525856dd7915 system]# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:59 ? 00:00:00 /sbin/init
root 16 1 0 05:59 ? 00:00:00 /usr/lib/systemd/systemd-journald
dbus 23 1 0 05:59 ? 00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 719 0 0 05:59 pts/1 00:00:00 /bin/bash
rabbitmq 10962 1 0 06:46 ? 00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
rabbitmq 10965 1 44 06:46 ? 00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 51
rabbitmq 11599 10965 0 06:46 ? 00:00:00 erl_child_setup 32768
rabbitmq 11652 11599 0 06:46 ? 00:00:00 inet_gethost 4
rabbitmq 11653 11652 0 06:46 ? 00:00:00 inet_gethost 4
root 11659 719 0 06:46 pts/1 00:00:00 ps -ef
[root@525856dd7915 system]# systemctl stop rabbitmq-server; ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:59 ? 00:00:00 /sbin/init
root 16 1 0 05:59 ? 00:00:00 /usr/lib/systemd/systemd-journald
dbus 23 1 0 05:59 ? 00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 719 0 0 05:59 pts/1 00:00:00 /bin/bash
rabbitmq 10962 1 0 06:46 ? 00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
root 11827 719 0 06:47 pts/1 00:00:00 ps -ef
This makes me think of a separate question: do people ever run multiple instances of rabbitmq with a single instance of epmd? Then this suggestion might work well in that case.
systemctl start rabbitmq-epmd
systemctl start rabbitmq-server@instance1
systemctl start rabbitmq-server@instance2
Thanks for the explanations.
do people ever run multiple instances of rabbitmq with a single instance of epmd?
Only in development environments.
@michaelklishin @dumbbell this seems like a 4.0 feature, should we choose to undertake it.
From a systemd point of view, @space88man is right: epmd(1) should be managed separately because the privileges it requires, the user account it can run from, and the TCP ports it opens are different from and unrelated to RabbitMQ's.
RabbitMQ's mission was never to manage epmd(1). We relied on the way Erlang works for a long time: the first Erlang node to start with or enable distribution implicitly starts epmd(1) if it's missing. Therefore that instance of epmd(1) inherits the user & environment of that Erlang node. If we take a host running both RabbitMQ and Ejabberd as an example, depending on the first service to start, epmd(1) will run under different conditions.
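On a shared host, the account that ended up owning epmd(1) can be read from the process table. The filter below is a small sketch: `owners_of` is a hypothetical helper name, and it expects "USER COMMAND" pairs on standard input so it can be fed from `ps`.

```shell
#!/bin/sh
# Read "USER COMMAND" lines on stdin and print the user of every process
# whose command name equals the first argument.
owners_of() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Possible use on a live host (separate -o options keep the columns apart):
#   ps -e -o user= -o comm= | owners_of epmd
```

Running it after starting the services in a different order would show whether epmd came up under the rabbitmq account or under another Erlang service's account.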
Anyway, as said above, RabbitMQ shouldn't do anything with epmd(1) management IMHO, this is out of scope. However, our Erlang RPM package can probably do something if that's the package in question.
For instance, the Erlang Debian package installs the following epmd.service file:
[Unit]
Description=Erlang Port Mapper Daemon
After=network.target
Requires=epmd.socket
[Service]
ExecStart=/usr/bin/epmd -systemd
Type=simple
StandardOutput=journal
StandardError=journal
User=epmd
Group=epmd
[Install]
Also=epmd.socket
WantedBy=multi-user.target
Would it help to do the same in our Erlang RPM package?
The rabbitmq-server launch script runs multiple downstream scripts to start epmd and beam as long-running processes. This goes against modern process supervision, which wants epmd in the foreground and treats beam and epmd as two separate services.
epmd -daemon enables epmd to escape process supervision suites that do not capture /usr/sbin/rabbitmq-server in a cgroup.
Observations:
Barely manages to tame epmd -daemon, but only because of cgroups.
Suggestions: