processone / ejabberd

Robust, Ubiquitous and Massively Scalable Messaging Platform (XMPP, MQTT, SIP Server)
https://www.process-one.net/en/ejabberd/

Make systemd better aware of ejabberd process #2822

Open thegcat opened 5 years ago

thegcat commented 5 years ago

Currently systemd is not aware that beam.smp is the main process. Unfortunately beam.smp was killed by the OOM-killer on our system, which systemd did not register as the service failing.

The following additions to the [Service] section of the systemd unit file make sure systemd notices when ejabberd crashes or is killed:

# Make sure systemd knows when ejabberd crashes/is killed
RuntimeDirectory=ejabberd
Environment=EJABBERD_PID_PATH=/var/run/ejabberd/ejabberd.pid
PIDFile=/var/run/ejabberd/ejabberd.pid

weiss commented 5 years ago

The configure script would have to edit the preinstalled ejabberdctl.cfg and the systemd unit file to set the desired PID file path.

I think a better solution would be to add explicit systemd support and let beam.smp run in the foreground, and maybe leave such workarounds to distribution maintainers/admins in the meantime.

thegcat commented 5 years ago

This would work for me either way. The compelling argument for the proposed solution is that it works by just adapting the unit file; no other installation or configuration work is needed. The paths used are those expected to work in a systemd environment as far as I know.

Native systemd support would be even better, but the above solution is a simple and effective improvement over the current state that can be used until then.

weiss commented 5 years ago

> The paths used are those expected to work in a systemd environment as far as I know.

The ./configure script supports custom installation paths, and many admins make use of that despite using systemd (including myself: I don't even use root privileges during installation). I wouldn't want to break that.

mtdcr commented 3 years ago

@thegcat Recently, there were changes to the service file (https://github.com/processone/ejabberd/pull/3429, https://github.com/processone/ejabberd/pull/3471). I suspect they fixed your issue. Can you confirm?

weiss commented 3 years ago

@mtdcr, thanks for bringing this up, but this specific issue actually isn't addressed yet: The ejabberdctl script still runs the VM as a child process, and systemd is not aware of this child being the main service process. Having the ejabberdctl script exec the VM instead would do the trick, and I have that on my list.
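
The difference between running the VM as a child and exec-ing it can be seen with plain sh. This is only a minimal sketch: the inner sh stands in for beam.smp and the outer sh for the ejabberdctl wrapper, neither is the real thing.

```shell
#!/bin/sh
# Stand-ins only: the outer sh plays the wrapper script, the inner sh plays the VM.

# Without exec: the wrapper forks the "VM", so the two PIDs differ.
# The trailing ':' keeps the shell from implicitly exec-ing its last command.
set -- $(sh -c 'echo $$; sh -c "echo \$\$"; :')
child_wrapper=$1
child_vm=$2

# With exec: the "VM" replaces the wrapper and keeps its PID.
set -- $(sh -c 'echo $$; exec sh -c "echo \$\$"')
exec_wrapper=$1
exec_vm=$2

echo "child: wrapper=$child_wrapper vm=$child_vm"
echo "exec:  wrapper=$exec_wrapper vm=$exec_vm"
```

In the exec case the PID that the service manager started is the PID of the final process, which is what the comment above is asking for.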

thegcat commented 3 years ago

We just updated and ran kill -9 on the beam.smp process, which from my understanding is essentially what the OOM-killer would do, and systemd noticed it was down and restarted it. Regarding our specific problem from what I can see this solves the problem.

Excerpt from the logs regarding this test:

Feb 05 09:02:44 ejabberd ejabberdctl[21437]: [os_mon] memory supervisor port (memsup): Erlang has closed
Feb 05 09:02:44 ejabberd ejabberdctl[21262]: Killed
Feb 05 09:02:44 ejabberd systemd[1]: ejabberd.service: Main process exited, code=exited, status=137/n/a
Feb 05 09:02:44 ejabberd systemd[1]: ejabberd.service: Failed with result 'exit-code'.
Feb 05 09:02:49 ejabberd systemd[1]: ejabberd.service: Scheduled restart job, restart counter is at 1.
Feb 05 09:02:49 ejabberd systemd[1]: Stopped XMPP Server.
Feb 05 09:02:49 ejabberd systemd[1]: Starting XMPP Server...

weiss commented 3 years ago

> Regarding our specific problem from what I can see this solves the problem.

The thing that's not yet solved is:

> Currently systemd is not aware that beam.smp is the main process.

See:

# systemctl status ejabberd | fgrep 'Main PID'
 Main PID: 7489 (ejabberdctl)

However, you'll no longer run into your specific problem indeed, as the ejabberdctl script now exits if the beam.smp process is killed. So yes, maybe this issue can be closed.
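
The status=137 in the log excerpt above fits this: a shell reports a child killed by signal N as exit status 128+N, and the wrapper then exits normally with that value, so systemd logs code=exited rather than a kill of the main process. A small sketch, assuming sh is bash or dash (other shells may encode signals differently); the inner sh is a stand-in for beam.smp:

```shell
#!/bin/sh
# Stand-ins only: the outer sh plays the wrapper, the inner sh plays the VM.
# The inner shell kills itself with SIGKILL (signal 9); the outer shell
# passes the resulting status on as its own exit code.
sh -c 'sh -c "kill -9 \$\$"; exit $?' 2>/dev/null
status=$?

expected=$((128 + 9))
echo "wrapper exit status: $status (128 + 9 = $expected)"
```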

thegcat commented 3 years ago

> The thing that's not yet solved is:
>
> Currently systemd is not aware that beam.smp is the main process.

Ah, we had worked around our problem previously by setting the following override.conf:

[Service]
# Make sure systemd knows when ejabberd crashes/is killed
RuntimeDirectory=ejabberd
Environment=EJABBERD_PID_PATH=/var/run/ejabberd/ejabberd.pid
PIDFile=/var/run/ejabberd/ejabberd.pid

From what I read in the docs, I would have expected this to make systemd aware of beam.smp being the main process, but this doesn't seem to be reflected in the Main PID of the status output.

prefiks commented 3 years ago

We probably could also exec erl when called with foreground options. I don't think we do any cleanup when erl finishes, and this would allow Erlang to inherit the ejabberdctl PID.

weiss commented 3 years ago

> We probably could also exec erl

I think so, too. I just didn't dare to apply that change immediately before the 21.01 release :smile:

badlop commented 3 years ago

Ok, I played with that possibility in docker-ejabberd, and found that using exec breaks started), stopped), *) and mkdir in ejabberdctl, so those must continue using the old method.

The good news is that using exec allows docker to stop ejabberd properly, which was not possible before.

I learned all this the hard way ... https://github.com/processone/docker-ejabberd/commit/9adadc6999573b0ed383026fe3139a2f336b8329 https://github.com/processone/docker-ejabberd/commit/8c5e758191710e03499a3e2198134acb6ad6e71d https://github.com/processone/docker-ejabberd/commit/387254bcdf5ceb07d14c590f353ea8eb62e72aad

weiss commented 3 years ago

> those must continue using the old method.

Yes, that's the reason this fix isn't completely trivial (sorry I didn't mention that explicitly).
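
A likely reason, not spelled out in the thread, is that exec replaces the shell, so nothing the script wants to do after the command ever runs. The sketch below illustrates that with a hypothetical launch function; it is not the real ejabberdctl code, and the mode names are only borrowed from the comments above.

```shell
#!/bin/sh
# Hypothetical dispatcher, NOT the real ejabberdctl: exec only in
# "foreground" mode, run every other command as an ordinary child.
launch() {
    mode=$1
    shift
    case $mode in
        foreground) exec "$@" ;;  # the command takes over this process
        *)          "$@" ;;       # old method: the shell keeps running
    esac
}

# Each command substitution runs in its own subshell, so exec replaces
# only that subshell and not this script.
with_exec=$(launch foreground true; echo "follow-up ran")
without_exec=$(launch started true; echo "follow-up ran")

echo "exec:  '$with_exec'"
echo "child: '$without_exec'"
```

With exec the follow-up command is never reached, so any branch that needs to poll, clean up, or simply continue after the command has to keep the child-process method.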