mpkut closed this issue 5 years ago.
The unit files shipped for Debian 10 work without problems. What do the unit files for Red Hat look like?
Hello Racke,
Based on your comment, I took some time to test a few more environments, which of course is always a good idea and could have been done before submitting this issue.
These environments run Sympa as expected for me:
Fedora 30 desktop with docker-1.13.1-68.git47e2230.fc30.x86_64: systemd-241-12.git1e19bcd.fc30.x86_64 in CentOS 7 container
CentOS 7 Vagrant VM with docker-1.13.1-103.git7f2769b.el7.centos.x86_64: systemd-219-67.el7_7.1.x86_64 in CentOS 7 container
CentOS 7 Vagrant VM (bare OS) with systemd-219-67.el7_7.1.x86_64 in CentOS 7 container
These environments exhibit the issue:
CentOS 7.7 Vagrant VM with docker-ce-19.03.3-3.el7.x86_64: systemd-219-67.el7_7.1.x86_64 in container
RHEL 7.7 Vagrant VM with docker-ce-19.03.3-3.el7.x86_64: systemd-219-67.el7_7.1.x86_64
This seems to narrow the behavior down to Docker CE combined with RHEL/CentOS 7 containers running systemd-219-67 or later.
I found some threads about the "Main PID does not belong to service" error, indicating that the error can happen when cgroups don't match between the service and the child process. And I do see a difference between the service's cgroup and that of the Sympa process in the failed environments.
In the Fedora 30 environment running docker-1.13, the cgroups of the sympa service and the sympa process are the same:
[root@sympa /]# systemctl status sympa|grep -i cgroup
CGroup: /system.slice/docker-4f0e7de74938784c9c8f3cc8f48f2a2dec4f469e7729a378a7b107c072b6d279.scope/system.slice/sympa.service
[root@sympa /]# grep name= /proc/382/cgroup
1:name=systemd:/system.slice/docker-4f0e7de74938784c9c8f3cc8f48f2a2dec4f469e7729a378a7b107c072b6d279.scope/system.slice/sympa.service
In contrast, on a CentOS 7.7 VM running docker-ce-19.03.3, the cgroup of the sympa process contains an extra instance of the string /docker/HASH. This data was gathered during the timeout period, before the service failed.
[root@sympa /]# grep name= /proc/93/cgroup
1:name=systemd:/docker/d29ecb85c7fc8a7aee70a8e592fdce1fb85436fca091456ccf430062b430d19e/docker/d29ecb85c7fc8a7aee70a8e592fdce1fb85436fca091456ccf430062b430d19e/system.slice/sympa.service
[root@sympa /]# systemctl status sympa |grep -i cgroup
CGroup: /docker/d29ecb85c7fc8a7aee70a8e592fdce1fb85436fca091456ccf430062b430d19e/system.slice/sympa.service
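To repeat this comparison for any unit, here is a minimal POSIX shell sketch. It assumes the cgroup v1 layout seen in these CentOS 7 containers, and that the unit's ControlGroup property (from systemctl show) reports the same path that systemctl status prints as "CGroup:". The function names are hypothetical, not part of Sympa or systemd. The helpers only detect the mismatch; they do not fix it.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names). Assumes cgroup v1, where
# /proc/<pid>/cgroup lines look like "1:name=systemd:/some/path".

# Print the name=systemd cgroup path from a /proc/<pid>/cgroup-style file.
systemd_cgroup_of() {
    grep ':name=systemd:' "$1" | cut -d: -f3-
}

# Succeed if the path starts with /docker/<id>/docker/<id>, the doubled
# prefix observed in the failing Docker CE environments.
has_doubled_docker_prefix() {
    case "$1" in
        /docker/?*) ;;
        *) return 1 ;;
    esac
    rest=${1#/docker/}   # "<id>/docker/<id>/..."
    id=${rest%%/*}       # "<id>"
    case "$rest" in
        "$id/docker/$id" | "$id/docker/$id"/*) return 0 ;;
    esac
    return 1
}

# Run inside the container: compare a unit's cgroup with its main PID's cgroup.
check_unit_cgroup() {
    unit=$1
    pid=$(systemctl show -p MainPID "$unit" | cut -d= -f2)
    if [ -z "$pid" ] || [ "$pid" = 0 ]; then
        echo "$unit has no main PID"
        return 2
    fi
    want=$(systemctl show -p ControlGroup "$unit" | cut -d= -f2-)
    have=$(systemd_cgroup_of "/proc/$pid/cgroup")
    echo "unit cgroup:    $want"
    echo "process cgroup: $have"
    [ "$want" = "$have" ]
}
```

During the start timeout, `check_unit_cgroup sympa || echo "cgroup mismatch"` would be expected to report a mismatch in the Docker CE containers and a match in the docker-1.13 ones, if the assumption about ControlGroup holds.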
It seems that the disparate cgroups are the problem here, which makes it hard for me to argue that this is a Sympa bug. Feel free to close this issue if you agree. And thanks as always for reviewing this report.
-mpk
Yeah, I don't think that's a bug in the Sympa unit files. Thanks for your tests!
One comment before closing, in the (relatively unlikely) case that someone else finds their way here.
The Fedora/Red Hat packaging of Docker includes the oci-systemd-hook package, which handles proper mounting of the cgroup file system when using systemd. Docker CE does not, which explains our results exactly.
Although things have moved on in the past few years, this Fedora Atomic working group discussion has some background on the reason for including these hooks, plus some details on running containers without them: https://pagure.io/atomic-wg/issue/233
In my case, when I added a bind mount of /sys/fs/cgroup while creating the container, the sympa service started up correctly:
docker create --privileged --hostname sympa.example.com --name sympa.example.com -v /sys/fs/cgroup:/sys/fs/cgroup sympa-systemd /usr/sbin/init
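A rough way to check from inside a container whether that bind mount is in effect, sketched under an assumption not verified in this thread: in /proc/self/mountinfo the fourth field is the root of a mount within its filesystem. The assumption is that without the bind mount Docker CE exposes only the container's own subtree, so /sys/fs/cgroup/systemd is rooted at something like /docker/<id> (which would explain the doubled /docker/<id> prefix above), while with -v /sys/fs/cgroup:/sys/fs/cgroup it is rooted at /. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the meaning given to the mount root is an assumption.
# mountinfo fields (see proc(5)): 1 mount ID, 2 parent ID, 3 major:minor,
# 4 root of the mount within its filesystem, 5 mount point, ...

# Print the root (field 4) of the last mount listed on mount point $1.
# $2 optionally names a mountinfo file (default: /proc/self/mountinfo).
mount_root_of() {
    awk -v mp="$1" '$5 == mp { r = $4 } END { if (r != "") print r }' \
        "${2:-/proc/self/mountinfo}"
}

# Succeed if /sys/fs/cgroup/systemd is rooted at "/" (whole hierarchy visible);
# fail if it is rooted at a subdirectory such as /docker/<id>.
check_systemd_cgroup_mount() {
    root=$(mount_root_of /sys/fs/cgroup/systemd "$1")
    if [ -z "$root" ]; then
        echo "no mount found on /sys/fs/cgroup/systemd"
        return 2
    fi
    echo "/sys/fs/cgroup/systemd is rooted at: $root"
    [ "$root" = "/" ]
}
```

The last matching entry is used because a later mount on the same path hides earlier ones; run `check_systemd_cgroup_mount` with no argument inside the container to read its own mountinfo.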
In summary: not a Sympa bug, nor a Docker bug. We were using Docker CE to run a systemd-based container without all of the necessary configuration that the new version of systemd requires.
Sympa daemons fail to activate upon system startup in (at minimum) CentOS 7 and RHEL 7 Docker containers running systemd-219-67 and up. The system log contains errors of the form
Systems running systemd-219-62 do not exhibit this behavior and Sympa runs as normal.
Retaining systemd-219-67 and removing the PIDFile directives from the systemd service files also results in the daemons starting up. However, this may be an incorrect configuration if the PID file is essential to the proper operation of Sympa under systemd.
Version
Sympa 6.2.44 built for EPEL 7
Name        : sympa
Version     : 6.2.44
Release     : 3.el7
Architecture: x86_64
Install Date: Mon 14 Oct 2019 11:24:32 AM CDT
Group       : Unspecified
Size        : 15819788
License     : GPLv2+ and (OFL and MIT) and OFL and MIT and (MIT or GPLv2)
Signature   : RSA/SHA256, Fri 19 Jul 2019 05:11:28 AM CDT, Key ID 6a2faea2352c64e5
Source RPM  : sympa-6.2.44-3.el7.src.rpm
Build Date  : Fri 19 Jul 2019 04:06:51 AM CDT
Build Host  : buildvm-29.phx2.fedoraproject.org
Installation method
This method was tested using Docker 19.03 in a RHEL 7 VM
Create a Docker image with the attached Dockerfile and sympa.conf (attachments: Dockerfile.txt, sympa.conf.txt)
Create a privileged container with systemd as the init program
Start the container and create a bash shell
Confirm the "New main PID does not belong to service" messages
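The steps above can be sketched as a script. This is a hypothetical sketch, assuming the attached Dockerfile and sympa.conf are in the current directory; the image tag sympa-systemd and container name sympa.example.com are borrowed from the docker create command quoted in the comments, and the DRY_RUN switch is an illustrative addition for printing the commands without running them.

```shell
#!/bin/sh
# Hypothetical reproduction script. Image tag and container name are
# borrowed from the docker create command quoted in this thread.
IMAGE=${IMAGE:-sympa-systemd}
NAME=${NAME:-sympa.example.com}

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce() {
    run docker build -t "$IMAGE" .
    # Privileged container with systemd as init and NO /sys/fs/cgroup bind mount.
    run docker create --privileged --hostname "$NAME" --name "$NAME" "$IMAGE" /usr/sbin/init
    run docker start "$NAME"
    # Wait past systemd's default 90 s start timeout (unless the units override it).
    run sleep 120
    # journalctl in systemd 219 has no --grep option, so filter with grep instead.
    run docker exec "$NAME" sh -c "journalctl --no-pager | grep 'does not belong to service'"
}
```

For the interactive step, `docker exec -it sympa.example.com bash` opens a shell in the running container.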
Expected behavior
The Sympa daemons should start up and continue to run
Actual behavior
Systemd stops the daemons with an error indicating that the PID file is not owned by root.
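As I understand the CVE-2018-16888 fix referenced below, systemd 219-67 only accepts a PID read from a PID file that is not owned by root if that PID already belongs to the service; this is an assumption, not something confirmed in this report. If it holds, the ownership of the PID file together with the cgroup mismatch discussed in the comments would produce exactly this error. A small hypothetical helper to inspect a PID file, assuming GNU coreutils stat:

```shell
#!/bin/sh
# Hypothetical helper: print the numeric owner and the PID stored in a PID file.
# stat -c %u (GNU coreutils) prints the owner's numeric UID; 0 means root.
describe_pidfile() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1"
        return 2
    fi
    uid=$(stat -c %u "$1")
    pid=$(head -n 1 "$1" | tr -d '[:space:]')
    echo "owner_uid=$uid pid=$pid"
}
```

Usage: `describe_pidfile "$(systemctl show -p PIDFile sympa | cut -d= -f2-)"`. The reported PID's /proc/<pid>/cgroup can then be compared with the unit's cgroup, as done with grep in the comments.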
Additional information
Downgrading to systemd-219-67 produces the same result: the daemons do not start properly. Earlier CentOS containers with systemd-219-62 show the previous, expected behavior. This change tracks with Red Hat Security Advisory RHSA-2019:2091, which released systemd 219-67 to address CVE-2018-16888:
https://access.redhat.com/errata/RHSA-2019:2091
https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2018-16888
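To tell which side of the reported 219-62/219-67 boundary a container is on, a hypothetical helper can compare the installed VERSION-RELEASE string numerically. This is only a hint, since Docker 1.13 containers with 219-67 worked fine. It assumes releases of the form "67.el7_7.1", and it treats any later major version as past the boundary, which is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a systemd VERSION-RELEASE string, such as
# the output of  rpm -q --qf '%{VERSION}-%{RELEASE}\n' systemd , is at or
# past 219-67. Returns 2 for input it does not understand.
systemd_at_least_219_67() {
    case "$1" in
        *-*) ;;
        *) return 2 ;;
    esac
    ver=${1%%-*}    # "219" from "219-67.el7_7.1"
    rel=${1#*-}     # "67.el7_7.1"
    rel=${rel%%.*}  # "67"
    case "$ver" in '' | *[!0-9]*) return 2 ;; esac
    case "$rel" in '' | *[!0-9]*) return 2 ;; esac
    [ "$ver" -gt 219 ] || { [ "$ver" -eq 219 ] && [ "$rel" -ge 67 ]; }
}
```

Usage: `systemd_at_least_219_67 "$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' systemd)" && echo "systemd 219-67 or later"`.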
Although this report is based on CentOS/RHEL, systemd-based packaging of Sympa for other distributions may also be affected.
As noted above, some trivial testing suggests the systemd unit files may be able to operate without a PIDFile directive, but I do not have the expertise to evaluate whether that is the actual solution.