Open karniemi opened 7 years ago
@karniemi thanks for taking the time to fill out this issue in very good detail.
The logs being put on the host from the containers was indeed a design decision that was requested by others. You're right about the switch not yet being in place on RHEL. However, the switch was put into play primarily for Atomic hosts, as the file "/usr/libexec/oci/hooks.d/oci-systemd-hook" cannot be removed in that environment. On RHEL you should be able to simply remove that file or move it to another directory, and that will also turn off oci-systemd-hook for you.
Hope that helps!
@TomSweeneyRedHat thanks for that possible workaround. It might serve someone who's having a critical issue with the logs piling up. Personally, I feel uneasy about removing files installed by rpms, and can live with cleaning up the host journal manually until there's a proper fix for this issue.
As far as I understand, there are two issues:
(That said, I now also understand and appreciate the value of oci-systemd-hook for those who want the containers' journal on the host, automatically and without the loose privileges that were required before oci-systemd-hook. :-) )
The big difference from setting up a container runtime to log is that the runtime only logs messages from stdout/stderr of the primary PID of the container. All messages that are written to /dev/log or directly to the journal are dropped. If you run systemd as PID 1 inside the container, these messages can be caught and recorded on the system.
We have to figure out the best way to handle these logs that can get left behind when the container is removed.
Opened a bugzilla on this issue.
nspawn has:
--link-journal=
Control whether the container's journal shall be made visible to the host system. If enabled, allows viewing the container's journal files from the host (but not vice versa). Takes one of "no", "host", "try-host", "guest", "try-guest", "auto". If "no", the journal is not linked. If "host", the journal files are stored on the host file system (beneath /var/log/journal/machine-id) and the subdirectory is bind-mounted into the container at the same location. If "guest", the journal files are stored on the guest file system (beneath /var/log/journal/machine-id) and the subdirectory is symlinked into the host at the same location. "try-host" and "try-guest" do the same but do not fail if the host does not have persistent journaling enabled. If "auto" (the default), and the right subdirectory of /var/log/journal exists, it will be bind mounted into the container. If the subdirectory does not exist, no linking is performed. Effectively, booting a container once with "guest" or "host" will link the journal persistently if further on the default of "auto" is used.
Note that --link-journal=try-guest is the default if the systemd-nspawn@.service template unit file is used.
I think the persistent journaling should be opt-in. Particularly given that Docker is commonly driven by e.g. Kubernetes which is really all about transient and not "pet/elephant" containers.
However, the switch was put into play primarily for Atomic hosts, as the file "/usr/libexec/oci/hooks.d/oci-systemd-hook" cannot be removed in that environment. On RHEL you should be able to simply remove that file or move it to another directory, and that will also turn off oci-systemd-hook for you.
BTW, I'm really trying to get people away from "you can't" for Atomic Host. In fact, you can: just run "ostree admin unlock". That's very explicitly transient, for testing, though, because removing the file on a yum-based system isn't reliably persistent either: a "yum update" pulling in a new version of oci-systemd-hook.rpm will happily reinstate that file. Atomic Host/ostree is about enforcing best practice, not about restricting users.
For a long time I was looking at /var/log/journal growing in size beyond its limits, with new sub-directories piling up in that directory...
Found the reason: it's the docker containers that are using systemd, together with RHEL dockerd and oci-systemd-hook. In CI test rounds we are running and killing tens of docker containers per day, and the killed docker containers are leaving their journal logs hanging in /var/log/journal/ of the host system. What's worse, the host system's journal does not seem to rotate these hanging logs, nor does it count them when using "journalctl --disk-usage". The contradicting disk usage reported by the "du" command and "journalctl --disk-usage" was driving me mad as well (and might be worth another issue report). Anyway, the killed docker containers leave "/var/log/journal//system.journal"-named files on the host, and the journal log rotation rules probably consider them open, so they are never cleaned up... which causes the host journal to grow beyond the configured limits.
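To make the "du" vs "journalctl --disk-usage" discrepancy visible, something like the following rough sketch could be used. It simply sums file sizes per subdirectory of the journal root and reports the ones that do not belong to the host. The function names are made up for illustration, and it assumes the journal root has one subdirectory per machine ID, as described above.

```python
from pathlib import Path


def journal_dir_sizes(journal_root):
    """Return {subdirectory name: total bytes of regular files beneath it}."""
    sizes = {}
    for entry in Path(journal_root).iterdir():
        if entry.is_dir():
            sizes[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
    return sizes


def foreign_journal_dirs(journal_root, host_machine_id):
    """Return sizes of subdirectories not named after the host's machine ID.

    Assumption: anything not matching the host ID was left by a container.
    """
    return {
        name: size
        for name, size in journal_dir_sizes(journal_root).items()
        if name != host_machine_id
    }
```

On a real host, the journal root would be /var/log/journal and the host ID would be the stripped contents of /etc/machine-id; the total of the returned sizes is roughly the part of the "du" figure that the host journal does not account for.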
Versions: oci-systemd-hook 1:0.1.8-4.1.gite533efa.el7 which was brought in as a dependency of docker-1.12.6-48.git0fdc778-el7
(I'm not quite sure it's really been a wise decision to automatically "leak" journal logs from containers to the host. Yes, it's a design decision from Red Hat, and I understand it is intentional. Still, containers were supposed to be isolated environments, so at least I would have preferred a default that does not leak the journal logs (nor anything else) from containers to the host. Why make the journal a special case for the simple host-container isolation paradigm?)
Unfortunately, the version of oci-systemd-hook shipped by Red Hat does not support '--env oci-systemd-hook=disabled' ... so the only workaround is to periodically clean up /var/log/journal manually?
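For reference, the manual cleanup could be scripted roughly like this. It is only a sketch under assumptions, not a tested fix: it treats every subdirectory of the journal root that is not named after the host's machine ID, and whose newest file is older than a cutoff, as left over from a dead container. It cannot tell whether a container is still running, so the cutoff has to be longer than any container is expected to stay quiet. The function names and the dry-run default are my own choices.

```python
import shutil
import time
from pathlib import Path


def stale_journal_dirs(journal_root, host_machine_id, max_age_seconds, now=None):
    """List non-host journal subdirectories with no file modified recently."""
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(Path(journal_root).iterdir()):
        # Never touch the host's own journal directory.
        if not entry.is_dir() or entry.name == host_machine_id:
            continue
        files = [f for f in entry.rglob("*") if f.is_file()]
        # Fall back to the directory's own mtime if it holds no files.
        newest = max(
            (f.stat().st_mtime for f in files), default=entry.stat().st_mtime
        )
        if now - newest > max_age_seconds:
            stale.append(entry)
    return stale


def remove_stale(journal_root, host_machine_id, max_age_seconds, dry_run=True):
    """Delete stale directories; with dry_run=True only report their names."""
    names = []
    for path in stale_journal_dirs(journal_root, host_machine_id, max_age_seconds):
        if not dry_run:
            shutil.rmtree(path)
        names.append(path.name)
    return names
```

Running it with the default dry_run=True first shows which directories would be removed before anything is deleted.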