Attempt to avoid leaking or holding open descriptors for deleted journal files

nalind commented 6 years ago

- What I did This is taking a stab at fixing the open-file-handles-for-deleted-journal-files bug at https://bugzilla.redhat.com/show_bug.cgi?id=1534827.

- How I did it We weren't handling SD_JOURNAL_INVALIDATE results coming back from sd_journal_process() before, so I added logic to close and reopen the journal handle when we get that result.

- How to verify it Run docker logs -f to tail the journaled output from a container that was started with --log-driver journald, and consult the list of open descriptors in the directory under /proc/PID/fd which corresponds to the docker process to see which journal files it has open. While docker logs -f is running, use journalctl --vacuum-size or journalctl --vacuum-files to force the removal of one or more older journal files which dockerd had open, then consult its fd directory again: it should no longer have open descriptors for files which were removed (i.e., the kernel should not be noting any journal files which it has open as (deleted)).

- Description for the changelog

When following a journal in the journald log reader, correctly handle SD_JOURNAL_INVALIDATE results from sd_journal_process() by closing and reopening the journal handle, which closes all open journal files (including deleted ones) and opening all currently-available journal files.

nalind commented 6 years ago

Integration tests are failing because the build uses golang 1.6.4, but the "vendored" archive/tar package references constant io.SeekCurrent, which wasn't added until go 1.7. According to the docs, replacing it with os.SEEK_CURRENT should work.

portante commented 6 years ago

See also: https://github.com/systemd/systemd/issues/7998, which highlights that a journalctl -f appears to use an internal mechanism to close open file descriptors on rotation instead of the proper sd_journal_close()/sd_journal_open() dance.

portante commented 6 years ago

This gist appears to be a working example of how to tail the journal without leaking file descriptors or creating duplicates: https://gist.github.com/portante/ff7fb429c6f973aab377f7bb77b0ffdb

nalind commented 6 years ago

Second patch added to check that we're not filling up the logWatcher.Msg channel, which would cause the writing goroutine to block indefinitely, making it so that it never closes its handle. This can happen when the client starts docker logs -f and then closes its connection.

The pathological test case is to disable rate limiting by setting RateLimitBurst to 0 and SystemMaxFileSize to 10M in /etc/systemd/journald.conf, ensuring that a log-heavy container will cause rapid and frequent creation of new files to hold journal entries. Starting a container to run seq 10000000 will then produce a test case which should help to reproduce the problems we're attempting to solve.

nalind commented 6 years ago

Based on advice from other maintainers, it's actually looking like we don't need to handle SD_JOURNAL_INVALIDATE in order to close handles to deleted journal files, as sd_journal_process() does that automatically. In that case, they'd only be held open while we're reading them in between calls to sd_journal_process(), which can be a while if there's a lot of logged output for us to churn through when we're reading the logs in either non-follow mode, or during the "catch up" phase in follow mode. So the first patch in the series may not be providing any benefit here.

nalind commented 6 years ago

@portante It shouldn't be necessary if sd_journal_process() closes only the ones that are gone and only opens the new ones, and it should be a little bit faster to avoid reopening in INVALIDATE cases, yeah.

portante commented 6 years ago

FWIW, here is what we proposed for rsyslog client: https://github.com/rsyslog/rsyslog/pull/2437

portante commented 6 years ago

Based on https://github.com/systemd/systemd/pull/8032/files#diff-787108d33a7a2bd28d22e0a606901037R2651, if somebody does any one of:

$ docker logs 997acb8 | less
$ docker logs -f 87ab356 | less

Where the log output pauses for a "long time", won't that prevent both the sd_journal_wait() and sd_journal_process() calls from occurring? If so, I think we can still leak file descriptors.

As such, I believe we need a thread that periodically listens for inotify events and calls sd_journal_process().

nalind commented 6 years ago

Alright, just updated to have drainJournal always call sd_journal_process(), and to be more careful about making sure that cursor values it returns always point to the last journal entry that it sent, which should fix the occasional single-message-drop I'd been seeing.

nalind commented 6 years ago

Rebased.

rhatdan commented 6 years ago

LGTM

projectatomic / docker

Attempt to avoid leaking or holding open descriptors for deleted journal files #296