journald file handle leak

kamaradclimber commented 4 years ago

Expected behavior

When journald rotate files, rsyslog should release handle on /var/log/journal files.

Actual behavior

After journald rotate files, rsyslog keeps handle on /var/log/journal files until it's restarted. Handles accumulate, partition gets eventually full.

Steps to reproduce the behavior

Run while true; do cat <file with many lines> | logger; done
Wait a while (for journald rotation)

sudo lsof -p $(pgrep rsyslogd) | grep deleted and look at an ever growing list of

739840bb8404e57ecf77e154-0000000000000001-0005aa9a358ffe19.journal (deleted)
rsyslogd 29295 root 2577r   REG                8,3 134217728  34878240 /var/log/journal/1a8f8e1fb3fc4ff7bd21d5a433581369/system@754cd751739840bb8404e57ecf77e154-000000000c21299c-0005aa9a3594d6fe.journal (deleted)
rsyslogd 29295 root 2578r   REG                8,3  67108864  34878241 /var/log/journal/1a8f8e1fb3fc4ff7bd21d5a433581369/user-1000@ac9c7e033a4e46cea922e5d5785be838-000000000c21298c-0005aa9a3594caf1.journal (deleted)
rsyslogd 29295 root 2579r   REG                8,3 125829120  34878242 /var/log/journal/1a8f8e1fb3fc4ff7bd21d5a433581369/system@754cd751739840bb8404e57ecf77e154-000000000c251e7b-0005aa9df4fa406b.journal (deleted)
rsyslogd 29295 root 2580r   REG                8,3   8388608  34878243 /var/log/journal/1a8f8e1fb3fc4ff7bd21d5a433581369/user-1000@ac9c7e033a4e46cea922e5d5785be838-000000000c251e88-0005aa9df51e0935.journal (deleted)
rsyslogd 29295 root 2581r   REG                8,3 125829120  34878244 /var/log/journal/1a8f8e1fb3fc4ff7bd21d5a433581369/system@754cd751739840bb8404e57ecf77e154-000000000c271a4f-0005aaa05f8a6c8e.journal (deleted)
rsyslogd 29295 root 2582r   REG                8,3   8388608  34878245 /var/log/journal/1a8f8e1fb3fc4ff7bd21d5a433581369/user-1000@ac9c7e033a4e46cea922e5d5785be838-000000000c271a98-0005aaa06015a140.journal (deleted)
rsyslogd 29295 root 2583r   REG                8,3 125829120  34878247 /var/log/journal/1a8f8e1fb3fc4ff7bd21d5a433581369/system@754cd751739840bb8404e57ecf77e154-000000000c28fee1-0005aaa2782bd63a.journal (deleted)

(I've seen up to 500 handles but it stops when partition is full).

I've never seen this issue happening when we had default journald configuration on 20GB log partition. We've changed ~1 month ago the configuration to:

75GB log partition
SystemMaxUse = 68G in /etc/systemd/journald.conf and this issue has been detected starting yesterday (1 month after partition resize) and happens all the time in the last 24h (we have an unusual peak of logs sent to journald and thus to rsyslog).

The above method to reproduce only systematically on our machines using that conf above but I can't reproduce it on machines with default configuration.

Environment

rsyslog version: 8.1911 & 8.2006
platform: centos7
configuration:
```
$MaxMessageSize 32k
$PreserveFQDN off
```

$ModLoad imuxsock $ModLoad imjournal

$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat $RepeatedMsgReduction off $WorkDirectory /var/lib/rsyslog

$Umask 0022 $FileOwner root $FileGroup adm $FileCreateMode 0640 $DirOwner root $DirGroup adm $DirCreateMode 0755

$OmitLocalLogging on $IMJournalStateFile imjournal.state $imjournalRatelimitBurst 30000 $imjournalRatelimitInterval 60

$template RSYSLOG_SyslogProtocol23Format_FQDN,"<%PRI%>1 %TIMESTAMP:::date-rfc3339% mesos-master06-par.central.criteo.preprod %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% %msg%"

[Many entries like the following block] $ActionQueueType LinkedList $ActionQueueSize 1000 $ActionQueueDequeueBatchSize 100 $ActionQueueTimeoutEnqueue 0 :programname, isequal, "yum" @@kestrel-infra-par.storage.criteo.preprod:12201;RSYSLOG_SyslogProtocol23Format_FQDN

$SystemLogSocketName /run/systemd/journal/socket $InputUnixListenSocketIgnoreMsgTimestamp off



### Related information

The issue may be the one described on https://bugzilla.redhat.com/show_bug.cgi?id=1713358 (which is marked as duplicated of another ticket with no public access).

Someone pointed out to me an issue with similar symptoms https://github.com/systemd/systemd/issues/14695  although it is likely unrelated.
For the record, we use stock systemd from centos7 (package version is 219.67.el7_7.2.x86_64)

kamaradclimber commented 4 years ago

More info, just in case it proves useful:

content of /var/lib/rsyslog/imjournal.state:

s=754cd751739840bb8404e57ecf77e154;i=c8348f8;b=73ece254bedf457aa321c188e52bb953;m=15c957969cb;t=5aad6ff0f0dc0;x=ce6fc58273bfe7ce

rsyslog is ran with /usr/sbin/rsyslogd -n as a systemd service.

There is no log in rsyslog output after startup beyond the usual starting line:

Jul 24 06:34:09 mesos-master06-par systemd[1]: Starting System Logging Service...
Jul 24 06:34:09 mesos-master06-par rsyslogd[4657]: [origin software="rsyslogd" swVersion="8.2006.0" x-pid="4657" x-info="https://www.rsyslog.com"] start
Jul 24 06:34:09 mesos-master06-par systemd[1]: Started System Logging Service.

kamaradclimber commented 4 years ago

The issue can still be reproduced with a more recent version of systemd (package from redhat 219-73.el7_8.5.x86_64) which is supposed to contain the fix for https://github.com/systemd/systemd/issues/14695.

rgerhards commented 4 years ago

@jvymazal can you have a look?

davidelang commented 4 years ago

is that a package created by redhat or one created by the rsyslog project?

we don't supply the packages to redhat and can't debug them.

David Lang

On Thu, 30 Jul 2020, Grégoire Seux wrote:

The issue can still be reproduced with a more recent version of systemd (package from redhat 219-73.el7_8.5.x86_64) which is supposed to contain the fix for https://github.com/systemd/systemd/issues/14695.

kamaradclimber commented 4 years ago

Both packages (systemd and rsyslog) are packages provided by centos.

How could we progress to understand more the issue at hand?

jvymazal commented 4 years ago

Hi, unfortunately this issue seems to have root cause on systemd (journal) side, not rsyslog, so not particularly anything I can provide fix for. Also, this maybe related to systemd/systemd/issues/14695, can you please verify that systemd version you are using has the fix mentioned there?

wfk-eu commented 4 years ago

We tested with CentOS 7.8.2003, which should, according to the release notes, include https://access.redhat.com/errata/RHBA-2018:0416, which looks very much like this issue. Unfortunately the RH bug doesn't have any reference to an upstream systemd github issue, so we can't know for sure that this is all dealing with the same issue. This update didn't actually fix our problem. We could reproduce the problem as indicated above. Will continue to investigate.

davidelang commented 4 years ago

you have two directions you can go

contact RedHat for support
upgrade to a community supported version (8.2006 is the current version) so that we can help you.

David Lang

On Thu, 30 Jul 2020, Grégoire Seux wrote:

Both packages (systemd and rsyslog) are packages provided by centos.

How could we progress to understand more the issue at hand?

kamaradclimber commented 4 years ago

Thanks for your answer. We've actually tested with 8.2006 and have the same issue.

davidelang commented 4 years ago

ok, that we can troubleshoot

does this only happen under high load? if not, you may be able to run it under valgrind to get a report on the memory leak.

but note the earlier comment that indicates this may be a journald issue

David Lang

On Fri, 31 Jul 2020, Grégoire Seux wrote:

Thanks for your answer. We've actually tested with 8.2006 and have the same issue.

rgerhards commented 4 years ago

closed because of https://github.com/rsyslog/rsyslog/issues/4365#issuecomment-666992296, as @jvymazal has the most in-depth insight into this. Follow advise from his post. Cannot be solved on rsyslog side.

rsyslog / rsyslog