rsyslog / rsyslog

a Rocket-fast SYStem for LOG processing
http://www.rsyslog.com
GNU Lesser General Public License v3.0
2.05k stars 655 forks source link

journald file handle leak #4365

Closed kamaradclimber closed 4 years ago

kamaradclimber commented 4 years ago

Expected behavior

When journald rotate files, rsyslog should release handle on /var/log/journal files.

Actual behavior

After journald rotate files, rsyslog keeps handle on /var/log/journal files until it's restarted. Handles accumulate, partition gets eventually full.

Steps to reproduce the behavior

I've never seen this issue happening when we had default journald configuration on 20GB log partition. We've changed ~1 month ago the configuration to:

The above method to reproduce only systematically on our machines using that conf above but I can't reproduce it on machines with default configuration.

Environment

$ModLoad imuxsock $ModLoad imjournal

$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat $RepeatedMsgReduction off $WorkDirectory /var/lib/rsyslog

$Umask 0022 $FileOwner root $FileGroup adm $FileCreateMode 0640 $DirOwner root $DirGroup adm $DirCreateMode 0755

$OmitLocalLogging on $IMJournalStateFile imjournal.state $imjournalRatelimitBurst 30000 $imjournalRatelimitInterval 60

$template RSYSLOG_SyslogProtocol23Format_FQDN,"<%PRI%>1 %TIMESTAMP:::date-rfc3339% mesos-master06-par.central.criteo.preprod %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% %msg%"

[Many entries like the following block] $ActionQueueType LinkedList $ActionQueueSize 1000 $ActionQueueDequeueBatchSize 100 $ActionQueueTimeoutEnqueue 0 :programname, isequal, "yum" @@kestrel-infra-par.storage.criteo.preprod:12201;RSYSLOG_SyslogProtocol23Format_FQDN

$SystemLogSocketName /run/systemd/journal/socket $InputUnixListenSocketIgnoreMsgTimestamp off



### Related information

The issue may be the one described on https://bugzilla.redhat.com/show_bug.cgi?id=1713358 (which is marked as duplicated of another ticket with no public access).

Someone pointed out to me an issue with similar symptoms https://github.com/systemd/systemd/issues/14695  although it is likely unrelated.
For the record, we use stock systemd from centos7 (package version is 219.67.el7_7.2.x86_64)
kamaradclimber commented 4 years ago

More info, just in case it proves useful:

content of /var/lib/rsyslog/imjournal.state:

s=754cd751739840bb8404e57ecf77e154;i=c8348f8;b=73ece254bedf457aa321c188e52bb953;m=15c957969cb;t=5aad6ff0f0dc0;x=ce6fc58273bfe7ce

rsyslog is ran with /usr/sbin/rsyslogd -n as a systemd service.

There is no log in rsyslog output after startup beyond the usual starting line:

Jul 24 06:34:09 mesos-master06-par systemd[1]: Starting System Logging Service...
Jul 24 06:34:09 mesos-master06-par rsyslogd[4657]: [origin software="rsyslogd" swVersion="8.2006.0" x-pid="4657" x-info="https://www.rsyslog.com"] start
Jul 24 06:34:09 mesos-master06-par systemd[1]: Started System Logging Service.
kamaradclimber commented 4 years ago

The issue can still be reproduced with a more recent version of systemd (package from redhat 219-73.el7_8.5.x86_64) which is supposed to contain the fix for https://github.com/systemd/systemd/issues/14695.

rgerhards commented 4 years ago

@jvymazal can you have a look?

davidelang commented 4 years ago

is that a package created by redhat or one created by the rsyslog project?

we don't supply the packages to redhat and can't debug them.

David Lang

On Thu, 30 Jul 2020, Grégoire Seux wrote:

The issue can still be reproduced with a more recent version of systemd (package from redhat 219-73.el7_8.5.x86_64) which is supposed to contain the fix for https://github.com/systemd/systemd/issues/14695.

kamaradclimber commented 4 years ago

Both packages (systemd and rsyslog) are packages provided by centos.

How could we progress to understand more the issue at hand?

jvymazal commented 4 years ago

Hi, unfortunately this issue seems to have root cause on systemd (journal) side, not rsyslog, so not particularly anything I can provide fix for. Also, this maybe related to systemd/systemd/issues/14695, can you please verify that systemd version you are using has the fix mentioned there?

wfk-eu commented 4 years ago

We tested with CentOS 7.8.2003, which should, according to the release notes, include https://access.redhat.com/errata/RHBA-2018:0416, which looks very much like this issue. Unfortunately the RH bug doesn't have any reference to an upstream systemd github issue, so we can't know for sure that this is all dealing with the same issue. This update didn't actually fix our problem. We could reproduce the problem as indicated above. Will continue to investigate.

davidelang commented 4 years ago

you have two directions you can go

  1. contact RedHat for support

  2. upgrade to a community supported version (8.2006 is the current version) so that we can help you.

David Lang

On Thu, 30 Jul 2020, Grégoire Seux wrote:

Both packages (systemd and rsyslog) are packages provided by centos.

How could we progress to understand more the issue at hand?

kamaradclimber commented 4 years ago

Thanks for your answer. We've actually tested with 8.2006 and have the same issue.

davidelang commented 4 years ago

ok, that we can troubleshoot

does this only happen under high load? if not, you may be able to run it under valgrind to get a report on the memory leak.

but note the earlier comment that indicates this may be a journald issue

David Lang

On Fri, 31 Jul 2020, Grégoire Seux wrote:

Thanks for your answer. We've actually tested with 8.2006 and have the same issue.

rgerhards commented 4 years ago

closed because of https://github.com/rsyslog/rsyslog/issues/4365#issuecomment-666992296, as @jvymazal has the most in-depth insight into this. Follow advise from his post. Cannot be solved on rsyslog side.