Closed kamaradclimber closed 4 years ago
More info, just in case it proves useful:
content of /var/lib/rsyslog/imjournal.state:
s=754cd751739840bb8404e57ecf77e154;i=c8348f8;b=73ece254bedf457aa321c188e52bb953;m=15c957969cb;t=5aad6ff0f0dc0;x=ce6fc58273bfe7ce
rsyslog is ran with /usr/sbin/rsyslogd -n
as a systemd service.
There is no log in rsyslog output after startup beyond the usual starting line:
Jul 24 06:34:09 mesos-master06-par systemd[1]: Starting System Logging Service...
Jul 24 06:34:09 mesos-master06-par rsyslogd[4657]: [origin software="rsyslogd" swVersion="8.2006.0" x-pid="4657" x-info="https://www.rsyslog.com"] start
Jul 24 06:34:09 mesos-master06-par systemd[1]: Started System Logging Service.
The issue can still be reproduced with a more recent version of systemd (package from redhat 219-73.el7_8.5.x86_64) which is supposed to contain the fix for https://github.com/systemd/systemd/issues/14695.
@jvymazal can you have a look?
is that a package created by redhat or one created by the rsyslog project?
we don't supply the packages to redhat and can't debug them.
David Lang
On Thu, 30 Jul 2020, Grégoire Seux wrote:
The issue can still be reproduced with a more recent version of systemd (package from redhat 219-73.el7_8.5.x86_64) which is supposed to contain the fix for https://github.com/systemd/systemd/issues/14695.
Both packages (systemd and rsyslog) are packages provided by centos.
How could we progress to understand more the issue at hand?
Hi, unfortunately this issue seems to have root cause on systemd (journal) side, not rsyslog, so not particularly anything I can provide fix for. Also, this maybe related to systemd/systemd/issues/14695, can you please verify that systemd version you are using has the fix mentioned there?
We tested with CentOS 7.8.2003, which should, according to the release notes, include https://access.redhat.com/errata/RHBA-2018:0416, which looks very much like this issue. Unfortunately the RH bug doesn't have any reference to an upstream systemd github issue, so we can't know for sure that this is all dealing with the same issue. This update didn't actually fix our problem. We could reproduce the problem as indicated above. Will continue to investigate.
you have two directions you can go
contact RedHat for support
upgrade to a community supported version (8.2006 is the current version) so that we can help you.
David Lang
On Thu, 30 Jul 2020, Grégoire Seux wrote:
Both packages (systemd and rsyslog) are packages provided by centos.
How could we progress to understand more the issue at hand?
Thanks for your answer. We've actually tested with 8.2006 and have the same issue.
ok, that we can troubleshoot
does this only happen under high load? if not, you may be able to run it under valgrind to get a report on the memory leak.
but note the earlier comment that indicates this may be a journald issue
David Lang
On Fri, 31 Jul 2020, Grégoire Seux wrote:
Thanks for your answer. We've actually tested with 8.2006 and have the same issue.
closed because of https://github.com/rsyslog/rsyslog/issues/4365#issuecomment-666992296, as @jvymazal has the most in-depth insight into this. Follow advise from his post. Cannot be solved on rsyslog side.
Expected behavior
When journald rotate files, rsyslog should release handle on /var/log/journal files.
Actual behavior
After journald rotate files, rsyslog keeps handle on /var/log/journal files until it's restarted. Handles accumulate, partition gets eventually full.
Steps to reproduce the behavior
while true; do cat <file with many lines> | logger; done
sudo lsof -p $(pgrep rsyslogd) | grep deleted
and look at an ever growing list of(I've seen up to 500 handles but it stops when partition is full).
I've never seen this issue happening when we had default journald configuration on 20GB log partition. We've changed ~1 month ago the configuration to:
SystemMaxUse = 68G
in /etc/systemd/journald.conf and this issue has been detected starting yesterday (1 month after partition resize) and happens all the time in the last 24h (we have an unusual peak of logs sent to journald and thus to rsyslog).The above method to reproduce only systematically on our machines using that conf above but I can't reproduce it on machines with default configuration.
Environment
$ModLoad imuxsock $ModLoad imjournal
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat $RepeatedMsgReduction off $WorkDirectory /var/lib/rsyslog
$Umask 0022 $FileOwner root $FileGroup adm $FileCreateMode 0640 $DirOwner root $DirGroup adm $DirCreateMode 0755
$OmitLocalLogging on $IMJournalStateFile imjournal.state $imjournalRatelimitBurst 30000 $imjournalRatelimitInterval 60
$template RSYSLOG_SyslogProtocol23Format_FQDN,"<%PRI%>1 %TIMESTAMP:::date-rfc3339% mesos-master06-par.central.criteo.preprod %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% %msg%"
[Many entries like the following block] $ActionQueueType LinkedList $ActionQueueSize 1000 $ActionQueueDequeueBatchSize 100 $ActionQueueTimeoutEnqueue 0 :programname, isequal, "yum" @@kestrel-infra-par.storage.criteo.preprod:12201;RSYSLOG_SyslogProtocol23Format_FQDN
$SystemLogSocketName /run/systemd/journal/socket $InputUnixListenSocketIgnoreMsgTimestamp off