prometheus-community / rsyslog_exporter

Export rsyslog metrics to prometheus
Apache License 2.0
22 stars 8 forks source link

infrequent terminations on debian bookworm #14

Open TheMeier opened 2 months ago

TheMeier commented 2 months ago

Versions: rsyslog_exporter: 1.1.0 rsyslog: 8.2302.0

On Debian bookworm we see infrequent restarts of rsyslog triggered by the exporter exiting normally without any warning or error. The journal entries when this happens look like this:

Jun 28 09:17:07 HOSTNAME rsyslog_exporter[20943]: 2024/06/28 09:17:07 error handling stats line: unknown pstat type: 0, line was: 2024-06-28T09:17:07.045427+02:00 HOSTNAME rsyslogd-pstats: { "name": "global", "origin": "percentile", "values": { } }
Jun 28 09:17:17 HOSTNAME rsyslog_exporter[20943]: 2024/06/28 09:17:17 error handling stats line: unknown pstat type: 0, line was: 2024-06-28T09:17:17.056224+02:00 HOSTNAME rsyslogd-pstats: { "name": "global", "origin": "percentile", "values": { } }
Jun 28 09:17:17 HOSTNAME systemd-journald[218]: Forwarding to syslog missed 20 messages.
Jun 28 09:17:17 HOSTNAME systemd-journald[218]: Data hash table of /var/log/journal/afa5f8f87af042f7ba27ebcfc18b1627/system.journal has a fill level at 75.0 (174764 of 233016 items, 67108864 file size, 383 bytes per hash table item), suggesting rotation.
Jun 28 09:17:17 HOSTNAME systemd-journald[218]: /var/log/journal/afa5f8f87af042f7ba27ebcfc18b1627/system.journal: Journal header limits reached or header out-of-date, rotating.
Jun 28 09:17:17 HOSTNAME rsyslogd[508]: imjournal: journal files changed, reloading...  [v8.2302.0 try https://www.rsyslog.com/e/0 ]
Jun 28 09:17:17 HOSTNAME rsyslogd[508]: imjournal: journal files changed, reloading...  [v8.2302.0 try https://www.rsyslog.com/e/0 ]
Jun 28 09:17:17 HOSTNAME rsyslogd[508]: imjournal: journal files changed, reloading...  [v8.2302.0 try https://www.rsyslog.com/e/0 ]
Jun 28 09:17:17 HOSTNAME rsyslogd[508]: imjournal: journal files changed, reloading...  [v8.2302.0 try https://www.rsyslog.com/e/0 ]
Jun 28 09:17:17 HOSTNAME rsyslog_exporter[20943]: 2024/06/28 09:17:17 input ended, exiting normally
Jun 28 09:17:17 HOSTNAME systemd[1]: rsyslog.service: Main process exited, code=killed, status=7/BUS
Jun 28 09:17:17 HOSTNAME systemd[1]: rsyslog.service: Failed with result 'signal'.
Jun 28 09:17:17 HOSTNAME systemd[1]: rsyslog.service: Consumed 48.006s CPU time.
Jun 28 09:17:17 HOSTNAME systemd[1]: rsyslog.service: Scheduled restart job, restart counter is at 1.
Jun 28 09:17:17 HOSTNAME systemd[1]: Stopped rsyslog.service - System Logging Service.
Jun 28 09:17:17 HOSTNAME systemd[1]: rsyslog.service: Consumed 48.006s CPU time.
Jun 28 09:17:17 HOSTNAME systemd[1]: Starting rsyslog.service - System Logging Service...
Jun 28 09:17:17 HOSTNAME rsyslogd[47080]: [origin software="rsyslogd" swVersion="8.2302.0" x-pid="47080" x-info="https://www.rsyslog.com"] start
Jun 28 09:17:17 HOSTNAME systemd[1]: Started rsyslog.service - System Logging Service.
Jun 28 09:17:17 HOSTNAME rsyslogd[47080]: imjournal: journal files changed, reloading...  [v8.2302.0 try https://www.rsyslog.com/e/0 ]
Jun 28 09:17:18 HOSTNAME rsyslogd[47080]: warning: certificate file is not set [v8.2302.0 try https://www.rsyslog.com/e/2330 ]
Jun 28 09:17:18 HOSTNAME rsyslogd[47080]: warning: key file is not set [v8.2302.0 try https://www.rsyslog.com/e/2331 ]
Jun 28 09:17:27 HOSTNAME rsyslog_exporter[47086]: 2024/06/28 09:17:27 Listening on :9104
Jun 28 09:17:27 HOSTNAME rsyslog_exporter[47086]: 2024/06/28 09:17:27 error handling stats line: unknown pstat type: 0, line was: 2024-06-28T09:17:27.859720+02:00 HOSTNAME rsyslogd-pstats: { "name": "global", "origin": "percentile", "values": { } }
Jun 28 09:17:37 HOSTNAME rsyslog_exporter[47086]: 2024/06/28 09:17:37 error handling stats line: unknown pstat type: 0, line was: 2024-06-28T09:17:37.865341+02:00 HOSTNAME rsyslogd-pstats: { "name": "global", "origin": "percentile", "values": { } }
Jun 28 09:17:47 HOSTNAME rsyslog_exporter[47086]: 2024/06/28 09:17:47 error handling stats line: unknown pstat type: 0, line was: 2024-06-28T09:17:47.866307+02:00 HOSTNAME rsyslogd-pstats: { "name": "global", "origin": "percentile", "values": { } }
Jun 28 09:17:47 HOSTNAME systemd-journald[218]: Forwarding to syslog missed 26 messages.
Jun 28 09:17:57 HOSTNAME rsyslog_exporter[47086]: 2024/06/28 09:17:57 error handling stats line: unknown pstat type: 0, line was: 2024-06-28T09:17:57.877088+02:00 HOSTNAME rsyslogd-pstats: { "name": "global", "origin": "percentile", "values": { } }

The Forwarding to syslog missed xxx messages is already addressed, I think it is unrelated. What is suspicious is that every time theses terminations happen 3-4 times the reload is logged by rsyslog-imjournal

-silent is not set

We run the same setup also on Debian bullseye systems (rsyslog 8.2102.0) and do not see these terminations there.

aleroyer commented 1 month ago

Hello @TheMeier,

I think the exit of the exporter is related to the termination of your rsyslog process rsyslog.service: Main process exited, code=killed, status=7/BUS.

The exporter is started by rsyslog itself via omprog (doc)

If the program terminates, it is re-started. If rsyslog terminates, the program’s stdin will see EOF. The program must then terminate.

The log input ended, exiting normally is produced here: https://github.com/prometheus-community/rsyslog_exporter/blob/1b156170a378081348920102ebc026c45c1462ad/exporter.go#L205-L218

And as said in the bufio's Scanner.Scan():

It returns false when there are no more tokens, either by reaching the end of the input or an error. After Scan returns false, the Scanner.Err method will return any error that occurred during scanning, except that if it was io.EOF, Scanner.Err will return nil

From this analysis, I think rsyslog terminates, send a EOF to the exporter, then exit normally (else you would have the error reading input log before the exit). So the exporter is not involved in the termination of rsyslog here.

I did some research on the Internet, and it seems your issue is the same as here which highlight this issue https://github.com/systemd/systemd/issues/24320 This issue appears in v250, fixed in v255. Bookworm is running v252, and bullseye v247.

TheMeier commented 1 month ago

Oooooh. Thank you so much @aleroyer. The issue links are very welcome. I will try to validate this ASAP. Sadly bookworm-backports only ships systemd v254 currently.