kumina / postfix_exporter

A Prometheus exporter for Postfix.
Apache License 2.0

journald metrics not available #55

Open breed808 opened 4 years ago

breed808 commented 4 years ago

When running postfix_exporter built from master, journald metrics are not available, and there is only a single path for the postfix_up metric present when querying the exporter. The exporter is printing "Reading log events from systemd" on startup.

Bisecting with git reveals that commit 26d06428312ac8cbf2dfb9d917f85ec0057035f1 introduced the issue.

breed808 commented 4 years ago

Additionally, I've run the exporter with a debugger and the CollectFromLogLine function is not reached.

ktosiek commented 3 years ago

I believe postfix_exporter currently only reads the log once, and stops on first 0-length read.

dswarbrick commented 2 years ago

I co-maintain the Debian package prometheus-postfix-exporter, and have also just recently discovered that systemd journal support inexplicably broke somewhere between v0.2.0 and v0.3.0. This is somewhat disappointing, since v0.2.0 was working quite reliably.

From the minimal debugging that I've done so far, SystemdLogSource.Read() appears to bail out with io.EOF when s.journal.Next() returns zero, so it never actually calls s.journal.GetEntry():

func (s *SystemdLogSource) Read(ctx context.Context) (string, error) {
    c, err := s.journal.Next()
    if err != nil {
        return "", err
    }
    if c == 0 {
        return "", io.EOF
    }

    e, err := s.journal.GetEntry()
...

That subsequently causes the for-loop in PostfixExporter.StartMetricCollection() to bail out, and that's pretty much game over.

By commenting out the "Start at end of journal" seek in logsource_systemd.go, I can get the exporter to "replay" historical systemd journal entries, and it appears to produce the expected metrics. However, when it reaches the end of the events, the Read() function still bails out with io.EOF. This seems to be the main issue: it doesn't wait for further events. If it is allowed to seek to the end of the journal (i.e., unmodified code from the 0.3.0 tag), it bails out immediately because there are no events to read.
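
To illustrate the behaviour described above, here is a minimal sketch of a read function that treats "no new entry" as a reason to wait rather than as io.EOF. It is not the exporter's code. The journalReader interface, readBlocking, fakeJournal and the demo helpers are all hypothetical names made up for this example, and the interface only assumes that the real journal handle offers something like Next, an entry getter and a blocking wait. Check the go-systemd sdjournal documentation for the actual signatures before adapting it.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// journalReader is a hypothetical stand-in for a journal handle.
// The method set is an assumption made for this sketch.
type journalReader interface {
	Next() (uint64, error)      // advances the cursor; 0 means no new entry
	Entry() (string, error)     // returns the entry at the cursor
	Wait(timeout time.Duration) // blocks until the journal changes or the timeout expires
}

// readBlocking returns the next entry. When the cursor is at the end of
// the journal it waits and retries instead of returning io.EOF, and it
// gives up only when the context is cancelled.
func readBlocking(ctx context.Context, j journalReader) (string, error) {
	for {
		n, err := j.Next()
		if err != nil {
			return "", err
		}
		if n > 0 {
			return j.Entry()
		}
		// End of journal reached: not an error, just nothing to read yet.
		if err := ctx.Err(); err != nil {
			return "", err
		}
		j.Wait(100 * time.Millisecond)
	}
}

// fakeJournal simulates a journal that reports "nothing new" for the
// first emptyPolls calls to Next and then serves the queued lines.
type fakeJournal struct {
	emptyPolls int
	lines      []string
}

func (f *fakeJournal) Next() (uint64, error) {
	if f.emptyPolls > 0 {
		f.emptyPolls--
		return 0, nil
	}
	if len(f.lines) == 0 {
		return 0, nil
	}
	return 1, nil
}

func (f *fakeJournal) Entry() (string, error) {
	if len(f.lines) == 0 {
		return "", errors.New("no current entry")
	}
	line := f.lines[0]
	f.lines = f.lines[1:]
	return line, nil
}

// Wait returns immediately in the fake so the example does not sleep.
func (f *fakeJournal) Wait(time.Duration) {}

// demo reads from a journal that is empty for three polls before an
// entry shows up.
func demo() (string, error) {
	j := &fakeJournal{
		emptyPolls: 3,
		lines:      []string{"postfix/smtpd[123]: connect from example"},
	}
	return readBlocking(context.Background(), j)
}

// demoCancelled reads from an empty journal with an already cancelled
// context, which is the only way readBlocking stops without an entry.
func demoCancelled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := readBlocking(ctx, &fakeJournal{})
	return err
}
```

Calling demo() from a main function returns the queued line after three empty polls, whereas a Read() that returns io.EOF on the first empty poll would end the collection loop right there. Cancelling the context is what lets the caller shut the loop down cleanly.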

fpletz commented 2 years ago

This might also be an issue with recent versions of go-systemd: https://github.com/coreos/go-systemd/issues/392

Ma27 commented 2 years ago

I did some digging today because this bothered me a bit: