kumina / postfix_exporter

A Prometheus exporter for Postfix.
Apache License 2.0

Improve statistics #36

Open CRCinAU opened 5 years ago

CRCinAU commented 5 years ago

I'm trying out this exporter instead of the SNMP exporter I wrote in perl years ago...

In gathering statistics, there are a number of log lines missing that would be useful. These include:

A lot of these seem to be ignored as unsupported lines.

If it helps, these are the regexes I use in Perl to extract my stats for SNMP... I'm a bit unsure how they'd translate, though:

        if ( $prog eq 'postfix' ) {
                $line =~ /(\w+)\[\d+\]: (.*)$/;
                my $subprog = $1;
                $line = $2;
                if ( $subprog eq 'smtp' ) {
                        if ( $line =~ /\bstatus=sent\b/ ) {
                                $counts{"sent"}++;
                        }
                        elsif ( $line =~ /\bstatus=bounced\b/ ) {
                                $counts{"bounced"}++;
                        }
                        elsif ( $line =~ /NOQUEUE: reject:/ ) {
                                $counts{"rejected"}++;
                        }
                }
                elsif ( $subprog eq 'postscreen' ) {
                        if ( $line =~ /NOQUEUE: reject:/ ) {
                                $counts{"rejected"}++;
                        }
                }
                elsif ( $subprog eq 'smtpd' ) {
                        if ( $line =~ /NOQUEUE: reject:/ ) {
                                $counts{"rejected"}++;
                        }
                        elsif ( $line =~ /User unknown/ ) {
                                $counts{"bounced"}++;
                        }
                }
                elsif ( $subprog eq 'cleanup' ) {
                        if ( $line =~ /Blocked by SpamAssassin/ ) {
                                $counts{"spam"}++;
                        }
                        elsif ( $line =~ /[0-9A-F]+: (?:reject|discard): / ) {
                                $counts{"rejected"}++;
                        }
                }
                elsif ( $subprog eq 'pipe' ) {
                        if ( $line =~ /relay=dovecot/ ) {
                                $counts{"recv"}++;
                        }
                }
        }
        elsif ( $prog eq 'clamav-milter' ) {
                if ( $line =~ /infected by/ ) {
                        $counts{"virus"}++;
                }
        }
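
As a rough idea of how the Postfix branch of the Perl above might translate to Go (the language the exporter is written in), here is a minimal sketch. The function name `classifyPostfix` and the returned counter names are hypothetical and only mirror the `%counts` keys from the Perl; this is not how the exporter itself parses lines, and it assumes the caller has already established that the line comes from Postfix.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// subprogRe mirrors the Perl pattern /(\w+)\[\d+\]: (.*)$/: it captures the
// Postfix subprogram name and the remainder of the log line.
var subprogRe = regexp.MustCompile(`(\w+)\[\d+\]: (.*)$`)

var (
	sentRe    = regexp.MustCompile(`\bstatus=sent\b`)
	bouncedRe = regexp.MustCompile(`\bstatus=bounced\b`)
	cleanupRe = regexp.MustCompile(`[0-9A-F]+: (?:reject|discard): `)
)

// classifyPostfix (hypothetical helper) returns the counter the Perl code
// would increment for this line, or "" when none of its rules apply.
func classifyPostfix(line string) string {
	m := subprogRe.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	subprog, rest := m[1], m[2]
	noqueue := strings.Contains(rest, "NOQUEUE: reject:")

	switch subprog {
	case "smtp":
		switch {
		case sentRe.MatchString(rest):
			return "sent"
		case bouncedRe.MatchString(rest):
			return "bounced"
		case noqueue:
			return "rejected"
		}
	case "postscreen":
		if noqueue {
			return "rejected"
		}
	case "smtpd":
		switch {
		case noqueue:
			return "rejected"
		case strings.Contains(rest, "User unknown"):
			return "bounced"
		}
	case "cleanup":
		switch {
		case strings.Contains(rest, "Blocked by SpamAssassin"):
			return "spam"
		case cleanupRe.MatchString(rest):
			return "rejected"
		}
	case "pipe":
		if strings.Contains(rest, "relay=dovecot") {
			return "recv"
		}
	}
	return ""
}

func main() {
	// Made-up sample lines in the usual syslog shape.
	lines := []string{
		"postfix/smtp[100]: ABC123: to=<user@example.com>, status=sent (250 ok)",
		"postfix/smtpd[101]: NOQUEUE: reject: RCPT from unknown[192.0.2.1]: 554 5.7.1 rejected",
		"postfix/pipe[102]: ABC124: to=<user@example.com>, relay=dovecot, status=sent",
	}
	counts := map[string]int{}
	for _, line := range lines {
		if name := classifyPostfix(line); name != "" {
			counts[name]++
		}
	}
	fmt.Println(counts)
}
```

The `clamav-milter` branch is left out because it depends on how `$prog` is derived, which the snippet above does not show.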
flyhard commented 4 years ago

If you could provide a couple of example log lines, I'll take a look at it.

CRCinAU commented 4 years ago

There are probably a few, seeing as we can list them and filter later...

eg:

Feb 22 03:18:19 <hostname> postfix/postscreen[<pid>]: WHITELISTED [<ipv4 or ipv6 address>]:<port>
Feb 22 03:20:57 <hostname> postfix/postscreen[<pid>]: NOQUEUE: reject: RCPT from [<spammers IP>]:<port>: 550 5.7.1 Service unavailable; client [<spammers ip>] blocked using DNSBL Filters; from=<fromaddr>, to=<toaddr>, proto=ESMTP, helo=<smtp.aweia.cn>

The key parts in the above would be the postfix/postscreen and the values NOQUEUE and WHITELISTED. Others that might be useful are PREGREET, PASS NEW and PASS OLD. They're all in the same format.
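
To make the format concrete, here is a minimal Go sketch that pulls the event keyword out of a postscreen line. The function name `postscreenEvent` is hypothetical, and the list of keywords is limited to the ones named in this comment; it assumes the keyword always follows the `postfix/postscreen[<pid>]: ` prefix directly, as in the two examples.

```go
package main

import (
	"fmt"
	"regexp"
)

// postscreenEventRe captures the keyword that follows the postscreen prefix.
// Only the events mentioned in this thread are listed.
var postscreenEventRe = regexp.MustCompile(
	`postfix/postscreen\[\d+\]: (NOQUEUE|WHITELISTED|PREGREET|PASS NEW|PASS OLD)\b`)

// postscreenEvent (hypothetical helper) returns the event keyword and true,
// or "" and false when the line is not a recognised postscreen event.
func postscreenEvent(line string) (string, bool) {
	m := postscreenEventRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Sample lines shaped like the examples above, with placeholder values.
	lines := []string{
		"Feb 22 03:18:19 mx1 postfix/postscreen[1234]: WHITELISTED [192.0.2.10]:54321",
		"Feb 22 03:20:57 mx1 postfix/postscreen[1234]: NOQUEUE: reject: RCPT from [192.0.2.20]:40000: 550 5.7.1 Service unavailable",
		"Feb 22 03:21:00 mx1 postfix/smtpd[1300]: connect from unknown[192.0.2.30]",
	}
	counts := map[string]int{}
	for _, line := range lines {
		if event, ok := postscreenEvent(line); ok {
			counts[event]++
		}
	}
	fmt.Println(counts)
}
```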

I might have to get back to you on the SpamAssassin bits...

flyhard commented 4 years ago

I have created a possible fix for the postscreen NOQUEUE events, but I don't have log lines from SpamAssassin to make the required changes. I'll still have to add some more cases for the other events.

anarcat commented 3 years ago

@flyhard could you submit a PR for this? looks like a useful improvement...

anarcat commented 3 years ago

here's an example of a bounce that Postfix is seeing, failing to deliver to another host:

Jan 25 21:25:22 eugeni/eugeni postfix/smtp[558]: DBF82E05E7: to=<[REDACTED]@gmail.com>, orig_to=<REDACTED@torproject.org>, relay=gmail-smtp-in.l.google.com[2a00:1450:400c:c00::1a]:25, delay=2.1, delays=1.7/0/0.12/0.21, dsn=5.7.1, status=bounced (host gmail-smtp-in.l.google.com[2a00:1450:400c:c00::1a] said: 550-5.7.1 [2a01:4f8:fff0:4f:266:37ff:fe48:41b8      19] Our system has detected 550-5.7.1 that this message is likely suspicious due to the very low reputation 550-5.7.1 of the sending domain. To best protect our users from spam, the 550-5.7.1 message has been blocked. Please visit 550 5.7.1  https://support.google.com/mail/answer/188131 for more information. e13si14120901wrq.457 - gsmtp (in reply to end of DATA command))

In general, there doesn't seem to be any data about those lines in the exporter. I would expect some metric like:

# HELP postfix_smtp_message_processed_total Total number of outgoing messages processed
# TYPE postfix_smtp_message_processed_total counter 
postfix_smtp_message_processed_total{status="sent"} 0
postfix_smtp_message_processed_total{status="rejected"} 0
postfix_smtp_message_processed_total{status="bounced"} 0
[...]
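
A minimal Go sketch of how such a counter could be derived from `postfix/smtp` delivery lines is shown here. It uses only the standard library so that it stays self-contained; the metric name is the one proposed above, not an existing exporter metric, and the helper names `countSMTPStatuses` and `render` are hypothetical. It assumes the first `, status=` field after the queue ID is the delivery status.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// smtpStatusRe matches a postfix/smtp delivery line and captures the value of
// the first ", status=" field after the queue ID.
var smtpStatusRe = regexp.MustCompile(`postfix/smtp\[\d+\]: \w+: .*?, status=(\w+)`)

// countSMTPStatuses (hypothetical helper) tallies delivery statuses per label.
func countSMTPStatuses(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		if m := smtpStatusRe.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

// render prints the counts in the text exposition style used above,
// with labels sorted for stable output.
func render(counts map[string]int) string {
	statuses := make([]string, 0, len(counts))
	for status := range counts {
		statuses = append(statuses, status)
	}
	sort.Strings(statuses)

	var b strings.Builder
	for _, status := range statuses {
		fmt.Fprintf(&b, "postfix_smtp_message_processed_total{status=%q} %d\n",
			status, counts[status])
	}
	return b.String()
}

func main() {
	// Placeholder lines shaped like the bounce example above.
	lines := []string{
		"Jan 25 21:25:22 mx1 postfix/smtp[558]: DBF82E05E7: to=<user@example.com>, relay=mx.example.com[192.0.2.1]:25, dsn=5.7.1, status=bounced (host mx.example.com said: 550 5.7.1 blocked)",
		"Jan 25 21:25:23 mx1 postfix/smtp[559]: DBF82E05E8: to=<user@example.org>, relay=mx.example.org[192.0.2.2]:25, dsn=2.0.0, status=sent (250 2.0.0 OK)",
	}
	fmt.Print(render(countSMTPStatuses(lines)))
}
```

In the exporter itself this would more likely be a labelled counter from the Prometheus Go client rather than a plain map; the map is used here only to keep the sketch free of external dependencies.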

Here, millions of log lines are ignored by the exporter, which is kind of a problem for us:

# HELP postfix_unsupported_log_entries_total Log entries that could not be processed.
# TYPE postfix_unsupported_log_entries_total counter
postfix_unsupported_log_entries_total{service=""} 4007
postfix_unsupported_log_entries_total{service="anvil"} 23658
postfix_unsupported_log_entries_total{service="bounce"} 54792
postfix_unsupported_log_entries_total{service="cleanup"} 11656
postfix_unsupported_log_entries_total{service="error"} 287860
postfix_unsupported_log_entries_total{service="local"} 331
postfix_unsupported_log_entries_total{service="master"} 45
postfix_unsupported_log_entries_total{service="pickup"} 2
postfix_unsupported_log_entries_total{service="qmgr"} 5741
postfix_unsupported_log_entries_total{service="scache"} 18178
postfix_unsupported_log_entries_total{service="smtp"} 3.01344e+06
postfix_unsupported_log_entries_total{service="smtpd"} 750665
postfix_unsupported_log_entries_total{service="verify"} 195
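
To see which services account for most of the ignored lines, samples like the ones above can be totalled with a small Go sketch. The helper name `unsupportedByService` is hypothetical, and the sketch assumes each sample has exactly one `service` label, as in this output.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sampleRe matches one sample line of the metric and captures the service
// label and the value (which may be in exponent notation, e.g. 3.01344e+06).
var sampleRe = regexp.MustCompile(
	`^postfix_unsupported_log_entries_total\{service="([^"]*)"\} (\S+)$`)

// unsupportedByService (hypothetical helper) returns the value per service
// and the overall total; comment lines and unparsable lines are skipped.
func unsupportedByService(exposition string) (map[string]float64, float64) {
	perService := make(map[string]float64)
	var total float64
	for _, line := range strings.Split(exposition, "\n") {
		m := sampleRe.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue
		}
		v, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			continue
		}
		perService[m[1]] += v
		total += v
	}
	return perService, total
}

func main() {
	// Two of the samples from the output above.
	input := `# TYPE postfix_unsupported_log_entries_total counter
postfix_unsupported_log_entries_total{service="smtp"} 3.01344e+06
postfix_unsupported_log_entries_total{service="smtpd"} 750665`

	perService, total := unsupportedByService(input)
	for service, v := range perService {
		fmt.Printf("%s: %.0f (%.1f%% of parsed samples)\n", service, v, 100*v/total)
	}
}
```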

Now, for some of those messages we could "guess" what they are doing. For example, the bounce service might be used to infer some bounces... but that's not as good as the other metric...