sergeymakinen / postfix_exporter

Export Postfix stats to Prometheus
BSD 3-Clause "New" or "Revised" License

metrics not consistent #7

Open deajan opened 7 months ago

deajan commented 7 months ago

Using postfix_exporter, there are a lot of missing values, which show up as NaN:

postfix_smtp_delay_seconds{status="bounced",quantile="0.5"} NaN
postfix_smtp_delay_seconds{status="bounced",quantile="0.9"} NaN
postfix_smtp_delay_seconds{status="bounced",quantile="0.99"} NaN
postfix_smtp_delay_seconds{status="deferred",quantile="0.5"} NaN
postfix_smtp_delay_seconds{status="deferred",quantile="0.9"} NaN
postfix_smtp_delay_seconds{status="deferred",quantile="0.99"} NaN
postfix_smtp_delay_seconds{status="deliverable",quantile="0.5"} NaN
postfix_smtp_delay_seconds{status="deliverable",quantile="0.9"} NaN
postfix_smtp_delay_seconds{status="deliverable",quantile="0.99"} NaN
postfix_smtp_delay_seconds{status="undeliverable",quantile="0.5"} NaN
postfix_smtp_delay_seconds{status="undeliverable",quantile="0.9"} NaN
postfix_smtp_delay_seconds{status="undeliverable",quantile="0.99"} NaN

Also, I don't get any postfix_lmtp* metrics at all.

Env:

postfix_exporter, version 1.2.3 (branch: HEAD, revision: 1a165a25141dd26027a1a3fec8980541b6f8b962)
  build user:       root@c115ac5c6c2d
  build date:       20230831-21:59:00
  go version:       go1.20.7
  platform:         linux/amd64
  tags:             netgo

OS:

CentOS 7
postconf mail_version
mail_version = 3.5.9

I can provide anonymized mail logs if needed.

sergeymakinen commented 7 months ago

Hey @deajan, yes, I think logs would be helpful: I could test the exporter against them and see what's going wrong.

> Also, I do not have at all any postfix_lmtp* metrics.

In my dashboard I have the following query for incoming messages, and it seems to work well: increase(postfix_lmtp_statuses_total{host="***", status="sent"}) or on() vector(0). So it's pretty strange that you don't get them.

deajan commented 7 months ago

I understand that you're getting the metrics on your setup, but on mine they don't even show up at the /metrics endpoint.

I've uploaded two Postfix logs in issue #6; maybe they will help diagnose which part of the logs isn't being parsed.

Thank you.

sergeymakinen commented 7 months ago

@deajan Can you please check whether you've got postfix_smtp_delay_seconds_sum and postfix_smtp_delay_seconds_count values for the label sets that show NaN? postfix_smtp_delay_seconds is a summary, and summary quantiles are only computed over a sliding window of recent observations (10 minutes by default, which is pretty standard). If nothing was observed within that window, the quantiles report NaN, so as long as you have the *_sum/*_count values, it's okay.

Also, I've found LMTP log records in one of the log files (by searching for lmtp.+status=), and their metrics appeared correctly when I tested the exporter against that file.
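A quick way to check whether a log contains LMTP delivery records at all is to apply the same lmtp.+status= pattern and tally the captured status. The sketch below is only that check; it is not the exporter's parser, and the function name `countLMTPStatuses` and the sample lines are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// lmtpStatus extends the lmtp.+status= search with a capture group for the
// status word (sent, deferred, bounced, ...).
var lmtpStatus = regexp.MustCompile(`lmtp.+status=(\w+)`)

// countLMTPStatuses (hypothetical helper) tallies LMTP statuses per line.
func countLMTPStatuses(log string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if m := lmtpStatus.FindStringSubmatch(sc.Text()); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Made-up sample lines for illustration only; not taken from issue #6.
	log := `postfix/lmtp[101]: A1: to=<a@example.org>, status=sent (ok)
postfix/lmtp[102]: B2: to=<b@example.org>, status=sent (ok)
postfix/lmtp[103]: C3: to=<c@example.org>, status=deferred (later)
postfix/smtp[104]: D4: to=<d@example.net>, status=bounced (no)`
	fmt.Println(countLMTPStatuses(log)) // map[deferred:1 sent:2]
}
```

An empty map for a real log would suggest there are no LMTP records for the exporter to count, rather than a parsing problem.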

Unrelated: from your logs I've discovered that the "Recipient address rejected: Rejected by SPF" NOQUEUE reject message should be trimmed down to avoid raw addresses in labels. I'll work it out.
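One possible approach, sketched here as an assumption rather than the exporter's actual fix, is to replace any email address in the reject reason with a fixed placeholder before using it as a label value, so label cardinality doesn't grow with every sender or recipient. The `sanitizeReason` function and the regular expression are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailAddr (hypothetical) matches an address with or without angle
// brackets, e.g. <user@example.org> or user@example.org.
var emailAddr = regexp.MustCompile(`<?[^\s<>@()]+@[^\s<>,;()]+>?`)

// sanitizeReason replaces every address with "<address>" so the label
// value is identical regardless of who sent or received the message.
func sanitizeReason(reason string) string {
	return emailAddr.ReplaceAllString(reason, "<address>")
}

func main() {
	// Illustrative input only; real SPF reject texts may be worded differently.
	fmt.Println(sanitizeReason("Rejected by SPF for <alice@example.org>, from bob@example.net"))
	// Rejected by SPF for <address>, from <address>
}
```

Replacing rather than truncating keeps the rest of the reason readable; cutting the message at a fixed prefix would be an equally valid choice if only the reject category matters.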