lesovsky / pgscv

pgSCV is a multi-purpose monitoring agent and metrics exporter
BSD 3-Clause "New" or "Revised" License

Troubles with patroni metrics #10

Status: open. Opened by dan-aksenov 3 years ago.

dan-aksenov commented 3 years ago

Describe the bug

Having trouble reading Patroni metrics:

{"level":"error","service":"pgscv","time":"2021-09-23T16:41:30+03:00","message":"patroni/common collector failed; Get \"http://[<nil>]:8008/liveness\": dial tcp: lookup <nil>: no such host"}

Steps to reproduce

sudo -u postgres ./pgscv --config-file pgscv.yaml

I also tried running the binary as root, and running it with PATRONI_URL="https://hostname:8008" exported.

Expected behavior

No errors, and Patroni metrics shown in Grafana.

pgSCV startup options

listen_address: 0.0.0.0:9890
defaults:
    postgres_username: "postgres"
    postgres_password: "mypassword"

Errors and Logs

sudo ./pgscv --config-file pgscv.yaml
{"level":"info","service":"pgscv","time":"2021-09-23T16:55:03+03:00","message":"read configuration from pgscv.yaml"}
{"level":"info","service":"pgscv","time":"2021-09-23T16:55:03+03:00","message":"no-track disabled, for details check the documentation about 'no_track_mode' option."}
{"level":"info","service":"pgscv","time":"2021-09-23T16:55:03+03:00","message":"listen on http://0.0.0.0:9890"}
{"level":"info","service":"pgscv","time":"2021-09-23T16:55:03+03:00","message":"auto-discovery: service added [system:0]"}
{"level":"info","service":"pgscv","time":"2021-09-23T16:55:03+03:00","message":"auto-discovery [python3]: service added [patroni:8008]"}
{"level":"info","service":"pgscv","time":"2021-09-23T16:55:03+03:00","message":"auto-discovery [postgres]: service added [postgres:5432]"}
{"level":"warn","service":"pgscv","time":"2021-09-23T16:55:03+03:00","message":"service [patroni:8008] failed: tries remain 1/10"}
{"level":"error","service":"pgscv","time":"2021-09-23T16:55:30+03:00","message":"patroni/common collector failed; Get \"http://[<nil>]:8008/liveness\": dial tcp: lookup <nil>: no such host"}
{"level":"error","service":"pgscv","time":"2021-09-23T16:55:30+03:00","message":"patroni/common collector failed; Get \"http://[<nil>]:8008/liveness\": dial tcp: lookup <nil>: no such host"}
{"level":"warn","service":"pgscv","time":"2021-09-23T16:55:30+03:00","message":"get model for vda failed: open /sys/block/vda/device/model: no such file or directory; skip"}
{"level":"warn","service":"pgscv","time":"2021-09-23T16:55:30+03:00","message":"get model for vdb failed: open /sys/block/vdb/device/model: no such file or directory; skip"}

In debug mode:

{"level":"debug","service":"pgscv","time":"2021-09-23T17:15:31+03:00","message":"auto-discovery: looking up for new services..."}
{"level":"debug","service":"pgscv","time":"2021-09-23T17:15:31+03:00","message":"auto-discovery [patroni]: analyzing process with pid 24042"}
{"level":"debug","service":"pgscv","time":"2021-09-23T17:15:31+03:00","message":"auto-discovery: patroni service has been found, pid 24042, available through [<nil>]:8008"}


Additional context

None

lesovsky commented 3 years ago

Please provide your patroni.yml and the output of: sudo ss -luntp | grep 8008

dan-aksenov commented 3 years ago

patroni.yml

restapi:
  listen: my.host.fqdn:8008
  connect_address: my.host.fqdn:8008

I hope this is sufficient, since pgSCV reads only the listen and certfile settings from the config file.

sudo ss -luntp |grep 8008
tcp   LISTEN   0        5                ip_address:8008          0.0.0.0:*      users:(("python3",pid=24042,fd=8))

Also, this:

curl -s http://$(hostname):8008

works fine.

dan-aksenov commented 3 years ago

I changed listen from the FQDN to the IP address. It seems to work; at least I'm now getting:

curl -s http://ip:9890/metrics | grep patroni
# HELP patroni_up State of Patroni service: 1 is up, 0 otherwise.
# TYPE patroni_up gauge
patroni_up{service_id="patroni:8008"} 1
pgscv_services_registered_total{service="patroni",service_id="patroni:8008"} 1

But why can't I use an FQDN? In our IaC rules, FQDNs are preferred over IPs.

lesovsky commented 3 years ago

There is no reason not to use FQDNs. I think this is just a bug; I will fix it.
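One possible shape for such a fix, sketched in Go under the assumption (taken from the reporter's comment) that only restapi.listen and restapi.certfile are read from patroni.yml. patroniBaseURL is a hypothetical helper, not pgSCV's code: it splits host and port with net.SplitHostPort, keeps a hostname or FQDN as-is, and rebuilds the address with net.JoinHostPort, which adds brackets only for IPv6 literals. A non-empty certfile selects https, since Patroni serves its REST API without TLS when certfile is not set.

```go
package main

import (
	"fmt"
	"net"
)

// patroniBaseURL derives the Patroni REST API base URL from restapi.listen
// and restapi.certfile. Hypothetical helper for illustration only.
func patroniBaseURL(listen, certfile string) (string, error) {
	host, port, err := net.SplitHostPort(listen)
	if err != nil {
		return "", fmt.Errorf("parse restapi.listen %q: %w", listen, err)
	}
	scheme := "http"
	if certfile != "" {
		scheme = "https"
	}
	// Keep hostnames as-is instead of forcing them through net.ParseIP;
	// JoinHostPort brackets the host only when it is an IPv6 literal.
	return scheme + "://" + net.JoinHostPort(host, port), nil
}

func main() {
	for _, listen := range []string{"my.host.fqdn:8008", "[::1]:8008"} {
		u, err := patroniBaseURL(listen, "")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(u + "/liveness")
	}
}
```

A wildcard listen address such as 0.0.0.0:8008 or :8008 would still need mapping to a reachable address (for example a loopback address, if the agent runs on the same host); the sketch leaves that out.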

dan-aksenov commented 3 years ago

Got some more errors from Patroni monitoring:

{"level":"error","service":"pgscv","time":"2021-09-24T11:07:00+03:00","message":"patroni/common collector failed; parse patroni postmaster_start_time string '2021-09-24 06:22:35.102 MSK' failed: parsing time \"2021-09-24 06:22:35.102 MSK\" as \"2006-01-02 15:04:05.999999Z07:00\": cannot parse \" MSK\" as \"Z07:00\""}

Not sure if it deserves a separate issue.