timescale / pg_prometheus

PostgreSQL extension for Prometheus data
Apache License 2.0

Unable to configure High Availability of Prometheus with timescaleDB #49

Open MohanSaiTeki opened 4 years ago

MohanSaiTeki commented 4 years ago

I am trying to set up high availability for Prometheus using TimescaleDB with the configurations below.

Node exporter

docker run -d -p 9100:9100 quay.io/prometheus/node-exporter

Prometheus

global:
  scrape_interval: 5s
  evaluation_interval: 10s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['10.128.15.221:9100']
remote_write:
  - url: "http://10.128.15.221:9201/write"
remote_read:
  - url: "http://10.128.15.221:9201/read"
    read_recent: true

global:
  scrape_interval: 5s
  evaluation_interval: 10s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['10.128.15.221:9100']
remote_write:
  - url: "http://10.128.15.221:9202/write"
remote_read:
  - url: "http://10.128.15.221:9202/read"
    read_recent: true

Prometheus adapter

pg_prometheus

docker run --name pg_prometheus -e POSTGRES_PASSWORD=secret -it -p 5432:5432 timescale/pg_prometheus:latest-pg11 postgres -csynchronous_commit=off

When I spin everything up, it all works fine, with the status below.

But when I stop prometheus-1, prometheus-adapter-2 does not take over leadership. The adapter logs are below.

prometheus-adapter-1

{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:56.513Z"}
{"caller":"log.go:27","count":93,"duration":0.005575618,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:29:59.668Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:59.668Z"}
{"caller":"log.go:35","level":"warn","msg":"Prometheus timeout exceeded","timeout":"7s","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:35","level":"warn","msg":"Scheduled election is paused. Instance is removed from election pool.","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:31","level":"info","msg":"Instance is no longer a leader","ts":"2020-03-09T10:30:06.962Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:10.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:15.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:20.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:25.958Z"}
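The leadership-related entries in a log like the one above are easy to lose among the throughput messages. A minimal Python sketch for pulling them out follows. It assumes the adapter emits one JSON object per entry with `msg`, `level`, and `ts` fields, as in the excerpt. The helper names (`iter_json_objects`, `leadership_events`) and the keyword filter are hypothetical and not part of the adapter.

```python
import json

def iter_json_objects(text):
    """Yield JSON objects from text that holds one or more objects
    separated by whitespace (newlines or plain spaces)."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical filter: substrings that mark election-related messages.
KEYWORDS = ("leader", "election", "timeout")

def leadership_events(text):
    """Return (ts, level, msg) for entries whose message mentions leadership."""
    events = []
    for entry in iter_json_objects(text):
        msg = entry.get("msg", "")
        if any(keyword in msg.lower() for keyword in KEYWORDS):
            events.append((entry.get("ts"), entry.get("level"), msg))
    return events

# A few entries copied from the prometheus-adapter-1 log.
sample = '''
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:59.668Z"}
{"caller":"log.go:35","level":"warn","msg":"Prometheus timeout exceeded","timeout":"7s","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:35","level":"warn","msg":"Scheduled election is paused. Instance is removed from election pool.","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:31","level":"info","msg":"Instance is no longer a leader","ts":"2020-03-09T10:30:06.962Z"}
'''

for ts, level, msg in leadership_events(sample):
    print(ts, level, msg)
```

Because the parser walks the text with `raw_decode`, it also handles entries that were pasted on a single line separated by spaces.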

prometheus-adapter-2

{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:30:55.046Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:30:55.047Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:30:55.048Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.041Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.043Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.045Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.046Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.041Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.044Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.046Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.046Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.048Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.042Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.043Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.045Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.045Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.046Z"}
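Taken together, the timestamps in the two adapter logs give a rough lower bound on how long neither instance held leadership. The sketch below computes that gap in Python. It assumes the `ts` values are UTC timestamps with millisecond precision, as they appear in the logs. `seconds_between` is a hypothetical helper name.

```python
from datetime import datetime

# Timestamp layout used in the adapter logs, e.g. 2020-03-09T10:30:06.962Z
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

def seconds_between(start_ts, end_ts):
    """Return the elapsed seconds between two log timestamps."""
    start = datetime.strptime(start_ts, TS_FORMAT)
    end = datetime.strptime(end_ts, TS_FORMAT)
    return (end - start).total_seconds()

# prometheus-adapter-1: "Instance is no longer a leader"
resigned = "2020-03-09T10:30:06.962Z"
# prometheus-adapter-2: last "Instance is not a leader" entry in the excerpt
last_not_leader = "2020-03-09T10:31:10.046Z"

gap = seconds_between(resigned, last_not_leader)
print(f"No leader for at least {gap:.3f} s")
```

For the excerpts in this report the gap comes out at a little over 63 seconds, far longer than the 7s Prometheus timeout that made adapter-1 step down.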

But when I stop prometheus-adapter-1, prometheus-adapter-2 does take over leadership.

Another interesting thing: when I start prometheus-1 again, I see "Election id 2: Instance is not a leader. Can't write data" in the prometheus-adapter-1 log. Please see the log below.

{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.566Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":93,"ts":"2020-03-09T10:33:34.571Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.576Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:34.576Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.578Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:34.579Z"}
{"caller":"log.go:31","level":"info","msg":"Prometheus seems alive. Resuming scheduled election.","ts":"2020-03-09T10:33:34.959Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.550Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.551Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.553Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.553Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.555Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.556Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.558Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:44.551Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:44.551Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:44.554Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:44.554Z"}

So, did I get a step wrong while setting this up, or is this a bug?

Please help me to resolve this issue.

msarm commented 3 years ago

@MohanSai1997 - Were you able to make any progress setting up the HA instance?

MohanSaiTeki commented 3 years ago

This project has been sunset. Please refer to the README.md file.

msarm commented 3 years ago

Oh yeah, I see it. Thank you!

Harkishen-Singh commented 3 years ago

https://github.com/timescale/promscale is the project that is now recommended.