Closed henri9813 closed 3 months ago
Hi Henri,
I tested two scenarios:
error while getting cluster stats: Get \"http://localhost:2113/gossip\": dial tcp [::1]:2113: connect: connection refused
error while getting subscription stats: Get \"http://localhost:12345/subscriptions\": context deadline exceeded
In both cases the exporter correctly reported eventstore_up 0.
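I think the problem you observed may be related to the timeout configuration. Both Prometheus and the exporter default to a 10s timeout: Prometheus for the scrape, the exporter for its requests to EventStore. When the two are equal, the Prometheus scrape can time out before the exporter has a chance to report the failed EventStore connection. You should adjust the timeouts so that the Prometheus one is longer; the information about the timed-out ES connection can then propagate to Prometheus.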
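A minimal sketch of what that could look like on the Prometheus side, assuming the exporter keeps its 10s default. The job name and target address are placeholders, not values taken from this thread:

```yaml
scrape_configs:
  - job_name: eventstore            # hypothetical job name
    scrape_interval: 30s
    # Longer than the exporter's 10s default, so the exporter can
    # answer with eventstore_up 0 before the scrape itself times out.
    # Prometheus requires scrape_timeout <= scrape_interval.
    scrape_timeout: 15s
    static_configs:
      - targets: ["eventstore-exporter:9448"]   # placeholder address
```

The alternative is to leave Prometheus at its defaults and lower the exporter's own timeout instead.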
You can also consider alerting on the absence of the metric, e.g. absent(eventstore_up) OR eventstore_up == 0, or on the status of the Prometheus job.
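As a rough illustration, the expression above could be wrapped in an alerting rule along these lines. The group name, alert name, `for` duration, and labels are assumptions chosen for the example:

```yaml
groups:
  - name: eventstore                # hypothetical group name
    rules:
      - alert: EventStoreDown       # hypothetical alert name
        # Fires when the metric is missing entirely (scrape failed or
        # timed out) or when the exporter reports EventStore as down.
        expr: absent(eventstore_up) or eventstore_up == 0
        for: 1m                     # assumed grace period
        labels:
          severity: critical
        annotations:
          summary: "EventStore is down or the exporter is not reporting"
```

The absent() branch covers the case where the scrape fails before any eventstore_up sample is recorded.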
Hello,
You're right about the Prometheus timeout, I forgot about that.
However, having the exporter's timeout equal to the Prometheus one may not be a good idea. Could we consider reducing it to 5s (which is already a huge timeout), so that users don't have to change the Prometheus timeout configuration?
I'm obsessed with having a plug-and-play / zeroconf exporter :)
Thanks for the absent keyword, I didn't know about it!
Hi,
Sure, I will consider changing the default timeout on the exporter. It is a bit unfortunate that it matches the Prometheus default timeout. Note that you can configure the exporter's timeout using the TIMEOUT env var.
Thanks! Can I open a PR for this?
Hello,
The exporter doesn't detect every case where EventStore is down.
In the following case, this is what I have in my Prometheus:
This is a problem because no alert is fired (I configured an alert on up == 0).
Best regards