Context deadline exceeded: no values stored

henri9813 commented 4 months ago

Hello,

The exporter doesn't detect every case where EventStore is down:

time="2024-05-15T13:39:40Z" level=info msg="Running scrape"
time="2024-05-15T13:39:50Z" level=error msg="Error while getting data from EventStore" error="Get \"http://eventstore.svc.cluster.local:2113/subscriptions\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
time="2024-05-15T13:39:55Z" level=info msg="Running scrape"

In this case, this is what I see in Prometheus:

This is a problem because no alert fires (I have configured an alert on up == 0).

Best regards

marcinbudny commented 4 months ago

Hi Henri,

I tested two scenarios:

The server is not available, which results in an error while getting cluster stats: Get \"http://localhost:2113/gossip\": dial tcp [::1]:2113: connect: connection refused
The server accepts a connection from the exporter but does not send any data, which results in a timeout: error while getting subscription stats: Get \"http://localhost:12345/subscriptions\": context deadline exceeded

In both cases the exporter correctly reported eventstore_up 0.

I think the problem you observed may be related to the timeout configuration. Both Prometheus (its scrape timeout) and the exporter (its timeout for requests to EventStore) default to 10s. You should adjust these timeouts so that the Prometheus one is longer. That way, when the exporter's connection to ES times out, the exporter still has time to respond with eventstore_up 0 before Prometheus abandons the scrape.

You can also consider alerting on the absence of the metric, e.g. absent(eventstore_up) OR eventstore_up == 0, or on the status of the Prometheus job (its built-in up metric).

henri9813 commented 3 months ago

Hello,

You're right about the Prometheus timeout; I forgot about it.

However, having the exporter's timeout equal to the Prometheus one may not be a good idea. Could we consider reducing it to 5s (which is already a long timeout) so that no Prometheus timeout configuration is needed?

I'm obsessed with having plug-and-play / zero-config exporters :)

Thanks for the tip about absent(); I didn't know about it!

marcinbudny commented 3 months ago

Hi,

Sure, I will consider changing the default timeout on the exporter. It is a bit unfortunate that it matches the Prometheus default timeout. Note that you can configure the exporter's timeout using the TIMEOUT env var.

henri9813 commented 3 months ago

Thanks! Can I open a PR for this?

marcinbudny / eventstore_exporter

Context deadline exceeded: no values stored #40