prometheus / client_java

Prometheus instrumentation library for JVM applications
http://prometheus.github.io/client_java/
Apache License 2.0

Prometheus Pushgateway Retry and Failure Handle #635

Open · gmiano opened this issue 3 years ago

gmiano commented 3 years ago

Hello, I am developing a Spring Boot application that runs daily on a Kubernetes cluster as a CronJob. The platform has its own monitoring stack based on Prometheus and Grafana, and to simplify operations I want to expose my job's metrics through Prometheus.

This is a classic use case for a Pushgateway: the application is short-lived, so scraping is not a suitable approach.

I have set everything up so that simpleclient_pushgateway works properly. I also call ConfigurableApplicationContext.close() to shut down gracefully and make sure the metrics are pushed before the JVM exits.

My only concern is connectivity problems that could prevent my metrics from being pushed. Is there any way to add retry logic to the push request so that these scenarios are handled correctly?

As additional information I am using:

Regards, Giuseppe

fstab commented 3 years ago

I guess you don't call PushGateway.pushAdd() directly, but instead set

management.metrics.export.prometheus.pushgateway.enabled=true

and have Spring's PrometheusPushGatewayManager call PushGateway.pushAdd() for you.

So your question is really one for the Spring Boot Actuator maintainers rather than for us, because that is not our code. However, looking at their code, I'm pretty sure retries are not configurable:

https://github.com/spring-projects/spring-boot/blob/24925c3daef169deaa9b2107ef02be4a67edb886/spring-boot-project/spring-boot-actuator/src/main/java/org/springframework/boot/actuate/metrics/export/prometheus/PrometheusPushGatewayManager.java#L106-L121

Anyway, if you really want retries, you could copy the code that calls PushGateway.pushAdd() out of PrometheusPushGatewayManager and call it directly. Then you can surround it with a try-catch and implement whatever retry mechanism works for you.
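A minimal sketch of that approach, assuming you call PushGateway.pushAdd() yourself instead of relying on PrometheusPushGatewayManager. The PushRetry class and its runWithRetry method are hypothetical helpers (not part of client_java or Spring Boot): they wrap any push that can throw an IOException, retry with exponential backoff, and rethrow the last error if every attempt fails. The attempt count, delays, and the 60-second cap are illustrative assumptions.

```java
import java.io.IOException;
import java.time.Duration;

/** Hypothetical retry helper for a Pushgateway push; not part of client_java or Spring Boot. */
final class PushRetry {

    /** Upper bound on the wait between attempts (assumed value). */
    private static final long MAX_DELAY_MS = 60_000;

    /** A push that may fail, e.g. () -> pushGateway.pushAdd(registry, "my_batch_job"). */
    @FunctionalInterface
    interface PushAction {
        void push() throws IOException;
    }

    private PushRetry() {}

    /**
     * Runs the action up to maxAttempts times, doubling the delay after each failure
     * (capped at MAX_DELAY_MS). Rethrows the last IOException if every attempt fails.
     */
    static void runWithRetry(PushAction action, int maxAttempts, Duration initialDelay)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMillis = initialDelay.toMillis();
        for (int attempt = 1; ; attempt++) {
            try {
                action.push();
                return; // push succeeded, stop retrying
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: let the caller log it or fail the job
                }
                System.err.printf("Push attempt %d/%d failed: %s; retrying in %d ms%n",
                        attempt, maxAttempts, e.getMessage(), delayMillis);
                Thread.sleep(delayMillis);
                delayMillis = Math.min(delayMillis * 2, MAX_DELAY_MS); // exponential backoff
            }
        }
    }
}
```

In the job you might call it as PushRetry.runWithRetry(() -> pushGateway.pushAdd(registry, "my_batch_job"), 5, Duration.ofSeconds(2)), where pushGateway is an io.prometheus.client.exporter.PushGateway from simpleclient_pushgateway and registry is the CollectorRegistry holding your metrics (in a Spring Boot app it can be obtained from the PrometheusMeterRegistry via getPrometheusRegistry()). The job name and delays are placeholders. Retrying is reasonably safe here because pushAdd() sends an HTTP POST, which only replaces metrics with the same names in that group, so a repeated push does not double-count anything.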