micrometer-metrics / prometheus-rsocket-proxy

An RSocket proxy to pull metrics from applications that can only open egress
Apache License 2.0

Spring pushAndCloseBlockingly metrics not appearing #84

Closed matas-pov closed 4 months ago

matas-pov commented 4 months ago

Hello, the metrics are not appearing on the Prometheus proxy server when inspecting the /metrics/connected endpoint after a short-lived application has finished running. The Spring library is set up with an auto-configurer that establishes the connection and invokes the pushAndCloseBlockingly method on shutdown (verified by inspection). The application runs for around 1 second, as it's a simple "Hello World".

However, if Prometheus invokes the /metrics/connected endpoint while the application is still running, the metrics are scraped successfully. This can be reproduced by adding a sleep that exceeds the Prometheus scrape interval (for example, sleeping for 30 seconds); see the variant after the snippet below.

Based on the logs below, the application runs for less than 1 second, and no metrics appear:

14:16:28.880 [reactor-tcp-kqueue-2] INFO  io.micrometer.prometheus.rsocket.PrometheusRSocketClient - Connected to RSocket Proxy!
Hello world!
14:16:29.900 [reactor-tcp-kqueue-2] INFO  io.micrometer.prometheus.rsocket.PrometheusRSocketClient - Pushing data to RSocket Proxy before closing the connection was successful!

However, if I prolong the application to 30 seconds, Prometheus attempts to scrape the metrics endpoint and the proxy uses the RSocket connection to reach out to the application. We are informed of this connection by the following message (and the metrics then appear on the proxy):

INFO  io.micrometer.prometheus.rsocket.PrometheusRSocketClient - Connected to RSocket Proxy!

Library: prometheus-rsocket-spring Version: 1.5.3

See below a snippet of the code for the failing scenario:


import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class TaskRunner implements CommandLineRunner {

    private final Counter customTestCounter;

    private final Counter customAltTestCounter;

    public TaskRunner(final MeterRegistry meterRegistry) {
        // Register two custom counters against the auto-configured registry.
        customTestCounter = Counter.builder("custom_counter_test")
            .tag("result", "success")
            .register(meterRegistry);
        customAltTestCounter = Counter.builder("custom_alt_counter_test")
            .tag("result", "success")
            .register(meterRegistry);
    }

    @Override
    public void run(final String... strings) {
        customTestCounter.increment(10);
        customAltTestCounter.increment();

        System.out.println("Hello world!");

        customAltTestCounter.increment();
    }
}
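
For comparison, the variant where scraping succeeds is essentially the same runner with a sleep longer than the Prometheus scrape interval. A rough sketch of what I meant by "adding a sleep" (the 30-second figure is an assumption based on my scrape interval; adjust as needed):

    @Override
    public void run(final String... strings) throws Exception {

        customTestCounter.increment(10);
        customAltTestCounter.increment();

        System.out.println("Hello world!");

        // Keep the application alive past the Prometheus scrape interval so the
        // proxy can reach back over the still-open RSocket connection.
        Thread.sleep(30_000);

        customAltTestCounter.increment();
    }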
shakuzen commented 4 months ago

I tried to reproduce this but had difficulty. I made an application that increments a counter in a CommandLineRunner and shut it down after that runner finished. Then I called /metrics/connected on the proxy and was able to see the metrics from the application that had already shut down, including the custom counter with the increment.
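
In case it helps compare setups, this is roughly how I checked the proxy's scrape endpoint after the application had exited (the host and port are assumptions from my local setup; substitute your proxy's scrape address):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProxyScrapeCheck {

    public static void main(final String[] args) throws Exception {
        // Hit the proxy's connected-metrics endpoint, as Prometheus would on a scrape.
        HttpRequest request = HttpRequest.newBuilder(
            URI.create("http://localhost:8081/metrics/connected")).build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        // The pushed counters should show up in the Prometheus text exposition format,
        // e.g. lines for custom_counter_test.
        System.out.println(response.body());
    }
}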

I'm not sure what the difference might be between your application and mine. Could you please put together a minimal sample project and instructions for reproducing the issue?

matas-pov commented 4 months ago

@shakuzen It appears that I'm unable to replicate this when using a local Docker image with the proxy; however, it does happen when deployed inside a Kubernetes cluster. I will have to do some further investigation. For reference, I'm using the Bitnami chart here: https://github.com/bitnami/charts/tree/main/bitnami/spring-cloud-dataflow/

matas-pov commented 4 months ago

Closing this issue. There was a misunderstanding on my part - I wasn't aware of the cleanup policy of /metrics/connected: metrics pushed by a short-lived application are removed from the /metrics/connected endpoint after one scrape, provided the short-lived application is no longer alive.

shakuzen commented 3 months ago

@matas-pov indeed that's how it currently works. I believe the intention is for this to be documented in this section of the README (bold emphasis added):

Use pushAndClose() on the PrometheusRSocketClient in a shutdown hook for short-lived and serverless applications. This performs a fire-and-forget push of metrics to the proxy, which will hold them until the next scrape by Prometheus.
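
A minimal sketch of that guidance, assuming client is an already-built PrometheusRSocketClient (construction via the builder is omitted here; see the README for the full setup):

// Register a JVM shutdown hook so the latest metric values are pushed to the proxy
// before the short-lived application exits.
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    // Fire-and-forget push; the proxy holds these metrics until the next Prometheus scrape.
    client.pushAndClose();
}));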

That may not be ideal behavior in configurations where multiple Prometheus instances scrape the proxy, because only the first scrape after the push will receive these metrics. It could be reasonable to consider an enhancement with different behavior, but I'm not sure exactly what would be best.