prometheus / jmx_exporter

A process for exposing JMX Beans via HTTP for Prometheus consumption
Apache License 2.0

Support multi-target in http server mode #974

Closed KumKeeHyun closed 5 months ago

KumKeeHyun commented 5 months ago

Currently, the HTTP server only allows collecting metrics from a single target specified by the hostPort or jmxUrl configuration options. However, with Prometheus supporting multi-target and the jmx_exporter reaching version 1.0, the JmxCollector implementing the MultiCollector interface seems capable of supporting multiple targets.

https://github.com/prometheus/jmx_exporter/blob/77bd2a40914e7c0a936c822852d3926337427462/collector/src/main/java/io/prometheus/jmx/JmxCollector.java#L52

I propose introducing a new endpoint, /metrics?target={hostPort}, that allows dynamically specifying the jmxUrl for the JmxScraper to collect from.

https://github.com/prometheus/jmx_exporter/blob/77bd2a40914e7c0a936c822852d3926337427462/collector/src/main/java/io/prometheus/jmx/JmxCollector.java#L719-L729

If a request arrives at /metrics without a target parameter, the collector would fall back to the configured hostPort or jmxUrl, or to an empty string (agent mode only). This approach would not break existing behavior in HTTP server mode, since specifying hostPort or jmxUrl remains mandatory there. In agent mode, an empty string would be passed, matching the original design.
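The fallback behavior described above could be sketched roughly like this (class and method names are hypothetical, not the exporter's actual API):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: resolve the JMX target for a scrape request.
// The ?target= query parameter wins; otherwise the configured
// hostPort/jmxUrl is used (an empty string in agent mode).
public class TargetResolver {
    private final String configuredJmxUrl;

    public TargetResolver(String configuredJmxUrl) {
        this.configuredJmxUrl = configuredJmxUrl;
    }

    /** Returns the JMX target for this request. */
    public String resolveTarget(String rawQuery) {
        if (rawQuery != null) {
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0 && pair.substring(0, eq).equals("target")) {
                    return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                }
            }
        }
        return configuredJmxUrl; // existing single-target behavior
    }

    public static void main(String[] args) {
        TargetResolver r = new TargetResolver("localhost:9999");
        System.out.println(r.resolveTarget("target=node-01%3A9999")); // node-01:9999
        System.out.println(r.resolveTarget(null));                    // localhost:9999
    }
}
```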

A limitation is that authentication and SSL cannot be set for each target individually. However, similar to the redis_exporter, this could be restricted to using the same authentication for all targets.
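Under that restriction, the existing top-level credentials in the exporter config would simply apply to every target; a sketch (values are placeholders, and whether these keys apply per scrape would depend on the implementation):

```yaml
# One set of JMX credentials shared by all targets,
# mirroring the redis_exporter approach
username: monitor
password: secret
ssl: true
```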

While I understand the JMX exporter was initially designed with agent mode (local JVM) in mind, I believe leveraging the HTTP server's advantages would be beneficial. Although remotely collecting JMX metrics might incur network overhead, careful configuration of includeObjectNames/excludeObjectNames can mitigate this concern.
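For example, narrowing the object-name filters keeps the remote scrape small (a sketch; the MBean patterns are illustrative, and the exact option names should be checked against the exporter README for your version):

```yaml
hostPort: node-01:9999
includeObjectNames:
  - "kafka.server:type=BrokerTopicMetrics,*"
excludeObjectNames:
  - "kafka.network:type=RequestMetrics,*"
rules:
  - pattern: ".*"
```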

I have already tested this functionality by modifying the source code and achieved satisfactory results. If you are open to supporting this feature, I would be happy to submit a pull request for review.

Thanks for your time.

dhoard commented 5 months ago

@KumKeeHyun The standalone exporter has configuration values (rules, hostPort/jmxUrl, potentially JMX authentication/SSL, etc.) that are specific to the target JVM/MBeans that are being scraped.

How are you handling configuration in the functionality that you have tested?

KumKeeHyun commented 5 months ago

@dhoard

Apologies for the lack of detailed explanations regarding the settings.

The multi-target feature was suggested under the assumption that the targets are several servers sharing the same rules, authentication, and SSL settings.

The tests were conducted as follows:

The hostPort in the standalone exporter's config was randomly chosen from one of the Kafka brokers.

dhoard commented 5 months ago

@KumKeeHyun I feel this usage scenario has already been solved by using a reverse proxy (e.g. Nginx) as a router to the correct Kafka server/exporter agent.

KumKeeHyun commented 5 months ago

@dhoard

I understood the use case of using a reverse proxy to be as follows:

```mermaid
flowchart LR
    prometheus -- "/metrics?target=node-01:9999" --> nginx
    nginx -- "/metrics" --> jmx-exporter-01
    nginx --> jmx-exporter-02
    nginx --> jmx-exporter-03
    subgraph kafka-cluster
    subgraph node-01
    jmx-exporter-01 --> kafka-01
    end
    subgraph node-02
    jmx-exporter-02 --> kafka-02
    end
    subgraph node-03
    jmx-exporter-03 --> kafka-03
    end
    end
```

There is no functional difference between this use case and multi-target. However, I think multi-target has significant operational benefits. If jmx-exporter supports multi-target, the overall configuration would be as follows:

```mermaid
flowchart LR
    prometheus -- "/metrics?target=node-01:9999" --> jmx-exporter
    jmx-exporter -- "JmxScrape" --> kafka-01
    jmx-exporter --> kafka-02
    jmx-exporter --> kafka-03
    subgraph kafka-cluster
    subgraph node-01
    kafka-01
    end
    subgraph node-02
    kafka-02
    end
    subgraph node-03
    kafka-03
    end
    end
```
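On the Prometheus side, this is the standard multi-target relabeling pattern already used with the blackbox and SNMP exporters; a sketch with hypothetical hostnames and ports:

```yaml
scrape_configs:
  - job_name: kafka-jmx
    metrics_path: /metrics
    static_configs:
      - targets: ["node-01:9999", "node-02:9999", "node-03:9999"]
    relabel_configs:
      # copy the broker address into the ?target= query parameter
      - source_labels: [__address__]
        target_label: __param_target
      # keep the broker address as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # actually scrape the single jmx-exporter instance
      - target_label: __address__
        replacement: jmx-exporter:8080
```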

This approach allows jmx-exporter instances to be deployed independently. When there is a deployment task such as upgrading the jmx-exporter version or changing the rules configuration, we don't need to touch every node where jmx-exporter is installed, only the independently deployed jmx-exporter.

dhoard commented 5 months ago

@KumKeeHyun I agree it changes the update domain (standalone exporter vs. Java agent).


Standalone exporter concerns: More infrastructure but no application restarts

To deploy the standalone exporter properly for high availability/fault tolerance, you need three instances plus a highly available load balancer.

Why three standalone exporter instances?

Example:

Standalone exporter instances (1, 2, 3)

You upgrade instance 1

While instance 1 is being upgraded, instance 2 crashes for some reason (bug, infrastructure, etc.)

This leaves you with instance 3 to service requests.

If you only have 2 exporter instances, you have an outage.

If you have a single instance, you have an outage during the upgrade.

Because you need multiple instances of the standalone exporter for high availability/fault tolerance, you need a high availability load balancer.

Some people try to use DNS in place of a load balancer, but DNS caching can cause failures.

Based on my experience working with enterprises, DNS changes typically require a change ticket that is implemented by another team.


Java agent exporter: Less infrastructure but application restarts

If you are using the Java agent exporter, an update of the jar and application restart would be required.

Not ideal from a Kafka perspective, since it will cause producer/consumer errors as well as leader elections. Properly implemented Kafka applications should implement retries, which will mitigate the restart impact.

YAML configuration updates can be handled via automation, so there is no availability impact.


Security concerns:

Allowing Prometheus (or other collecting application) to provide a dynamic value to identify the scrape target is typically considered insecure.

The implementation would need to have extra configuration to map an id to a jmxUrl/hostPort.
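Such a mapping could look like the following, so clients can only select pre-approved targets by id rather than supplying raw addresses; a hypothetical sketch (the `targets` key does not exist in the current config):

```yaml
targets:
  kafka-01: service:jmx:rmi:///jndi/rmi://node-01:9999/jmxrmi
  kafka-02: service:jmx:rmi:///jndi/rmi://node-02:9999/jmxrmi
```

A request for /metrics?target=kafka-01 would then resolve through this map, and unknown ids would be rejected.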


I'm not opposed to the functionality, but we have to make sure it doesn't break existing users (risk management).

KumKeeHyun commented 5 months ago

Thank you for explaining in detail. I have understood the concerns for each domain. I will close this issue now. Thanks :)