open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.11k stars 2.39k forks source link

Upcoming changes to JMX metrics gatherer #34825

Open breedx-splk opened 3 months ago

breedx-splk commented 3 months ago

Component(s)

receiver/jmx

Describe the issue you're reporting

Hi collector folks! Greetings from the Java instrumentation group.

We have a java contrib component commonly called the JMX Metrics Gatherer which allows users to choose from a set of preconfigured out-of-the-box frameworks and/or provide custom configurations for their own JMX MBeans. This component runs as a stand-alone jar file, connects to a JVM, and exports metrics. We believe it is bundled or otherwise launched from the jmxreceiver in this repository.

For some time, we (otel java sig) have had a confusing and limiting situation which involves two components with similar goals but notably different implementations and configuration: jmx auto-instrumentation via the agent, and the above-mentioned jmx-metrics gatherer.

We want to begin unifying these separate implementations into something more cohesive. Furthermore, the jmx-metrics gatherer is written in and configured with groovy, and there is little interest in maintaining that going forward. So, the approach we discussed today in the Java SIG meeting today is this -- we will create a new module in java-contrib (name is tbd) that will essentially be a new version of the jmx-metrics gatherer. This will be written in Java and will use the same yaml based configuration as the java agent instrumentation. Once that module is published and we think it is ready to be used widely, we will remove the existing jmx-metrics module and stop publishing updates to it.

So this issue is really to:

  1. let collector contrib maintainers/contributors know that this change is coming (timeline tbd)
  2. solicit feedback and provoke discussion about the approach described above
  3. learn more from collector folks about how widely used the jmx receiver is and any other existing challenges with it.

Thanks!

github-actions[bot] commented 3 months ago

Pinging code owners:

SylvainJuge commented 3 months ago

For example for the items 2) and 3), we already know the following:

The current JMX gatherer (contrib) implementation also includes some pre-defined sets of metrics which are selected through the otel.jmx.target.system (doc)

One of the goals is being able to re-implement the current set of metrics with the YAML configuration instead of the groovy definitions, so ideally it could be a drop-in replacement for people relying on it.

In practice though, end-users are also allowed to provide their own groovy metrics definitions with otel.jmx.groovy.script config option, for this use-case a migration from groovy to yaml will be necessary. This "breaking change" is the main reason that makes us believe creating a new component is better than changing the current implementation.

In other terms, for those pre-defined metrics, do we have any idea if using the otel.jmx.groovy.script is frequent or not ?

BinaryFissionGames commented 2 months ago

From a cursory glance at the code, I don't think otel.jmx.groovy.script is configurable through the receiver. The jmx gatherer config is completely generated by the receiver in this function, and I don't see any reference to it.

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/2d5b62d99b9940c2dde01e0c50016be46177e970/receiver/jmxreceiver/receiver.go#L171

So if otel.jmx.target.system is still supported in some form, it seems like it might not necessarily be a breaking change?

hughesjj commented 2 months ago

From a cursory glance at the code, I don't think otel.jmx.groovy.script is configurable through the receiver.

It's not ... it was actually specifically taken out due to security concerns (it basically allows the collector to run an arbitrary groovy script -> essentially arbitrary code execution)

This proposed code change would allow customers of the collector to change what's queried, an in general be a huge usability increase

breedx-splk commented 2 months ago

So if otel.jmx.target.system is still supported in some form, it seems like it might not necessarily be a breaking change?

Not directly related to the collector or the receiver, but we have to still assume that existing users are launching the jar file through some other mechanism, in which case it's still "breaking" to them. Just sayin'.

choucavalier commented 1 month ago

Why implement the JMX scraper in Java, thereby requiring to bundle java within the otelcol image? Why not implement the scraper in Go directly (using something like gojmx), removing any Java dependency? We could remove one layer of complexity here.

SylvainJuge commented 1 month ago

Why implement the JMX scraper in Java, thereby requiring to bundle java within the otelcol image? Why not implement the scraper in Go directly (using something like gojmx), removing any Java dependency? We could remove one layer of complexity here.

On a high level, the JMX Scraper has the following goals:

Using something from gojmx can simplify the otel collector extension, but it's very likely that not all features of the JMX protocol are supported (clients & server certificates, authentication, using alternative JMXMP protocol instead of RMI, ...). From experience dealing with those in Java is already tricky, so doing that with a completely independent protocol implementation is expected to be an order of magnitude harder.

I would be happy to be proven wrong on this last aspect, and maybe it could also be interesting to create another otel collector extension that relies on gojmx, even if not all the advanced features/complexities of the JMX protocol are not supported it could provide something useful and remove the dependency on a JVM that might work for some users with simple setups. As usual, feel free to contribute another implementation, especially if you are already familiar with the otel collector and gojmx.

choucavalier commented 1 month ago

Thanks @SylvainJuge for your very detailed answer to my naive question. I'm not familiar with the whole complexity you just described. I guess it makes sense then.