jenkinsci / opentelemetry-plugin

Monitor and observe Jenkins with OpenTelemetry.
https://plugins.jenkins.io/opentelemetry/
Apache License 2.0

Support for Retrieving Jenkins Build Duration Metrics via OpenTelemetry Plugin #972

Closed: miraccan00 closed this issue 2 weeks ago

miraccan00 commented 4 weeks ago

Hello OpenTelemetry Development Team,

I have been working on integrating Jenkins with OpenTelemetry to collect build metrics. While I've made some progress, I've encountered challenges in retrieving certain metrics and would like to seek your assistance or guidance.

Objectives:

Challenges:

Additional Goals:

Current Implementation:

I have created a repository with my current setup and attempts to achieve these objectives:

Request:

Thank you for your time and consideration. I look forward to your response and the possibility of enhancing the Jenkins OpenTelemetry integration together.

Best regards,

cyrille-leclerc commented 4 weeks ago

Great suggestion! For the build metrics, please see:

Longer term, you are absolutely right: the Jenkins OTel plugin should provide all the metrics needed by Jenkins admins and users.

I'm on PTO at the moment; I'll follow up ASAP.

christophe-kamphaus-jemmic commented 3 weeks ago

The opentelemetry-plugin also supports sending build traces to a tracing backend (e.g. Elasticsearch or Jaeger). These traces can be queried to calculate metrics, also called span metrics, which can then be displayed on a dashboard. This is already possible with the current version of the plugin: use the span duration, grouped by the ci.pipeline.id attribute that is set on the root span of each build.
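To illustrate the span-metrics idea, here is a minimal Python sketch that groups root-span durations by ci.pipeline.id and summarizes them. It assumes the spans have already been fetched from the tracing backend into a simplified, hypothetical `Span` record; the span names, parent IDs and durations below are made up, and a real backend would do this aggregation in its own query language.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class Span:
    # Hypothetical, simplified record of a span fetched from the tracing backend.
    name: str
    parent_id: Optional[str]
    duration_ms: float
    attributes: dict = field(default_factory=dict)


def span_metrics(spans):
    """Summarize build durations per pipeline from the root span of each build."""
    durations = defaultdict(list)
    for span in spans:
        pipeline = span.attributes.get("ci.pipeline.id")
        # A root span (no parent) represents one whole build run.
        if span.parent_id is None and pipeline is not None:
            durations[pipeline].append(span.duration_ms)
    return {
        pipeline: {"count": len(values), "avg_ms": mean(values), "max_ms": max(values)}
        for pipeline, values in durations.items()
    }


spans = [
    Span("build #1", None, 120_000.0, {"ci.pipeline.id": "team-a/app"}),
    Span("build #2", None, 180_000.0, {"ci.pipeline.id": "team-a/app"}),
    Span("Stage: test", "span-1", 60_000.0),  # child span, ignored
    Span("build #7", None, 90_000.0, {"ci.pipeline.id": "team-b/lib"}),
]
print(span_metrics(spans))
```

Only root spans are counted, so each build contributes exactly one duration to its pipeline's summary.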

If you also want per-pipeline duration metrics for individual stages, you can add withSpanAttributes to your jobs; cf. https://github.com/jenkinsci/opentelemetry-plugin/issues/952#issuecomment-2388922816, https://github.com/jenkinsci/opentelemetry-plugin/issues/811#issuecomment-2116113648
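As a rough sketch of the per-stage variant, the Python snippet below averages stage durations per (pipeline, stage) pair. It assumes, without confirmation from this thread, that withSpanAttributes has been used to attach ci.pipeline.id to each stage span and that stage spans are named "Stage: <stage name>"; the dict shape of the spans and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

STAGE_PREFIX = "Stage: "  # assumed naming convention for stage spans


def stage_durations(spans):
    """Average stage duration per (ci.pipeline.id, stage name)."""
    groups = defaultdict(list)
    for span in spans:
        name = span["name"]
        # Assumption: ci.pipeline.id was copied onto stage spans via withSpanAttributes.
        pipeline = span["attributes"].get("ci.pipeline.id")
        if pipeline is not None and name.startswith(STAGE_PREFIX):
            stage = name[len(STAGE_PREFIX):]
            groups[(pipeline, stage)].append(span["duration_ms"])
    return {key: mean(values) for key, values in groups.items()}


spans = [
    {"name": "Stage: build", "duration_ms": 30_000.0, "attributes": {"ci.pipeline.id": "team-a/app"}},
    {"name": "Stage: build", "duration_ms": 50_000.0, "attributes": {"ci.pipeline.id": "team-a/app"}},
    {"name": "Stage: test", "duration_ms": 90_000.0, "attributes": {"ci.pipeline.id": "team-a/app"}},
    {"name": "sh", "duration_ms": 5_000.0, "attributes": {"ci.pipeline.id": "team-a/app"}},  # step span, ignored
]
print(stage_durations(spans))
```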

In general, it's not a good idea to have very specific metrics (i.e. specific to a single job run) because of the cardinality issues some metric backends suffer from (e.g. Prometheus). Metrics are usually used to aggregate data (counts, histograms, …), while traces and logs capture individual requests or events. For traces and logs, sampling can be used to reduce the amount of data that needs to be processed and stored. If a sampling rate of 100% is used, then any metric calculated from the traces should be accurate.
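The cardinality concern can be made concrete with a little arithmetic: in the worst case, a metric produces one time series per combination of label values. The Python sketch below uses hypothetical numbers (50 monitored pipelines, the 5 build results listed later in this thread, and a hypothetical per-run label with 1,000 distinct build numbers).

```python
import math


def max_series(label_cardinalities):
    # Upper bound: every combination of label values may become its own time series.
    return math.prod(label_cardinalities.values())


# Aggregated metric: one series per pipeline and result.
aggregated = {"ci.pipeline.id": 50, "ci.pipeline.result": 5}
# Hypothetical per-run label: every new build number adds new series.
per_run = {**aggregated, "run.number": 1_000}

print(max_series(aggregated))  # 250
print(max_series(per_run))     # 250000
```

For a histogram exported to Prometheus, each of these series is further multiplied by the number of buckets (plus the _sum and _count series), so a per-run label grows the series count without bound as builds accumulate.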

I think https://github.com/jenkinsci/opentelemetry-plugin/pull/959 is a great addition to the opentelemetry-plugin. It works well because it aggregates the individual job runs of a given pipeline and gives administrators control over which pipelines are monitored specifically. What it does not allow is querying the exact build duration of a specific job run, and metrics specific to a single job run would be problematic. The prometheus-plugin offers such per-run metrics, thankfully guarded by a configuration option, but that option is global and does not allow filtering which jobs it applies to.

In my experience, if you want per-run metrics, you are better off querying the traces.

cyrille-leclerc commented 2 weeks ago

Please use the ci.pipeline.run.duration{ci.pipeline.id="<<pipeline full name>>", ci.pipeline.result="<<SUCCESS, UNSTABLE, FAILURE, NOT_BUILT, ABORTED>>"} histogram metric we have just released. ℹ Use the otel.instrumentation.jenkins.run.metric.duration.allow_list and otel.instrumentation.jenkins.run.metric.duration.deny_list properties to specify the pipelines for which you want to capture the run duration; all other pipelines are aggregated in the ci.pipeline.id="#other#" time series.
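To show how the allow/deny lists keep cardinality bounded, here is a toy Python model of the behaviour described above: runs of selected pipelines keep their own ci.pipeline.id, and everything else falls into "#other#". This is not the plugin's implementation. It assumes that both lists are regular expressions matched against the full pipeline name and that a pipeline is kept only if it matches the allow list and not the deny list; the bucket boundaries are hypothetical.

```python
import bisect
import re
from collections import defaultdict

OTHER = "#other#"


class RunDurationHistogram:
    """Toy model of ci.pipeline.run.duration, keyed by (ci.pipeline.id, ci.pipeline.result)."""

    def __init__(self, allow_list, deny_list, boundaries):
        # Assumption: both lists are regexes matched against the full pipeline name.
        self.allow = re.compile(allow_list)
        self.deny = re.compile(deny_list)
        self.boundaries = sorted(boundaries)  # bucket upper bounds in seconds (hypothetical)
        # One counter per bucket, plus a final overflow bucket.
        self.buckets = defaultdict(lambda: [0] * (len(self.boundaries) + 1))

    def pipeline_id(self, full_name):
        if self.allow.fullmatch(full_name) and not self.deny.fullmatch(full_name):
            return full_name
        return OTHER  # non-selected pipelines share one time series

    def record(self, full_name, result, duration_s):
        key = (self.pipeline_id(full_name), result)
        # bisect_left puts a value equal to a boundary into that boundary's bucket.
        index = bisect.bisect_left(self.boundaries, duration_s)
        self.buckets[key][index] += 1


h = RunDurationHistogram(r"team-a/.*", r"team-a/sandbox-.*", [60, 300, 900])
h.record("team-a/app", "SUCCESS", 45)
h.record("team-a/app", "SUCCESS", 240)
h.record("team-a/sandbox-x", "FAILURE", 30)  # denied, goes to #other#
h.record("team-b/lib", "SUCCESS", 1200)      # not allowed, goes to #other#
print(dict(h.buckets))
```

However many pipelines run, the number of ci.pipeline.id values is bounded by the selected pipelines plus one "#other#" value.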

See the documentation: https://github.com/jenkinsci/opentelemetry-plugin/blob/main/docs/monitoring-metrics.md#build-duration

I'm marking your enhancement request as solved. Please open new enhancement requests if needed.

miraccan00 commented 2 weeks ago

Thanks for addressing my enhancement request and providing the solution. I appreciate the prompt response and detailed guidance.

cyrille-leclerc commented 2 weeks ago

You're welcome!