steffan-westcott / clj-otel

An idiomatic Clojure API for adding telemetry to your libraries and applications using OpenTelemetry.
https://cljdoc.org/d/com.github.steffan-westcott/clj-otel-api
Apache License 2.0
183 stars 12 forks

Mismatch in metric names in metadata and actual values #10

Open rhishikeshj opened 9 months ago

rhishikeshj commented 9 months ago

In case of a metric defined like this:

(instrument/instrument {:name "ws-disconnect"
                        :unit "times"
                        :instrument-type :counter
                        :description "The number of times the ws disconnects"})

the exported metric entries are:

# TYPE ws_disconnect_total counter
# HELP ws_disconnect_total The number of times the ws disconnects
ws_disconnect_times_total{otel_scope_name="steffan-westcott.clj-otel",otel_scope_version="0.2.3",client_id="<redacted>"} 1.0 1696251116839

While I am aware that adding the _times and _total suffixes is in line with Prometheus metric naming best practices, tools like AWS CloudWatch trip over such metrics: they can't find any metadata for ws_disconnect_times_total, because the metadata is declared for ws_disconnect_total.

Is there a configurable way to also add the _times suffix (and similar unit suffixes) to the TYPE declaration of the metric, so that the metric name is always the same in the metadata and in the actual values?

steffan-westcott commented 9 months ago

The rules for mapping Prometheus metric metadata to the OpenTelemetry metric model include guidance for unit metadata. These rules state that portions in {} (including the braces) are omitted.

I suggest trying this:

(instrument/instrument {:name "ws-disconnect"
                        :unit "{times}"
                        :instrument-type :counter
                        :description "The number of times the ws disconnects"})
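
To illustrate why the braces matter, here is a rough sketch of the naming rule described above. It is a simplified illustration only, not the exporter's actual code: `prometheus-name` is a hypothetical helper that exists in neither clj-otel nor the OpenTelemetry SDK, and it assumes the only steps are sanitising the name, appending the unit as a suffix (minus any portion in braces), and appending `_total` for counters. The real exporter applies further rules, such as expanding unit abbreviations, that are not modelled here.

```clojure
(require '[clojure.string :as str])

(defn prometheus-name
  "Hypothetical helper (not part of clj-otel or the OpenTelemetry SDK).
   Approximates how an instrument's options map to a Prometheus sample name."
  [{metric-name :name :keys [unit instrument-type]}]
  (let [;; Replace characters that are not valid in Prometheus names.
        base        (str/replace metric-name #"[^a-zA-Z0-9_:]" "_")
        ;; Drop any portion of the unit in braces, e.g. "{times}" becomes "".
        unit-suffix (-> (or unit "")
                        (str/replace #"\{[^}]*\}" "")
                        (str/replace #"[^a-zA-Z0-9_]" "_"))
        ;; Append the unit unless it is empty or already the name's suffix.
        with-unit   (if (or (str/blank? unit-suffix)
                            (str/ends-with? base (str "_" unit-suffix)))
                      base
                      (str base "_" unit-suffix))]
    ;; Counters additionally get the _total suffix.
    (if (= :counter instrument-type)
      (str with-unit "_total")
      with-unit)))

(prometheus-name {:name "ws-disconnect" :unit "times" :instrument-type :counter})
;; => "ws_disconnect_times_total"

(prometheus-name {:name "ws-disconnect" :unit "{times}" :instrument-type :counter})
;; => "ws_disconnect_total"
```

Under these assumptions, the braced unit yields a sample name that matches the name in the TYPE and HELP lines reported above.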

rhishikeshj commented 9 months ago

I guess you're right about the rules for mapping metric metadata. For my own custom metrics, I am even okay with dropping the :unit param; that's what I had to do to get CloudWatch to read them correctly.

But for metrics that are collected and emitted automatically, I can't control this, so important metrics like process_runtime_jvm_memory_allocation_bytes_count fail to be read by CloudWatch because the metadata is actually something like:

# TYPE process_runtime_jvm_memory_allocation histogram

I can see that this is actually a shortcoming of the CloudWatch Prometheus scraper, but it would be great if clj-otel (specifically the Prometheus exporter) could help me do something about it.

steffan-westcott commented 9 months ago

I agree that the issue lies with the AWS CloudWatch integration. If you are using the ADOT Collector, this issue is relevant as it mentions upcoming alignment with the upstream OpenTelemetry Collector. You may be interested in the opt-in feature gate pkg.translator.prometheus.NormalizeName.