prometheus: Remove namespace on attributes

gouthamve commented 1 year ago

From @bertysentry at https://github.com/prometheus/prometheus/pull/12571#discussion_r1280440366

For example, for JVM metrics: the process.runtime.jvm.gc.duration metric will come with the process.runtime.jvm.gc.name and process.runtime.jvm.gc.action attributes. This will end up as process_runtime_jvm_gc_duration_seconds_counter{process_runtime_jvm_gc_name="...", process_runtime_jvm_gc_action="..."} in PromQL.

We could decide to simplify process_runtime_jvm_gc_name and process_runtime_jvm_gc_action labels to name and action respectively by removing the matching namespaces in metric name and its labels.

The benefit is that we get metrics and labels that look much like "classic" Prometheus metrics and labels. The problem is that the behavior could be counter-intuitive and difficult to predict in some cases (which labels get simplified or not?).

It is a good point and I think also exporting the namespaces in every label makes it extremely cumbersome to work with and query the data. Could we make it part of the Proemtheus compatibility spec to not send the namespace in the labels to Prometheus?

Oberon00 commented 1 year ago

Have you seen https://github.com/open-telemetry/semantic-conventions/issues/51? I think a decision on that topic was already made

dashpole commented 1 year ago

I think the proposal is that prometheus exporters strip namespaces from metric labels if they match the namespace of the metric name

gouthamve commented 1 year ago

Yup, this is for the Prometheus compatibility spec. I tried to make it slightly clear in the description as well. Sorry for the confusion.

bertysentry commented 1 year ago

Or... we could ~~set up a coup against~~ convince 😅 the TC that the decision to prefix all attributes in metrics was non-optimal, and will imply a lot of CPU cycles just to remove these in Prometheus and all its clones, which means more energy consumed, and more carbon in the atmosphere, which is bad.

The decision was a close call (only a relative majority of 4/9 voted for prefixing all attributes with the metric's namespace). And nothing has actually been implemented yet. It's not too late to change our mind, nothing's set in stone, after all.

Note: At this point of my comment, I think I need to clearly say that I will abide by the decision of the TC on this. I'm just highlighting a new reason to discuss the decision made.

gouthamve commented 1 year ago

Also, if we remove the prefix on export, do we need to reintroduce back during scraping? I don't know.

jsuereth commented 1 year ago

@bertysentry We were aware of the prometheus concerns when we voted.

Regarding removing the prefix on export to prometheus:

Since we do have programatic access to all attributes and metric names in semconv, would it make sense to have some kind of codegen in semconv that could create an optimal metric-transform for Prometheus exporters that only does namespace transformation on OTEL metrics?

I imagine we could do some kind of lookup-table of transformations instead of string-prefix searches, in addition to limiting the "surprise" to users for their own metrics where they may not be using namespaces but accidentally fall afoul of namespace-stripping logic.

jsuereth commented 1 year ago

Another point I want to make is namespacing changes in otel. We're moving towards "reasonably unique" namespaces, so all your process.runtime.jvm examples become jvm. The underlying concern still exists, but it's slightly better:

jvm_gc_duration_seconds_counter{jvm_gc_name="...", jvm_gc_action="...", ...}

bertysentry commented 1 year ago

@jsuereth The lookup table from the semconv is a good idea. However, we will still need to implement the logic for all metrics that are not covered by semconv (and there will always be plenty).

WRT "reasonably unique" namespaces, I think it's a good thing, I mean, it make things less horrible ;-)

open-telemetry / opentelemetry-specification

prometheus: Remove namespace on attributes #3634