open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0
3.75k stars 889 forks source link

Prometheus Exporter: Copy Resource attributes into each time-series attributes #3705

Closed asafm closed 11 months ago

asafm commented 1 year ago

Background When using Resources with an OTel SDK, the attributes of the resource are exported as attributes of the time series target_info by the Prometheus Exporter. This generates a very hard to use user experience for the user querying this data, as they are forced to use joins which makes query authoring hard, non-discoverable and also hard to read the query. This was nicely outlined in this document.

Seems that the best practice is to copy those attributes into each time-series attributes. Today done by the Prometheus scraper.

What are you trying to achieve? I would like to improve the UX of people using OTel SDK and exporting with Prometheus exporters, such that they will have the option to copy all or some of the attributes of the resource, optionally adding a prefix of their choice.

What's the alternative today? The user of OTel SDK avoid exposing certain attributes as Resource, and creates a wrapper on top of OTel which it uses to create Attributes with - to make sure all resource attributes will also exists in each Attributes. That's the only way.

dashpole commented 1 year ago

@open-telemetry/wg-prometheus

asafm commented 1 year ago

@dashpole Do you have specific people in mind we can mention to kickstart discussion?

dashpole commented 1 year ago

The prometheus dev summit just happened, and this topic was discussed there (look for "Making Prometheus OTel native"): https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit.

From what I can see, the consensus was:

We want users to be able to operate on joined/copied/target labels as if they were real labels.

@gouthamve do you think this direction is still worth pursuing? It seems like it might be worth waiting to see what proposals come from this, although its hard to judge just from the notes. I'm OK adding this as a short-term solution, although i'd prefer not changing the default behavior. But I would like to stay aligned with the direction the prometheus community is headed.

dashpole commented 1 year ago

@asafm I don't expect any pushback from OTel maintainers, so i'm mostly trying to make sure this is the direction the Prom community wants to go before we move forward with it.

asafm commented 1 year ago

@dashpole Any specific person we can ping from Prometheus to chime in?

gouthamve commented 1 year ago

Hi y'all sorry. Been busy and then sick.

So from a Prometheus perspective, we will solve this problem but not soon. If I have to guess a timeline, I'd say ~6 months imo (being very optimisitic). The idea it to somehow store and make it easy to query on resource attributes. This could be through automatic joins on target_info.

We need an interim solution though, and it might end up that when we ingest in Prometheus, we will do the copy.

However, what to copy, should we add resource_ prefix or not, etc. are undefined and it would be great if the OTel specification gave guidance here that Prometheus impelements. So I do think we should make progress here.

asafm commented 1 year ago

@gouthamve So you think its wise OTel SDKs would support copying resource attributes into each metric attributes as interim opt-in solution (given Prometheus would take worst case a year, and 2 years for entire eco-system of Prometheus-like vendors to get aligned) ?

gouthamve commented 1 year ago

I kinda want to say its wise, but I am torn a little. Let me explain my thoughts.

Scenario 1:

Application --OTLP--> Collector --Prom_exporters--> Prometheus

In this scenario, I'd much rather do the configuration on the Collector/Prom_exporter side, rather than on the application end.

Scenario 2:

Application --/metrics scrape--> Prometheus

In this case, I am not sure the answer should be to copy the resource attributes into metrics. The non-otel usecase is to rely on service discovery + relabelling to add the relevant resource attributes. In the non-otel world, the application actually doesn't care about where it is running and has no information around resource attributes. All that is added at scrape time.

Shouldn't that be the same with otel? What resource attributes can an application expose that are not available in Prometheus service discovery? A couple of concrete examples would be super helpful.

tbf, I feel like I am missing some context here on the use-case and how the otel community expects applications to work in general. Do we want the app to expose most of the metadata itself, or do we expect the Collector to add it?

asafm commented 1 year ago

I guess a very simple use would be cluster attribute. Apache Pulsar is a distributed messaging system. Each node (Broker) is associated with a certain cluster. Say out of 10 Brokers, 5 can belong to cluster1 and remaining 5 can belong to cluster2. You can define geo-replication between them (supported out of the box in the broker as long as you have one global ZK acting as configuration store).

One option I have in the broker's code is to add cluster attribute as a Resource attribute. It makes sense to do so since it is in the granularity of the broker (process). In the case of Prometheus I expect it to copy this attribute to any outgoing exported time-series attributes.

Given that I currently don't have this ability, I'm forced to create some sort of AttributesFactory, in which any time you wish to create Attributes for anything in the Broker, you will ask the AttributesFactory to create a Builder for Attributes, already initialized with the Cluster attributes. Of course in this case, I lose the ability to have that defined properly as a resource attributes (When exported via OTLP) since I don't want it define twice, and I don't really know what people would configure downstream. Currently that's my only alternative.

Hence I wanted to have that ability in the Prometheus exporter, so it will be the one copying the attributes in Resource to any exported time-series attributes. My assumption is that if you are using Prometheus exporter, you have a Prometheus compatible scraper, scraping and inserting it as is to any TSDB.

This cluster property is not something I expect to exist in Prometheus service discovery, as it needs access to the each broker's configuration file.

gouthamve commented 1 year ago

Thanks for the clarification. In the Prometheus world, for use-cases like these, every metrics should be tagged with cluster, which is what you're proposing. I think we can go ahead with this proposal.

asafm commented 1 year ago

@dashpole What do you say?

jesusvazquez commented 1 year ago

πŸ‘‹ hey there! Goutham shared this issue with me. I'm Jesus Vazquez Prometheus maintainer for the TSDB and also planning to work more on making Prometheus more OTEL native in the future.

Just wanted to say that Prometheus has plans for solving this but we are in an early stage where we still have to produce a doc with different ideas on how to address it. There are multiple options to solve this i.e copy labels on ingestion, resolve on query time without the need for a explicit join, store resource attributes in metadata once metadata can be persisted....

Many different options with many different timelines for completion. Maybe there is also the option of copying them on ingestion and iterating to a more refined version when we can...

I guess my comments on this issue are: Yes Prometheus wants to solve this but it will take a little while.

So on a personal note, I think giving the option to copy those labels on the SDK sounds interesting for the user and if there is room for that for now, making it explicit that it will be removed in the future once Prometheus has support for it then lets go for it. But again this is just my opinion and the people maintaining this will know better πŸ™

dashpole commented 1 year ago

Given the discussion, I think we should change the specification to say something like:

Exporters MAY offer configuration to add resource attributes as metric labels. By default, it MUST be not add any resource attributes as metric attributes.

I expect SDKs to only add this if it is requested by users, not proactively implement it. Once an SDK supports this configuration, I think it is unlikely it is removed later on (although it may be deprecated/not recommended).

Does that meet your needs @asafm?

Question (probably for @jesusvazquez): Should labels copied from resource -> metric be removed from target_info? If target_info contained the same label as the metric, would that interfere with future plans to make target_info more useable?

asafm commented 1 year ago

@dashpole Yes, it meet my needs, as after the specifications has this, I can push this as a PR for Java SDK (@jack-berg).

Is it ok I'll create a PR for this @dashpole in the specifications?

jesusvazquez commented 1 year ago

Question (probably for @jesusvazquez): Should labels copied from resource -> metric be removed from target_info? If target_info contained the same label as the metric, would that interfere with future plans to make target_info more useable?

I dont think they should be removed. If the implementation to have them available on the original series names changes we might regret deleting them originally. So I would just keep them πŸ‘

dashpole commented 1 year ago

@asafm go for it