open-telemetry / oteps

OpenTelemetry Enhancement Proposals
https://opentelemetry.io
Apache License 2.0
340 stars 164 forks source link

Initial Entity and Resource proposal. #264

Closed jsuereth closed 1 month ago

jsuereth commented 3 months ago

This is a proposal to address Resource and Entity data model interactions, including a path forward to address immediate friction and issues in the current resource specification.

The proposal includes all links and context needed to justify it, but duplicating a snapshot here:

Motivation

This proposal attempts to focus on the following problems within OpenTelemetry to unblock multiple working groups:

austinlparker commented 3 months ago

A question for @jsuereth (and perhaps this has been addressed in WG meetings, if so, feel free to point me to those discussions) but my read of this proposal is that there is some future state where resource attributes are all contained inside entities, rather than flat, thus changing the 'logical' top-level resource attributes into complex attributes rather than flat ones. This is not a pattern that existing tools really use; Is the intention that these attributes should be flattened on ingest in order to allow traditional group-by/analysis functions and the entity definition ingested as some sort of hash (or other platform-appropriate representation)?

bboreham commented 3 months ago

Drive-by comment: I expected the top-level description to reference https://github.com/open-telemetry/oteps/blob/main/text/entities/0256-entities-data-model.md

jsuereth commented 3 months ago

A question for @jsuereth (and perhaps this has been addressed in WG meetings, if so, feel free to point me to those discussions) but my read of this proposal is that there is some future state where resource attributes are all contained inside entities, rather than flat, thus changing the 'logical' top-level resource attributes into complex attributes rather than flat ones. This is not a pattern that existing tools really use; Is the intention that these attributes should be flattened on ingest in order to allow traditional group-by/analysis functions and the entity definition ingested as some sort of hash (or other platform-appropriate representation)?

The goal of this is that tooling can continue to engage with the flattened resource attribtue model we provide today. optionally tooling can engage with the new bundled entity definitions (and OTEL itself, e.g. the collector, would do so).

This gives flexibility for users to engage or not engage with entities, depending on their need, and opens up further exploration of entities.

tigrannajaryan commented 3 months ago

Is there any prototype implementation of this? It sounds reasonable to me but is pretty abstract.

@jack-berg There are early prototypes in Go and in Collector (core and contrib). The prototypes somewhat deviate from this proposal (e.g. the prototype only allows one associated EntityRef in the Resource, it uses key/value pairs, not key arrays, etc). Despite some deviation I think the prototypes demonstrate the idea (and we can update the prototypes to align with this proposal).

jsuereth commented 2 months ago

Is there any prototype implementation of this? It sounds reasonable to me but is pretty abstract.

Not one that includes all of the changes to initial thinking. I'm happy to get one put together though.

ArthurSens commented 2 months ago

Hi folks, I'm seeing a lot of approvals here already but this proposal just got into the Prometheus-team circle. Could you give us a bit of time to review how this affects the Prometheus compatibility? I'd love to get @aknuds1 point of view since he is the most involved in the mentioned info() function proposal.

How fast do you need a review from us? 🙂

jsuereth commented 2 months ago

@ArthurSens Apologies! I pulled in @dashpole early for some prometheus compatibility consultations (see the prometheus compat section, which he contributed).

From my understanding we have a few non-breaking options going forward that should work with info() function. This section is light, but outlines a few paths forward:

I'd like to get this merged this week or next so we can make progress on prototypes and answering open questions (one of which is prometheus compatibiltiy). Do not mistake an OTEP for a committed solution. Until we have a PR against our protocol, anything here is a design doc / path forward. We need to answer open questions - and Prometheus is one we should be discussing. So if you don't have time to get to it this week, don't worry. We plan to answer/adapt things as needed for Prometheus compatibility as we progress.

aknuds1 commented 2 months ago

As proposed - we can create new entity-specific info metrics with descriptive attributes. I'd love to see @aknuds1 confirm, the last time I could check, these info metrics would work with the proposed info PromQL support exactly the way we want them to.

@jsuereth Sorry I'm late in reviewing this, but reading the PR now.

I should perhaps clarify the proposed MVP version of the info PromQL function only works with target_info (by default) or other info metrics (can be specified through a __name__ label matcher) that are identified through the instance and job labels. For info to be more intelligent about which info metrics exist, and which are their identifying labels, I think native metadata support first has to be added to Prometheus (that's something @codesome and I are trying to design).

jsuereth commented 2 months ago

Edit: adding CNCF thread on this: https://cloud-native.slack.com/archives/C01LSCJBXDZ/p1726584544475279

@aknuds1 - We should discuss more. My assumption on OTLP<->Prometheus here is that we can use identifying attributes to synthesize a job/instance label rather than forcing prometheus to understand all of these other labels.

Then, you can use info to interact with target_info (by default, has all the resource attributes) and/or a possible per-entity info metric.

We should continue this discussion somewhere higher bandwidth, cncf slack otel-prometheus channel?

carlosalberto commented 2 months ago

Overall LGTM, although I'd like to know about details going forward, to set expectations on how the SDKs will be updated:

jsuereth commented 1 month ago

Thanks @carlosalberto !

Will Resource stay immutable (with the addition of the new entities field), and will be internally updated by the SDK with the latest instance tracked? If not, will attributes stay as read-only at least, in a best-effort attempt?

I think a follow on OTEP will define the needs and design for "mutable" resource. This OTEP calls out that users want descriptive/mutable attributes on Resource. I expect the ResourceProvider concept to be expanded on to provide that (cc @tedsuo who I think is working on that proposal, or updating the existing one).

For now - this proposal has resource remain read-only and defines a merge algorithm for a ResourceProvider, but with identifying "descriptive" (or changing) attributes as a first step. This does not solve the needs of browser until the follow on OTEP is complete and approved.

The merge algorithm mentions schema_url as a comparison point between entities of the same type, and then one of the examples mentions having the same schema_url with different schema version. Is this something to be tuned/clarified later on, regarding on how to merge values in such case?

We clarify schema url versions in the specificaiton already - https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/schemas#schema-url

So we already have a way to understand version and whether or not they match. We do not have any notion of "compatibility" encoded in version but we can check for equality and order them (to know which is latest).

jsuereth commented 1 month ago

Thanks @annabokhan !

tigrannajaryan commented 1 month ago

We have a broad consensus, all comments are resolved. Merging. Thanks @jsuereth