open-telemetry / opentelemetry-proto

OpenTelemetry protocol (OTLP) specification and Protobuf definitions
https://opentelemetry.io/docs/specs/otlp/
Apache License 2.0

Specify whether resource envelopes containing no telemetry points are valid OTLP #598

Open isaaczinda opened 2 days ago

isaaczinda commented 2 days ago

I would like the OTLP spec to specify whether “empty telemetry envelopes” are valid OTLP. Some examples would be a ResourceMetrics with no ScopeMetrics inside it, or a ResourceMetrics with no Metric inside it. This question is not limited to metrics; it also applies to logs and spans.
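To pin down the two shapes named above, here is a minimal sketch of an emptiness check. It assumes hypothetical stand-in dataclasses (`ResourceMetrics`, `ScopeMetrics`, `Metric`, `is_empty_envelope`) that only mirror the nesting of the OTLP messages. They are not the generated opentelemetry-proto classes or the Collector's pdata API.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins that mirror only the OTLP nesting:
# ResourceMetrics -> ScopeMetrics -> Metric. Real messages carry many more fields.
@dataclass
class Metric:
    name: str


@dataclass
class ScopeMetrics:
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class ResourceMetrics:
    resource_attributes: dict = field(default_factory=dict)
    scope_metrics: List[ScopeMetrics] = field(default_factory=list)


def is_empty_envelope(rm: ResourceMetrics) -> bool:
    """Return True if the envelope carries no Metric at all.

    This covers both shapes from the issue: no ScopeMetrics at all
    (all() over an empty list is True), or only ScopeMetrics that are
    themselves empty.
    """
    return all(len(sm.metrics) == 0 for sm in rm.scope_metrics)


pod = {"service.name": "checkout", "k8s.pod.name": "checkout-7d9f"}
print(is_empty_envelope(ResourceMetrics(pod)))                                     # True
print(is_empty_envelope(ResourceMetrics(pod, [ScopeMetrics()])))                   # True
print(is_empty_envelope(ResourceMetrics(pod, [ScopeMetrics([Metric("cpu")])])))    # False
```

The check stops at the Metric level to match the two examples in the issue. A stricter variant could also treat a Metric with no data points as empty, but the issue does not ask about that case.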

I'm interested in this question because I'm building a telemetry filtering and transformation pipeline. If empty envelopes aren't allowed, I'll drop them and log an error. If they are allowed, I'll allow them to pass through.

Why an Empty Envelope May be Useful

The attributes service.name, k8s.pod.name, and k8s.cluster.name are all stored on the Resource by convention. This information on its own, independent of any telemetry signal, can be quite useful. For example, you could use it to understand how many pods are in each service.

I can imagine a customer filtering out all or most of their telemetry signal for cost reasons but wanting to keep the resource information. With some de-duplication in the collector, this could provide a high-level picture of one’s cloud estate at a minimal egress cost. It's possible that something similar could be achieved with Entities, but Resources are still widely used for this purpose, and it's not always straightforward (or possible) to adjust existing instrumentation.

Prior Art

The filterprocessor appears to take the perspective that if there are no telemetry points inside an envelope, the envelope should be deleted (see the filterprocessor source code).
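The behavior described above (filter first, then delete envelopes left empty) can be sketched as follows. The `prune_empty` flag represents the pass-through alternative from the start of the issue. This is not the filterprocessor's actual code. The types and the `filter_metrics` function are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical stand-ins for the OTLP nesting (not the Collector's pdata types).
@dataclass
class Metric:
    name: str


@dataclass
class ScopeMetrics:
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class ResourceMetrics:
    resource_attributes: dict = field(default_factory=dict)
    scope_metrics: List[ScopeMetrics] = field(default_factory=list)


def filter_metrics(batch: List[ResourceMetrics],
                   keep: Callable[[Metric], bool],
                   prune_empty: bool = True) -> List[ResourceMetrics]:
    """Keep only metrics for which keep() is True.

    With prune_empty=True, any ScopeMetrics or ResourceMetrics left without
    content is removed (the filterprocessor-style behavior described above).
    With prune_empty=False, the empty envelopes pass through unchanged in shape.
    """
    out = []
    for rm in batch:
        scopes = []
        for sm in rm.scope_metrics:
            kept = [m for m in sm.metrics if keep(m)]
            if kept or not prune_empty:
                scopes.append(ScopeMetrics(kept))
        if scopes or not prune_empty:
            out.append(ResourceMetrics(rm.resource_attributes, scopes))
    return out


incoming = [ResourceMetrics({"service.name": "checkout"},
                            [ScopeMetrics([Metric("http.server.duration")])])]
print(len(filter_metrics(incoming, lambda m: False)))  # 0: the envelope is deleted
# With pruning disabled, the resource survives with an empty ScopeMetrics.
print(filter_metrics(incoming, lambda m: False, prune_empty=False))
```

Whether the default should be `True` or `False` is exactly the question this issue raises. The sketch only makes the two options concrete.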

tigrannajaryan commented 1 day ago

The proto spec unfortunately doesn't say what's expected. I think it should. Our best bet at the moment is likely to examine implementations and, provided that they behave mostly similarly, specify that behavior here in this repo.

If all or most other components behave like the filterprocessor, then we should take that as the de facto spec. Our goal here should be to break as little existing code, and as few existing observers of OTLP, as possible.

> The attributes service.name, k8s.pod.name, and k8s.cluster.name are all stored on the Resource by convention. This information on its own, independent of any telemetry signal, can be quite useful. For example, you could use it to understand how many pods are in each service.

Use cases like this are still likely better served by Entities. Entity events are precisely that: independently of metrics, traces, and logs, they indicate the presence of things like k8s pods, nodes, clusters, etc.

We should not modify the spec to contradict existing implementations just to serve this use case.

jmacd commented 1 day ago

I thought I would add some examples from the code base.

The core OTLP exporter will export such data.

The core batch processor will NOT export such data.

The exporter batcher will export such data.

I would say that "envelopes" with no telemetry points are valid OTLP; the real question is whether they can be dropped by processors that filter and aggregate. My personal view is that these envelopes can be dropped. @tigrannajaryan would you point us to the specification work about how entities will be encoded in OTLP?

isaaczinda commented 1 day ago

> I would say that "envelopes" with no telemetry points are valid OTLP; the real question is whether they can be dropped by processors that filter and aggregate.

I'm a bit confused by this. If message X is "valid OTLP" but all processors that filter and aggregate are allowed to drop it by default, how valid is it really? In other words, shouldn't valid OTLP be respected by all components? Of course, we could configure certain components to drop empty envelopes (e.g. by way of an OTTL function is_empty). But I think that's different from saying that components can freely drop this sort of data by default.

isaaczinda commented 1 day ago

@tigrannajaryan @jmacd how can I help out here? Would it be helpful for me to audit all core components and see how they treat empty envelopes?

tigrannajaryan commented 23 hours ago

> @tigrannajaryan would you point us to the specification work about how entities will be encoded in OTLP?

Here is the data model, and the corresponding prototype in OTLP. This is preliminary and subject to change; we are actively iterating on it.

tigrannajaryan commented 23 hours ago

> I'm a bit confused by this. If message X is "valid OTLP" but all processors that filter and aggregate are allowed to drop it by default, how valid is it really?

A possible approach is this: an empty envelope may be considered valid AND be required to be interpreted as NOOP. In that case it is perfectly fine to drop it, since delivering or dropping a NOOP payload is functionally equivalent.

We would not design it this way in the first place, but if this is what happens in reality, then we can describe this behavior in the spec, and I don't think it would be totally weird.

I suggest that we don't rush this and take stock of implementations.

> how can I help out here? Would it be helpful for me to audit all core components and see how they treat empty envelopes?

Absolutely. It would be great to do some spelunking in the Collector and in language SDKs to understand how we interpret empty envelopes when we receive them, and also whether we have any senders that send empty envelopes.

jmacd commented 22 hours ago

> I'm a bit confused by this. If message X is "valid OTLP" but all processors that filter and aggregate are allowed to drop it by default, how valid is it really?

In my thinking, the empty request is valid because it is well-formed, but it contains no spans, logs, or metric points, so it is immediately a success and there is no reason to send an empty request except to test a connection. I shouldn't have said "dropped". I could have said "successfully received, declared success". My interpretation comes from agreeing with the batch processor's approach, which can incorporate the empty request into a batch with no change of data; if the batch processor is correct, then returning immediate success for an empty envelope is also correct.
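The "empty request as NOOP" argument can be checked on a toy model. If batching skips envelopes that carry no points, the exported data is identical whether the empty request was delivered or dropped, and exporting it alone still reports success. The model below is a hypothetical simplification: a request is a list of `(resource_attributes, data_points)` pairs, and `batch` and `export` are illustrative helpers, not the Collector's batch processor or exporter APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified model: an envelope is (resource_attributes, data_points),
# and a request is a list of envelopes.
Envelope = Tuple[Dict[str, str], List[float]]
Request = List[Envelope]


def batch(requests: List[Request]) -> Request:
    """Concatenate envelopes from several requests, skipping those with no points."""
    return [env for req in requests for env in req if env[1]]


def export(request: Request, sink: List[float]) -> bool:
    """Append every data point to the sink and report success.

    An empty request appends nothing but is still a success.
    """
    for _resource, points in request:
        sink.extend(points)
    return True


r1 = [({"service.name": "checkout"}, [1.0, 2.0])]
empty = [({"service.name": "checkout"}, [])]  # resource only, no points

with_empty: List[float] = []
without_empty: List[float] = []
export(batch([r1, empty]), with_empty)
export(batch([r1]), without_empty)
print(with_empty == without_empty)  # True
```

Under this reading, delivering and dropping an empty envelope are observably equivalent downstream. That is the NOOP interpretation suggested earlier in the thread.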

jmacd commented 21 hours ago

Additional notes:

The OTel-Arrow exporter will eliminate the empty envelopes as part of its optimization process.

The groupbyattrs processor appears to eliminate empty envelopes.